all groups > sql server mseq > august 2006 >
You're in the

sql server mseq

group:

SELECT DISTINCT records based only on two columns



SELECT DISTINCT records based only on two columns archuleta37
8/17/2006 6:38:02 AM
sql server mseq: I'm working on a mailing list and want to select records where the both the
SSN number and Address fields are unique, since I may want to send a piece of
mail to more than one address for a person, but not multiple pieces to the
same address for that person. I've created a test table and some test data to
try to get my syntax right (below) but so far have had no luck.

I'm not certain if I can do this using DISTINCT or the UNIQUE keyword, nor
how to get my syntax right. Could someone help me with this?

Thanks in advance :-)
Marc


************* CREATE TABLE *****************************
USE [Northwind]
GO
/****** Object: Table [dbo].[MY_CONTACTS_TBL] Script Date: 08/17/2006
09:07:19 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
SET ANSI_PADDING ON
GO
CREATE TABLE [dbo].[MY_CONTACTS_TBL](
[UID] [smallint] IDENTITY(1,1) NOT NULL,
[SSN] [bigint] NULL,
[FNAME] [varchar](50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[LNAME] [varchar](50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[ADDRESS1] [varchar](50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[ADDRESS2] [varchar](50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[CITY] [varchar](50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[STATE] [varchar](50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[ZIP] [int] NULL,
CONSTRAINT [PK_MY_CONTACTS_TBL] PRIMARY KEY CLUSTERED
(
[UID] ASC
)WITH (PAD_INDEX = OFF, IGNORE_DUP_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]

GO
SET ANSI_PADDING OFF

******************* TABLE DATA *****************************
UID,SSN,FNAME,LNAME,ADDRESS1,ADDRESS2,CITY,STATE,ZIP
1,123456789,Marc,Archuleta,123 Sesame St,apartment 1,Columbus,OH,43215
2,234567890,John,Doe,123 Dough St,,Columbus,OH,43215
3,345678901,John2,Doe,456 Dough St,,Columbus,OH,43215
4,456789012,John,DoeTwo,789 Dough St,,Columbus,OH,43215
5,234567890,John,Doe,123 Dough St,,Columbus,OH,43215
6,234567890,John,Doe,123 Pumpernickel Ave,,New York,NY,10016
Re: SELECT DISTINCT records based only on two columns Arnie Rowland
8/17/2006 7:49:25 AM
First, a suggestion.=20

Make the ZipCode field a varchar() -either 9 or 10, depending upon what =
you do with the dash. If you use an int, leading zeros (upper Northeast =
States) is lost, and 9 digit zipcodes (e.g., 98765-4321) will do the =
math and store the result.

Similar leading zero problem with SSN

The use of DISTINCT serves two purposes. First, it makes sure that there =
are no duplicates by testing all of the selected columns as a unit =
-ignoring capitalization differences (with the standard =
SQL_Latin1_General_CP1_CI_AS collation). Second, it provides a 'sorted' =
output since it has to accomplish an ORDER BY just to remove duplicates. =
However, if there is any character, including space, period, etc., =
difference, the comparison fails.

CREATE TABLE dbo.#MY_CONTACTS_TBL
( UID smallint IDENTITY(1,1) NOT NULL
, SSN char(9) NULL
, FName varchar(50)
, LName varchar(50)
, Address1 varchar(50)
, Address2 varchar(50)
, City varchar(50)
, State varchar(50)
, Zipcode varchar(10) NULL
)
GO

INSERT INTO #MY_CONTACTS_TBL VALUES ( =
'123456789','Marc','Archuleta','123 Sesame St','apartment =
1','Columbus','OH','43215' )
INSERT INTO #MY_CONTACTS_TBL VALUES ( '234567890','John','Doe','123 =
Dough St',NULL,'Columbus','OH','43215' )
INSERT INTO #MY_CONTACTS_TBL VALUES ( '345678901','John2','Doe','456 =
Dough St',NULL,'Columbus','OH','43215' )
INSERT INTO #MY_CONTACTS_TBL VALUES ( '456789012','John','DoeTwo','789 =
Dough St',NULL,'Columbus','OH','43215' )
INSERT INTO #MY_CONTACTS_TBL VALUES ( '234567890','john','doe','123 =
dough St',NULL,'columbus','oh','43215' )
INSERT INTO #MY_CONTACTS_TBL VALUES ( '234567890','John','Doe','123 =
Pumpernickel Ave',NULL,'New York','NY','10016' )
INSERT INTO #MY_CONTACTS_TBL VALUES ( '234567890','John','Doe','123 =
Dough St.',NULL,'Columbus','OH','43215' )

SELECT DISTINCT
FName + ' ' + LName
, Address1
, isnull( Address2, '' )
, City + ', ' + State + ' ' + ZipCode
FROM #MY_CONTACTS_TBL

--=20
Arnie Rowland, Ph.D.
Westwood Consulting, Inc

Most good judgment comes from experience.=20
Most experience comes from bad judgment.=20
- Anonymous


[quoted text, click to view]
Re: SELECT DISTINCT records based only on two columns archuleta37
8/17/2006 10:35:02 AM
Arnie,

You make some good points about the table design. Also, thanks for shedding
a bit more light on how the DISTINCT keyword works. Based on the query you
wrote, I see my mistake was to include the UID field in my query. I don't
think I will actually need that field so that should do the the trick. But
out of curiosity, I wonder how the query could be written if I did need the
UID field. Would I use multiple select statements and the IN keyword or
somthing like that?





[quoted text, click to view]
Re: SELECT DISTINCT records based only on two columns Arnie Rowland
8/17/2006 11:46:00 AM
That would be more troublesome, since there would be two or more UIDs =
for a duplicated set of fields. Which one to pick?

You could use a GROUP BY, and decide that you want the max( UID ) or =
min( UID ). Something like this:

SELECT
max( UID )
, FullName =3D ( FName + ' ' + LName )
, Address1
, Address2 =3D ( isnull( Address2, '' ))
, CityState =3D ( City + ', ' + State + ' ' + ZipCode )
FROM #MY_CONTACTS_TBL
GROUP BY
FNAME
, LNAME
, Address1
, Address2
, City
, State =20
, ZipCode

--=20
Arnie Rowland, Ph.D.
Westwood Consulting, Inc

Most good judgment comes from experience.=20
Most experience comes from bad judgment.=20
- Anonymous


[quoted text, click to view]
what you do with the dash. If you use an int, leading zeros (upper =
Northeast States) is lost, and 9 digit zipcodes (e.g., 98765-4321) will =
do the math and store the result.
[quoted text, click to view]
there are no duplicates by testing all of the selected columns as a unit =
-ignoring capitalization differences (with the standard =
SQL_Latin1_General_CP1_CI_AS collation). Second, it provides a 'sorted' =
output since it has to accomplish an ORDER BY just to remove duplicates. =
However, if there is any character, including space, period, etc., =
difference, the comparison fails.
[quoted text, click to view]
'123456789','Marc','Archuleta','123 Sesame St','apartment =
1','Columbus','OH','43215' )
[quoted text, click to view]
'456789012','John','DoeTwo','789 Dough St',NULL,'Columbus','OH','43215' =
)
[quoted text, click to view]
AddThis Social Bookmark Button