all groups > sql server programming > october 2007 >
You're in the

sql server programming

group:

fastest way to dedupe a list


fastest way to dedupe a list Michael Evanchik
10/20/2007 9:16:01 PM
sql server programming: I have a table with 1 field with 40 millian records in it. What is the best
way to dedupe a list?

a) create a table with aunique constraint on it and insert into that table?

b) create a table with out a unique constraint on it and do a "insert into
table1 select DISTINCT field from table2"

c) does having a indentity field improve performance or not?

d) another way?

please keep in my mind, its a 2 gig list, and i have already seen sql
server 2005 blow up my hard drive and use over 30 gigs just doing this simple
iperation.....


Re: fastest way to dedupe a list Erland Sommarskog
10/21/2007 12:00:00 AM
Michael Evanchik (MichaelEvanchik@discussions.microsoft.com) writes:
[quoted text, click to view]

This question has been answered in comp.databases.ms-sqlserver. Please do
not add more duplicates to the world by posting the same question
independently to multiple groups.

--
Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx
Books Online for SQL Server 2000 at
Re: fastest way to dedupe a list Michael Evanchik
10/22/2007 10:46:00 AM
Erland thans so much for your elegant answer. I did try to search first. I
looed for dedup, disintct "faster" keywords etc and couldnt find an answer.
especially with some specific qustions about keys asked if it would improve
performance or not. please harp the stanard "its been answer before" routine
with me. technology changes. peole might have different opinions. the
searcher might might not have been able to find it !! give some benifit of
the doubt fore you flame.

[quoted text, click to view]
Re: fastest way to dedupe a list Jay
10/22/2007 11:12:00 AM
Erland,

I connect to USENET only through the NNTP server 'news.microsoft.com' and
therefore don't get the comp.databases.ms-sqlserver group (they only have
the microsoft.public.* groups).

Since I found it an interesting question, I would have appreciated at least
a Google link to the thread.

Thanks,
Jay


[quoted text, click to view]

Re: fastest way to dedupe a list Erland Sommarskog
10/22/2007 10:17:49 PM
Michael Evanchik (MichaelEvanchik@discussions.microsoft.com) writes:
[quoted text, click to view]

I specifically meant that you yourself had posted the question to a
different newsgroup, and I added my note in this newsgroup, so that
people would not spend time on a question that already had been answered.

--
Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx
Books Online for SQL Server 2000 at
Re: fastest way to dedupe a list Erland Sommarskog
10/22/2007 10:21:25 PM
Jay (nospan@nospam.org) writes:
[quoted text, click to view]

Let's see, then I should go to Google and find my post to be able to
post a link? Your own web browser is at the repair shop this week, or?

The main purpose of my note is to warn other people that they may not need
to spend time on answering the post.

My suggestion was that inserting all rows into table with a unique index and
the option IGNORE_DUP_KEYS could be the best bet.

--
Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx
Books Online for SQL Server 2000 at
AddThis Social Bookmark Button