[quoted text, click to view] > SELECT count(*),A.B,C,D > FROM tbl > GROUP BY A,B,C,D > HAVING count(*)>1
If that query takes 2 weeks to run then you've got serious and I mean serious problems with your hardware, that type of query should take minutes if not seconds most desktops let alone servers. To aid performance for that specific query create your clustered index on A, B, C and D. -- Tony Rogerson SQL Server MVP http://sqlblogcasts.com/blogs/tonyrogerson - technical commentary from a SQL Server Consultant http://sqlserverfaq.com - free video tutorials [quoted text, click to view] "groupy" <liav.ezer@gmail.com> wrote in message news:1149065676.453810.137020@i39g2000cwa.googlegroups.com... > Ok, here is a asample table representing the problem more clearly > > A | B | C | D > ----------------- > a1 b1 c1 d1 > a1 b2 c2 d2 > a3 b3 c1 d3 > a4 b4 c4 d3 > a5 b5 c5 d5 > a6 b6 c6 d3 > > Tha duplications are: > row 1+2 in param A > row 1+3 in param C > row 3+4+6 in param D > only row 5 is unique in all parameters. > conclusion: row 1+2+3+4+6 are the same user > goal: to find all duplicated rows & to delete them all accept one > instance to leave. > > Note: > Finding that row 1similar to 2 in A & deleting it will loose data > because we won't know that row 1 is ALSO similar to 3 on C & later on > finding that 3 is similar to 4 & 6 on D & so on > > The simple time consuming (about 2 weaks) query to acomplish the task > is: > SELECT count(*),A.B,C,D > FROM tbl > GROUP BY A,B,C,D > HAVING count(*)>1 > > I THANK YOU ALL >
Ok, here is a asample table representing the problem more clearly A | B | C | D ----------------- a1 b1 c1 d1 a1 b2 c2 d2 a3 b3 c1 d3 a4 b4 c4 d3 a5 b5 c5 d5 a6 b6 c6 d3 Tha duplications are: row 1+2 in param A row 1+3 in param C row 3+4+6 in param D only row 5 is unique in all parameters. conclusion: row 1+2+3+4+6 are the same user goal: to find all duplicated rows & to delete them all accept one instance to leave. Note: Finding that row 1similar to 2 in A & deleting it will loose data because we won't know that row 1 is ALSO similar to 3 on C & later on finding that 3 is similar to 4 & 6 on D & so on The simple time consuming (about 2 weaks) query to acomplish the task is: SELECT count(*),A.B,C,D FROM tbl GROUP BY A,B,C,D HAVING count(*)>1 I THANK YOU ALL
Thanks Tony i've just managed something.. Tony Rogerson =D7=9B=D7=AA=D7=91: [quoted text, click to view] > > SELECT count(*),A.B,C,D > > FROM tbl > > GROUP BY A,B,C,D > > HAVING count(*)>1 > > If that query takes 2 weeks to run then you've got serious and I mean > serious problems with your hardware, that type of query should take minut= es > if not seconds most desktops let alone servers. > > To aid performance for that specific query create your clustered index on= A, > B, C and D. > > -- > Tony Rogerson > SQL Server MVP > http://sqlblogcasts.com/blogs/tonyrogerson - technical commentary from a = SQL > Server Consultant > http://sqlserverfaq.com - free video tutorials > > > "groupy" <liav.ezer@gmail.com> wrote in message > news:1149065676.453810.137020@i39g2000cwa.googlegroups.com... > > Ok, here is a asample table representing the problem more clearly > > > > A | B | C | D > > ----------------- > > a1 b1 c1 d1 > > a1 b2 c2 d2 > > a3 b3 c1 d3 > > a4 b4 c4 d3 > > a5 b5 c5 d5 > > a6 b6 c6 d3 > > > > Tha duplications are: > > row 1+2 in param A > > row 1+3 in param C > > row 3+4+6 in param D > > only row 5 is unique in all parameters. > > conclusion: row 1+2+3+4+6 are the same user > > goal: to find all duplicated rows & to delete them all accept one > > instance to leave. > > > > Note: > > Finding that row 1similar to 2 in A & deleting it will loose data > > because we won't know that row 1 is ALSO similar to 3 on C & later on > > finding that 3 is similar to 4 & 6 on D & so on > > > > The simple time consuming (about 2 weaks) query to acomplish the task > > is: > > SELECT count(*),A.B,C,D > > FROM tbl > > GROUP BY A,B,C,D > > HAVING count(*)>1 > > > > I THANK YOU ALL > >
A | B | C | D ----------------- a1 b1 c1 d1 a1 b2 c2 d2 a3 b3 c1 d3 a4 b4 c4 d3 a5 b5 c5 d5 a6 b6 c6 d3 I am going to guess at the DDL and add another column CREATE TABLE Foobar (a CHAR(2) NOT NULL, b CHAR(2) NOT NULL, c CHAR(2) NOT NULL, d CHAR(2) NOT NULL, dups INTEGER DEFAULT 0 NOT NULL CHECK(dups >= 0), PRIMARY KEY(a, b, c, d)); INSERT INTO Foobar(a, b, c, d) VALUES ('a1', 'b1', 'c1', 'd1'); INSERT INTO Foobar(a, b, c, d) VALUES ('a1', 'b2', 'c2', 'd2'); INSERT INTO Foobar(a, b, c, d) VALUES ('a3', 'b3', 'c1', 'd3'); INSERT INTO Foobar(a, b, c, d) VALUES ('a4', 'b4', 'c4', 'd3'); INSERT INTO Foobar(a, b, c, d) VALUES ('a5', 'b5', 'c5', 'd5'); INSERT INTO Foobar(a, b, c, d) VALUES ('a6', 'b6', 'c6', 'd3'); Since you seem to want to preserve some of the information about duplications, you can keep a tally UPDATE Foobar SET dups = dups + (SELECT 1 FROM Foobar AS F1 WHERE F1.a = Foobar.a HAVING COUNT(*) > 1) + (SELECT 1 FROM Foobar AS F1 WHERE F1.b = Foobar.b HAVING COUNT(*) > 1) + (SELECT 1 FROM Foobar AS F1 WHERE F1.c = Foobar.c HAVING COUNT(*) > 1) + (SELECT 1 FROM Foobar AS F1 WHERE F1.d = Foobar.d HAVING COUNT(*) > 1); The duplications are: row 1+2 in param A row 1+3 in param C row 3+4+6 in param D only row 5 is unique in all parameters. conclusion: row 1+2+3+4+6 are the same user
Don't see what you're looking for? Try a search.
|