I have a table defined as : CREATE TABLE [dbo].[job_13_event] ( [id] [int] IDENTITY (1, 1) NOT NULL , [recipient_id] [int] NULL , [event_type] [int] NULL , [mail_type] [int] NULL , ) ON [PRIMARY] There is about 5 million rows in this table; 1,063,337 have event_type=3; When I run the folowing statement it takes 13 seconds to run. The execution plan shows a table spool/eager spool taking 32%. SELECT count(recipient_id) mail_type_count, count(distinct recipient_id) dist_count, mail_type from job_13_event where event_type = 3 GROUP BY mail_type When I run the two counts as seperate statements, the count returns in 2 seconds and the count distinct returns in 3. SELECT count(recipient_id) mail_type_count, mail_type from job_13_event where event_type = 3 GROUP BY mail_type SELECT count(distinct recipient_id) dist_count, mail_type from job_13_event where event_type = 3 GROUP BY mail_type If I do a join of the two statements and use FORCEPLAN, I can get the results to come back in 7 Seconds. Here is the sql for that: SET FORCEPLAN ON select a.dist_count, a.mail_type, b.mail_type_count from ( SELECT count(distinct recipient_id) dist_count, mail_type from job_13_event where event_type = 3 GROUP BY mail_type) a, ( SELECT count(1) mail_type_count, mail_type from job_13_event where event_type = 3 GROUP BY mail_type ) b where a.mail_type = b.mail_type I would think that the optimizer could calculate the first simple statment by : sorting by the mail_type and then recipient_id and doing a count for one and a distinct count for the recipient_id in one scan. In that case you would get the faster operation the count(1) essentially for free, resulting in a near 4 second result time. Is there something that I am missing that would make my first statement run as fast or faster than using the forceplan ??? Thanks. Shawn
Uri, The trick here is that the indexed view has already materialized. Thus, sqlserver will use this materialized plan instead of going back to the base table(s) and perform compilation/execution. So, it's really not the same if you compare the view's plan with the basetable's plan. my 2c. -- -oj http://www.rac4sql.net [quoted text, click to view] "Uri Dimant" <urid@iscar.co.il> wrote in message news:%232Ksgn%23CEHA.3796@TK2MSFTNGP10.phx.gbl... > Steve > As I mentioned yesterday I use this your script to demonstrate to our > develpers that a indexed view uses more efficient execution query plan. > Thank you. > > "Steve Kass" <skass@drew.edu> wrote in message > news:#khbCR#CEHA.2052@TK2MSFTNGP11.phx.gbl... > > Nice trick, Uri. ;) > > > > I'll add that Shawn isn't missing anything, in my opinion. Queries with > > both count(*) and count(distinct ...) are often real trouble for the > > optimizer, and some user intervention is probably necessary. The view > > trick will help with the count(*) and can be joined to the rest of the > > query, but it won't speed up the count(distinct..). I don't know if > > NOEXPAND and a join will force a good plan - my guess is it will help, > > but whether this solution is better than user intervention without a > > view depends on how often this query needs to be run and on how > > frequently there are inserts and deletes from the table. In any case, > > if Shawn uses this trick, the column to replace T.i should be the > > primary key column or any column that is never or very infrequently > > updated, so that only deletes and inserts will contribute to the overhead. > > > > SK > > > > Uri Dimant wrote: > > > > >Shawn > > >I'd recommend you create an indexed view . > > >create table T ( > > > i int > > >) > > >go > > > > > >create view T_count with schemabinding > > >as > > > select cast(i as bit) as val, count_big(*) T_count from dbo.T > > > group by cast(i as bit) > > >go > > > > > >create unique clustered index T_count_uci on T_count(val) > > >go > > >insert into T > > >select OrderID > > >from Northwind..[Order Details] > > >go > > > > > >select sum(T_count) from T_count with (noexpand) > > >-- uses an efficient query plan on the materialized view > > > > > >go > > >drop view T_count > > >drop table T > > > > > > > > >"Shawn Meyer" <smeyer@interelate.com> wrote in message > > >news:eKMZoO9CEHA.3256@TK2MSFTNGP09.phx.gbl... > > > > > > > > >>I have a table defined as : > > >> > > >>CREATE TABLE [dbo].[job_13_event] ( > > >> [id] [int] IDENTITY (1, 1) NOT NULL , > > >> [recipient_id] [int] NULL , > > >> [event_type] [int] NULL , > > >> [mail_type] [int] NULL , > > >>) ON [PRIMARY] > > >> > > >>There is about 5 million rows in this table; > > >>1,063,337 have event_type=3; > > >> > > >>When I run the folowing statement it takes 13 seconds > > >>to run. The execution plan shows a table spool/eager spool taking > > >>32%. > > >> > > >>SELECT > > >> count(recipient_id) mail_type_count, > > >> count(distinct recipient_id) dist_count, > > >> mail_type > > >>from > > >> job_13_event where event_type = 3 > > >>GROUP BY mail_type > > >> > > >>When I run the two counts as seperate statements, the count returns in 2 > > >>seconds > > >>and the count distinct returns in 3. > > >> > > >>SELECT > > >> count(recipient_id) mail_type_count, > > >> mail_type > > >>from > > >> job_13_event where event_type = 3 > > >>GROUP BY mail_type > > >> > > >>SELECT > > >> count(distinct recipient_id) dist_count, > > >> mail_type > > >>from > > >> job_13_event where event_type = 3 > > >>GROUP BY mail_type > > >> > > >>If I do a join of the two statements and use FORCEPLAN, > > >>I can get the results to come back in 7 Seconds. Here is the sql for > > >> > > >> > > >that: > > > > > > > > >>SET FORCEPLAN ON > > >>select a.dist_count, a.mail_type, b.mail_type_count from > > >>( SELECT > > >> count(distinct recipient_id) dist_count, > > >> mail_type > > >> from job_13_event > > >> where event_type = 3 > > >> GROUP BY mail_type) a, > > >>( SELECT > > >> count(1) mail_type_count, > > >> mail_type > > >> from job_13_event > > >> where event_type = 3 > > >> GROUP BY mail_type ) b > > >>where a.mail_type = b.mail_type > > >> > > >>I would think that the optimizer could calculate the first simple > statment > > >>by : > > >>sorting by the mail_type and then recipient_id and doing a count for one > > >> > > >> > > >and > > > > > > > > >>a distinct count for the recipient_id > > >>in one scan. In that case you would get the faster operation the > count(1) > > >>essentially for free, resulting in a near 4 second > > >>result time. > > >> > > >>Is there something that I am missing that would make my first statement > > >> > > >> > > >run > > > > > > > > >>as fast or faster than using > > >>the forceplan ??? > > >> > > >>Thanks. > > >>Shawn > > >> > > >> > > >> > > >> > > > > > > > > > > > > > > > >
Nice trick, Uri. ;) I'll add that Shawn isn't missing anything, in my opinion. Queries with both count(*) and count(distinct ...) are often real trouble for the optimizer, and some user intervention is probably necessary. The view trick will help with the count(*) and can be joined to the rest of the query, but it won't speed up the count(distinct..). I don't know if NOEXPAND and a join will force a good plan - my guess is it will help, but whether this solution is better than user intervention without a view depends on how often this query needs to be run and on how frequently there are inserts and deletes from the table. In any case, if Shawn uses this trick, the column to replace T.i should be the primary key column or any column that is never or very infrequently updated, so that only deletes and inserts will contribute to the overhead. SK [quoted text, click to view] Uri Dimant wrote: >Shawn >I'd recommend you create an indexed view . >create table T ( > i int >) >go > >create view T_count with schemabinding >as > select cast(i as bit) as val, count_big(*) T_count from dbo.T > group by cast(i as bit) >go > >create unique clustered index T_count_uci on T_count(val) >go >insert into T >select OrderID >from Northwind..[Order Details] >go > >select sum(T_count) from T_count with (noexpand) >-- uses an efficient query plan on the materialized view > >go >drop view T_count >drop table T > > >"Shawn Meyer" <smeyer@interelate.com> wrote in message >news:eKMZoO9CEHA.3256@TK2MSFTNGP09.phx.gbl... > > >>I have a table defined as : >> >>CREATE TABLE [dbo].[job_13_event] ( >> [id] [int] IDENTITY (1, 1) NOT NULL , >> [recipient_id] [int] NULL , >> [event_type] [int] NULL , >> [mail_type] [int] NULL , >>) ON [PRIMARY] >> >>There is about 5 million rows in this table; >>1,063,337 have event_type=3; >> >>When I run the folowing statement it takes 13 seconds >>to run. The execution plan shows a table spool/eager spool taking >>32%. >> >>SELECT >> count(recipient_id) mail_type_count, >> count(distinct recipient_id) dist_count, >> mail_type >>from >> job_13_event where event_type = 3 >>GROUP BY mail_type >> >>When I run the two counts as seperate statements, the count returns in 2 >>seconds >>and the count distinct returns in 3. >> >>SELECT >> count(recipient_id) mail_type_count, >> mail_type >>from >> job_13_event where event_type = 3 >>GROUP BY mail_type >> >>SELECT >> count(distinct recipient_id) dist_count, >> mail_type >>from >> job_13_event where event_type = 3 >>GROUP BY mail_type >> >>If I do a join of the two statements and use FORCEPLAN, >>I can get the results to come back in 7 Seconds. Here is the sql for >> >> >that: > > >>SET FORCEPLAN ON >>select a.dist_count, a.mail_type, b.mail_type_count from >>( SELECT >> count(distinct recipient_id) dist_count, >> mail_type >> from job_13_event >> where event_type = 3 >> GROUP BY mail_type) a, >>( SELECT >> count(1) mail_type_count, >> mail_type >> from job_13_event >> where event_type = 3 >> GROUP BY mail_type ) b >>where a.mail_type = b.mail_type >> >>I would think that the optimizer could calculate the first simple statment >>by : >>sorting by the mail_type and then recipient_id and doing a count for one >> >> >and > > >>a distinct count for the recipient_id >>in one scan. In that case you would get the faster operation the count(1) >>essentially for free, resulting in a near 4 second >>result time. >> >>Is there something that I am missing that would make my first statement >> >> >run > > >>as fast or faster than using >>the forceplan ??? >> >>Thanks. >>Shawn >> >> >> >> > > > >
I was actually hoping that Uri would mention that the code he posted was mine and not his idea. http://groups.google.com/groups?q=%22cast+i+as+bit SK [quoted text, click to view] oj wrote: >Uri, > >The trick here is that the indexed view has already materialized. Thus, >sqlserver will use this materialized plan instead of going back to the base >table(s) and perform compilation/execution. So, it's really not the same if you >compare the view's plan with the basetable's plan. my 2c. > > >
Shawn I'd recommend you create an indexed view . create table T ( i int ) go create view T_count with schemabinding as select cast(i as bit) as val, count_big(*) T_count from dbo.T group by cast(i as bit) go create unique clustered index T_count_uci on T_count(val) go insert into T select OrderID from Northwind..[Order Details] go select sum(T_count) from T_count with (noexpand) -- uses an efficient query plan on the materialized view go drop view T_count drop table T [quoted text, click to view] "Shawn Meyer" <smeyer@interelate.com> wrote in message news:eKMZoO9CEHA.3256@TK2MSFTNGP09.phx.gbl... > I have a table defined as : > > CREATE TABLE [dbo].[job_13_event] ( > [id] [int] IDENTITY (1, 1) NOT NULL , > [recipient_id] [int] NULL , > [event_type] [int] NULL , > [mail_type] [int] NULL , > ) ON [PRIMARY] > > There is about 5 million rows in this table; > 1,063,337 have event_type=3; > > When I run the folowing statement it takes 13 seconds > to run. The execution plan shows a table spool/eager spool taking > 32%. > > SELECT > count(recipient_id) mail_type_count, > count(distinct recipient_id) dist_count, > mail_type > from > job_13_event where event_type = 3 > GROUP BY mail_type > > When I run the two counts as seperate statements, the count returns in 2 > seconds > and the count distinct returns in 3. > > SELECT > count(recipient_id) mail_type_count, > mail_type > from > job_13_event where event_type = 3 > GROUP BY mail_type > > SELECT > count(distinct recipient_id) dist_count, > mail_type > from > job_13_event where event_type = 3 > GROUP BY mail_type > > If I do a join of the two statements and use FORCEPLAN, > I can get the results to come back in 7 Seconds. Here is the sql for that: > > SET FORCEPLAN ON > select a.dist_count, a.mail_type, b.mail_type_count from > ( SELECT > count(distinct recipient_id) dist_count, > mail_type > from job_13_event > where event_type = 3 > GROUP BY mail_type) a, > ( SELECT > count(1) mail_type_count, > mail_type > from job_13_event > where event_type = 3 > GROUP BY mail_type ) b > where a.mail_type = b.mail_type > > I would think that the optimizer could calculate the first simple statment > by : > sorting by the mail_type and then recipient_id and doing a count for one and > a distinct count for the recipient_id > in one scan. In that case you would get the faster operation the count(1) > essentially for free, resulting in a near 4 second > result time. > > Is there something that I am missing that would make my first statement run > as fast or faster than using > the forceplan ??? > > Thanks. > Shawn > >
Steve As I mentioned yesterday I use this your script to demonstrate to our develpers that a indexed view uses more efficient execution query plan. Thank you. [quoted text, click to view] "Steve Kass" <skass@drew.edu> wrote in message news:#khbCR#CEHA.2052@TK2MSFTNGP11.phx.gbl... > Nice trick, Uri. ;) > > I'll add that Shawn isn't missing anything, in my opinion. Queries with > both count(*) and count(distinct ...) are often real trouble for the > optimizer, and some user intervention is probably necessary. The view > trick will help with the count(*) and can be joined to the rest of the > query, but it won't speed up the count(distinct..). I don't know if > NOEXPAND and a join will force a good plan - my guess is it will help, > but whether this solution is better than user intervention without a > view depends on how often this query needs to be run and on how > frequently there are inserts and deletes from the table. In any case, > if Shawn uses this trick, the column to replace T.i should be the > primary key column or any column that is never or very infrequently > updated, so that only deletes and inserts will contribute to the overhead. > > SK > > Uri Dimant wrote: > > >Shawn > >I'd recommend you create an indexed view . > >create table T ( > > i int > >) > >go > > > >create view T_count with schemabinding > >as > > select cast(i as bit) as val, count_big(*) T_count from dbo.T > > group by cast(i as bit) > >go > > > >create unique clustered index T_count_uci on T_count(val) > >go > >insert into T > >select OrderID > >from Northwind..[Order Details] > >go > > > >select sum(T_count) from T_count with (noexpand) > >-- uses an efficient query plan on the materialized view > > > >go > >drop view T_count > >drop table T > > > > > >"Shawn Meyer" <smeyer@interelate.com> wrote in message > >news:eKMZoO9CEHA.3256@TK2MSFTNGP09.phx.gbl... > > > > > >>I have a table defined as : > >> > >>CREATE TABLE [dbo].[job_13_event] ( > >> [id] [int] IDENTITY (1, 1) NOT NULL , > >> [recipient_id] [int] NULL , > >> [event_type] [int] NULL , > >> [mail_type] [int] NULL , > >>) ON [PRIMARY] > >> > >>There is about 5 million rows in this table; > >>1,063,337 have event_type=3; > >> > >>When I run the folowing statement it takes 13 seconds > >>to run. The execution plan shows a table spool/eager spool taking > >>32%. > >> > >>SELECT > >> count(recipient_id) mail_type_count, > >> count(distinct recipient_id) dist_count, > >> mail_type > >>from > >> job_13_event where event_type = 3 > >>GROUP BY mail_type > >> > >>When I run the two counts as seperate statements, the count returns in 2 > >>seconds > >>and the count distinct returns in 3. > >> > >>SELECT > >> count(recipient_id) mail_type_count, > >> mail_type > >>from > >> job_13_event where event_type = 3 > >>GROUP BY mail_type > >> > >>SELECT > >> count(distinct recipient_id) dist_count, > >> mail_type > >>from > >> job_13_event where event_type = 3 > >>GROUP BY mail_type > >> > >>If I do a join of the two statements and use FORCEPLAN, > >>I can get the results to come back in 7 Seconds. Here is the sql for > >> > >> > >that: > > > > > >>SET FORCEPLAN ON > >>select a.dist_count, a.mail_type, b.mail_type_count from > >>( SELECT > >> count(distinct recipient_id) dist_count, > >> mail_type > >> from job_13_event > >> where event_type = 3 > >> GROUP BY mail_type) a, > >>( SELECT > >> count(1) mail_type_count, > >> mail_type > >> from job_13_event > >> where event_type = 3 > >> GROUP BY mail_type ) b > >>where a.mail_type = b.mail_type > >> > >>I would think that the optimizer could calculate the first simple statment > >>by : > >>sorting by the mail_type and then recipient_id and doing a count for one > >> > >> > >and > > > > > >>a distinct count for the recipient_id > >>in one scan. In that case you would get the faster operation the count(1) > >>essentially for free, resulting in a near 4 second > >>result time. > >> > >>Is there something that I am missing that would make my first statement > >> > >> > >run > > > > > >>as fast or faster than using > >>the forceplan ??? > >> > >>Thanks. > >>Shawn > >> > >> > >> > >> > > > > > > > > >
OJ You are right. I was talking about an indexed views and did not mention about 'normal' views .. [quoted text, click to view] "oj" <nospam_ojngo@home.com> wrote in message news:O0rYmv#CEHA.3256@TK2MSFTNGP09.phx.gbl... > Uri, > > The trick here is that the indexed view has already materialized. Thus, > sqlserver will use this materialized plan instead of going back to the base > table(s) and perform compilation/execution. So, it's really not the same if you > compare the view's plan with the basetable's plan. my 2c. > > -- > -oj > http://www.rac4sql.net > > > "Uri Dimant" <urid@iscar.co.il> wrote in message > news:%232Ksgn%23CEHA.3796@TK2MSFTNGP10.phx.gbl... > > Steve > > As I mentioned yesterday I use this your script to demonstrate to our > > develpers that a indexed view uses more efficient execution query plan. > > Thank you. > > > > "Steve Kass" <skass@drew.edu> wrote in message > > news:#khbCR#CEHA.2052@TK2MSFTNGP11.phx.gbl... > > > Nice trick, Uri. ;) > > > > > > I'll add that Shawn isn't missing anything, in my opinion. Queries with > > > both count(*) and count(distinct ...) are often real trouble for the > > > optimizer, and some user intervention is probably necessary. The view > > > trick will help with the count(*) and can be joined to the rest of the > > > query, but it won't speed up the count(distinct..). I don't know if > > > NOEXPAND and a join will force a good plan - my guess is it will help, > > > but whether this solution is better than user intervention without a > > > view depends on how often this query needs to be run and on how > > > frequently there are inserts and deletes from the table. In any case, > > > if Shawn uses this trick, the column to replace T.i should be the > > > primary key column or any column that is never or very infrequently > > > updated, so that only deletes and inserts will contribute to the overhead. > > > > > > SK > > > > > > Uri Dimant wrote: > > > > > > >Shawn > > > >I'd recommend you create an indexed view . > > > >create table T ( > > > > i int > > > >) > > > >go > > > > > > > >create view T_count with schemabinding > > > >as > > > > select cast(i as bit) as val, count_big(*) T_count from dbo.T > > > > group by cast(i as bit) > > > >go > > > > > > > >create unique clustered index T_count_uci on T_count(val) > > > >go > > > >insert into T > > > >select OrderID > > > >from Northwind..[Order Details] > > > >go > > > > > > > >select sum(T_count) from T_count with (noexpand) > > > >-- uses an efficient query plan on the materialized view > > > > > > > >go > > > >drop view T_count > > > >drop table T > > > > > > > > > > > >"Shawn Meyer" <smeyer@interelate.com> wrote in message > > > >news:eKMZoO9CEHA.3256@TK2MSFTNGP09.phx.gbl... > > > > > > > > > > > >>I have a table defined as : > > > >> > > > >>CREATE TABLE [dbo].[job_13_event] ( > > > >> [id] [int] IDENTITY (1, 1) NOT NULL , > > > >> [recipient_id] [int] NULL , > > > >> [event_type] [int] NULL , > > > >> [mail_type] [int] NULL , > > > >>) ON [PRIMARY] > > > >> > > > >>There is about 5 million rows in this table; > > > >>1,063,337 have event_type=3; > > > >> > > > >>When I run the folowing statement it takes 13 seconds > > > >>to run. The execution plan shows a table spool/eager spool taking > > > >>32%. > > > >> > > > >>SELECT > > > >> count(recipient_id) mail_type_count, > > > >> count(distinct recipient_id) dist_count, > > > >> mail_type > > > >>from > > > >> job_13_event where event_type = 3 > > > >>GROUP BY mail_type > > > >> > > > >>When I run the two counts as seperate statements, the count returns in 2 > > > >>seconds > > > >>and the count distinct returns in 3. > > > >> > > > >>SELECT > > > >> count(recipient_id) mail_type_count, > > > >> mail_type > > > >>from > > > >> job_13_event where event_type = 3 > > > >>GROUP BY mail_type > > > >> > > > >>SELECT > > > >> count(distinct recipient_id) dist_count, > > > >> mail_type > > > >>from > > > >> job_13_event where event_type = 3 > > > >>GROUP BY mail_type > > > >> > > > >>If I do a join of the two statements and use FORCEPLAN, > > > >>I can get the results to come back in 7 Seconds. Here is the sql for > > > >> > > > >> > > > >that: > > > > > > > > > > > >>SET FORCEPLAN ON > > > >>select a.dist_count, a.mail_type, b.mail_type_count from > > > >>( SELECT > > > >> count(distinct recipient_id) dist_count, > > > >> mail_type > > > >> from job_13_event > > > >> where event_type = 3 > > > >> GROUP BY mail_type) a, > > > >>( SELECT > > > >> count(1) mail_type_count, > > > >> mail_type > > > >> from job_13_event > > > >> where event_type = 3 > > > >> GROUP BY mail_type ) b > > > >>where a.mail_type = b.mail_type > > > >> > > > >>I would think that the optimizer could calculate the first simple > > statment > > > >>by : > > > >>sorting by the mail_type and then recipient_id and doing a count for one > > > >> > > > >> > > > >and > > > > > > > > > > > >>a distinct count for the recipient_id > > > >>in one scan. In that case you would get the faster operation the > > count(1) > > > >>essentially for free, resulting in a near 4 second > > > >>result time. > > > >> > > > >>Is there something that I am missing that would make my first statement > > > >> > > > >> > > > >run > > > > > > > > > > > >>as fast or faster than using > > > >>the forceplan ??? > > > >> > > > >>Thanks. > > > >>Shawn > > > >> > > > >> > > > >> > > > >> > > > > > > > > > > > > > > > > > > > > > > > > >
Uri, You didn't declare it as your own, but I think it is polite to mention the source when you cut and paste someone else's idea like you did. Steve [quoted text, click to view] Uri Dimant wrote: >Steve >Don't understand you >Did you mean that i am stolen your code and declared it as my own? > > > > >"Steve Kass" <skass@drew.edu> wrote in message >news:upPtToBDEHA.688@tk2msftngp13.phx.gbl... > > >>I was actually hoping that Uri would mention that the code he posted was >>mine and not his idea. >> http://groups.google.com/groups?q=%22cast+i+as+bit >> >>SK >> >>oj wrote: >> >> >> >>>Uri, >>> >>>The trick here is that the indexed view has already materialized. Thus, >>>sqlserver will use this materialized plan instead of going back to the >>> >>> >base > > >>>table(s) and perform compilation/execution. So, it's really not the same >>> >>> >if you > > >>>compare the view's plan with the basetable's plan. my 2c. >>> >>> >>> >>> >>> > > > >
Steve Don't understand you Did you mean that i am stolen your code and declared it as my own? [quoted text, click to view] "Steve Kass" <skass@drew.edu> wrote in message news:upPtToBDEHA.688@tk2msftngp13.phx.gbl... > I was actually hoping that Uri would mention that the code he posted was > mine and not his idea. > http://groups.google.com/groups?q=%22cast+i+as+bit > > SK > > oj wrote: > > >Uri, > > > >The trick here is that the indexed view has already materialized. Thus, > >sqlserver will use this materialized plan instead of going back to the base > >table(s) and perform compilation/execution. So, it's really not the same if you > >compare the view's plan with the basetable's plan. my 2c. > > > > > > >
Shawn, The following rewrite of the statement might perform considerably better. I agree that the optimizer can always be improved, even in relatively complex situation as the one you describe. SELECT SUM(T1.RecipientCount) AS mail_type_count , COUNT(*) AS dist_count , T1.mail_type FROM ( SELECT mail_type,recipient_id,COUNT(*) AS RecipientCount FROM job_13_event WHERE event_type=3 AND recipient_id IS NOT NULL GROUP BY mail_type,recipient_id ) AS T1 GROUP BY T1.mail_type Hope this helps, Gert-Jan [quoted text, click to view] Shawn Meyer wrote: > > I have a table defined as : > > CREATE TABLE [dbo].[job_13_event] ( > [id] [int] IDENTITY (1, 1) NOT NULL , > [recipient_id] [int] NULL , > [event_type] [int] NULL , > [mail_type] [int] NULL , > ) ON [PRIMARY] > > There is about 5 million rows in this table; > 1,063,337 have event_type=3; > > When I run the folowing statement it takes 13 seconds > to run. The execution plan shows a table spool/eager spool taking > 32%. > > SELECT > count(recipient_id) mail_type_count, > count(distinct recipient_id) dist_count, > mail_type > from > job_13_event where event_type = 3 > GROUP BY mail_type > > When I run the two counts as seperate statements, the count returns in 2 > seconds > and the count distinct returns in 3. > > SELECT > count(recipient_id) mail_type_count, > mail_type > from > job_13_event where event_type = 3 > GROUP BY mail_type > > SELECT > count(distinct recipient_id) dist_count, > mail_type > from > job_13_event where event_type = 3 > GROUP BY mail_type > > If I do a join of the two statements and use FORCEPLAN, > I can get the results to come back in 7 Seconds. Here is the sql for that: > > SET FORCEPLAN ON > select a.dist_count, a.mail_type, b.mail_type_count from > ( SELECT > count(distinct recipient_id) dist_count, > mail_type > from job_13_event > where event_type = 3 > GROUP BY mail_type) a, > ( SELECT > count(1) mail_type_count, > mail_type > from job_13_event > where event_type = 3 > GROUP BY mail_type ) b > where a.mail_type = b.mail_type > > I would think that the optimizer could calculate the first simple statment > by : > sorting by the mail_type and then recipient_id and doing a count for one and > a distinct count for the recipient_id > in one scan. In that case you would get the faster operation the count(1) > essentially for free, resulting in a near 4 second > result time. > > Is there something that I am missing that would make my first statement run > as fast or faster than using > the forceplan ??? > > Thanks. > Shawn
--
Steve I got it. [quoted text, click to view] "Steve Kass" <skass@drew.edu> wrote in message news:O$Usd7DDEHA.3404@TK2MSFTNGP10.phx.gbl... > Uri, > > You didn't declare it as your own, but I think it is polite to mention > the source when you cut and paste someone else's idea like you did. > > Steve > > > Uri Dimant wrote: > > >Steve > >Don't understand you > >Did you mean that i am stolen your code and declared it as my own? > > > > > > > > > >"Steve Kass" <skass@drew.edu> wrote in message > >news:upPtToBDEHA.688@tk2msftngp13.phx.gbl... > > > > > >>I was actually hoping that Uri would mention that the code he posted was > >>mine and not his idea. > >> http://groups.google.com/groups?q=%22cast+i+as+bit > >> > >>SK > >> > >>oj wrote: > >> > >> > >> > >>>Uri, > >>> > >>>The trick here is that the indexed view has already materialized. Thus, > >>>sqlserver will use this materialized plan instead of going back to the > >>> > >>> > >base > > > > > >>>table(s) and perform compilation/execution. So, it's really not the same > >>> > >>> > >if you > > > > > >>>compare the view's plan with the basetable's plan. my 2c. > >>> > >>> > >>> > >>> > >>> > > > > > > > > >
The query shown below should be faster. -- the table --drop table job_13_event CREATE TABLE [dbo].[job_13_event] ( [id] [int] IDENTITY (1, 1) NOT NULL , [recipient_id] [int] NULL , [event_type] [int] NULL , [mail_type] [int] NULL , ) ON [PRIMARY] -- some raw data insert into job_13_event (recipient_id,event_type,mail_type) values (null,null,null) insert into job_13_event (recipient_id,event_type,mail_type) values (100,1,300) insert into job_13_event (recipient_id,event_type,mail_type) values (111,3,310) insert into job_13_event (recipient_id,event_type,mail_type) values (111,3,310) insert into job_13_event (recipient_id,event_type,mail_type) values (112,3,311) insert into job_13_event (recipient_id,event_type,mail_type) values (121,3,320) insert into job_13_event (recipient_id,event_type,mail_type) values (122,3,320) insert into job_13_event (recipient_id,event_type,mail_type) values (122,3,320) insert into job_13_event (recipient_id,event_type,mail_type) values (122,3,320) -- peek at the raw data in the table select * from job_13_event order by mail_type, recipient_id -- sad query SELECT count(recipient_id) mail_type_count, count(distinct recipient_id) dist_count, mail_type from job_13_event where event_type = 3 GROUP BY mail_type ORDER BY mail_type -- happy query -- But not joyful due to -- second sort operator which seems to be unnecessary. SELECT sum(cnt)/2 AS mail_type_count, sum(1-G_recipient_id) AS dist_count, mail_type FROM ( SELECT GROUPING(mail_type) AS G_mail_type, mail_type, GROUPING(recipient_id) AS G_recipient_id, recipient_id, count(*) AS cnt FROM job_13_event WHERE event_type = 3 AND recipient_id IS NOT NULL GROUP BY mail_type, recipient_id WITH ROLLUP ) DT WHERE G_mail_type = 0 GROUP BY mail_type ORDER BY mail_type Bye, Delbert Glass [quoted text, click to view] "Shawn Meyer" <smeyer@interelate.com> wrote in message news:eKMZoO9CEHA.3256@TK2MSFTNGP09.phx.gbl... > I have a table defined as : > > CREATE TABLE [dbo].[job_13_event] ( > [id] [int] IDENTITY (1, 1) NOT NULL , > [recipient_id] [int] NULL , > [event_type] [int] NULL , > [mail_type] [int] NULL , > ) ON [PRIMARY] > > There is about 5 million rows in this table; > 1,063,337 have event_type=3; > > When I run the folowing statement it takes 13 seconds > to run. The execution plan shows a table spool/eager spool taking > 32%. > > SELECT > count(recipient_id) mail_type_count, > count(distinct recipient_id) dist_count, > mail_type > from > job_13_event where event_type = 3 > GROUP BY mail_type > > When I run the two counts as seperate statements, the count returns in 2 > seconds > and the count distinct returns in 3. > > SELECT > count(recipient_id) mail_type_count, > mail_type > from > job_13_event where event_type = 3 > GROUP BY mail_type > > SELECT > count(distinct recipient_id) dist_count, > mail_type > from > job_13_event where event_type = 3 > GROUP BY mail_type > > If I do a join of the two statements and use FORCEPLAN, > I can get the results to come back in 7 Seconds. Here is the sql for that: > > SET FORCEPLAN ON > select a.dist_count, a.mail_type, b.mail_type_count from > ( SELECT > count(distinct recipient_id) dist_count, > mail_type > from job_13_event > where event_type = 3 > GROUP BY mail_type) a, > ( SELECT > count(1) mail_type_count, > mail_type > from job_13_event > where event_type = 3 > GROUP BY mail_type ) b > where a.mail_type = b.mail_type > > I would think that the optimizer could calculate the first simple statment > by : > sorting by the mail_type and then recipient_id and doing a count for one and > a distinct count for the recipient_id > in one scan. In that case you would get the faster operation the count(1) > essentially for free, resulting in a near 4 second > result time. > > Is there something that I am missing that would make my first statement run > as fast or faster than using > the forceplan ??? > > Thanks. > Shawn > >
[quoted text, click to view] >I got it.
(letting "it" refer to the wrong thing) We know you have it ;-) Bye, Delbert Glass
Thanks for your help. [quoted text, click to view] "Delbert Glass" <delbert@noincoming.com> wrote in message news:Ot0znoQDEHA.628@TK2MSFTNGP10.phx.gbl... > >I got it. > > (letting "it" refer to the wrong thing) > We know you have it ;-) > > Bye, > Delbert Glass > >
Don't see what you're looking for? Try a search.
|