Groups | Blog | Home
all groups > sql server data warehouse > may 2005 >

sql server data warehouse : SQL Server 2000 Query tuning question


MattODA
5/10/2005 8:46:02 PM
I have a star schema where the fact table is about 1.2M rows, and about 40
measures wide (a lot, I know). I've been tuning and studying Showplan and
have performance almost where I need it. But I'm encountering something I
don't understand. When I run this query:

select REGION_NAME,
Sum(case when CalendarMonth = 1 then TOTAL_SALES else null end),
Sum(case when CalendarMonth = 2 then TOTAL_SALES else null end),
Sum(case when CalendarMonth = 3 then TOTAL_SALES else null end),
Sum(case when CalendarMonth = 4 then TOTAL_SALES else null end),
Sum(case when CalendarMonth = 5 then TOTAL_SALES else null end),
Sum(case when CalendarMonth = 6 then TOTAL_SALES else null end),
Sum(case when CalendarMonth = 7 then TOTAL_SALES else null end),
Sum(case when 8 = 8 then TOTAL_SALES else null end),
Sum(case when 9 = 9 then TOTAL_SALES else null end),
Sum(case when 10 = 10 then TOTAL_SALES else null end),
Sum(case when 11 = 11 then TOTAL_SALES else null end),
Sum(case when 12 = 12 then TOTAL_SALES else null end),
SUM((TOTAL_SALES))
from DateDim DD JOIN StoreDayFact SDF
on SDF.DateDimKey = DD.DateDimKey
INNER JOIN StoreDim SD
on SDF.StoreDimKey = SD.StoreDimKey
where DD.CalendarYear = 2005
group by REGION_NAME

it returns in 6 seconds. When I modify it slightly:

select REGION_NAME,
Sum(case when CalendarMonth = 1 then TOTAL_SALES else null end),
Sum(case when CalendarMonth = 2 then TOTAL_SALES else null end),
Sum(case when CalendarMonth = 3 then TOTAL_SALES else null end),
Sum(case when CalendarMonth = 4 then TOTAL_SALES else null end),
Sum(case when CalendarMonth = 5 then TOTAL_SALES else null end),
Sum(case when CalendarMonth = 6 then TOTAL_SALES else null end),
Sum(case when CalendarMonth = 7 then TOTAL_SALES else null end),
-- Sum(case when 8 = 8 then TOTAL_SALES else null end),
Sum(case when CalendarMonth = 8 then TOTAL_SALES else null end),
Sum(case when 9 = 9 then TOTAL_SALES else null end),
Sum(case when 10 = 10 then TOTAL_SALES else null end),
Sum(case when 11 = 11 then TOTAL_SALES else null end),
Sum(case when 12 = 12 then TOTAL_SALES else null end),
SUM((TOTAL_SALES))
from ODA_DateDim DD JOIN ODA_StoreDayFact SDF
on SDF.DateDimKey = DD.DateDimKey
INNER JOIN ODA_StoreDim SD
on SDF.StoreDimKey = SD.StoreDimKey
where DD.CalendarYear = 2005
group by REGION_NAME

it takes 15 seconds. The "strange" thing to me is, the Execution plan for
both versions of the query are identical, as are the Stats IO output (i.e.
same number of logical reads for both queries). Yet, one takes more than
twice the other to return.

What is happening? More important, is there anything I can do to further
tune Query #2?

Thanks!!
MattODA
5/12/2005 10:41:14 AM
Yeah, my bad. That was a typo. The two queries do hit the same tables. I did
a DBCC CheckDB and Shrink DB just for fun this morning, and it changed the
times a little bit. But there's still a pretty dramatic "step" in performance
of this query.

Here is that query again. I started by commenting out the lines for months 2
- 12. Then I uncommented one at a time and ran the query a few times. Next to
each is the fastest time I could achieve. The Execution Plan and Logical
Reads were identical every time.

select REGION_NAME,
Sum(case when CalendarMonth = 1 then TOTAL_SALES else null end), -- 3 secs
Sum(case when CalendarMonth = 2 then TOTAL_SALES else null end), -- 3 secs
Sum(case when CalendarMonth = 3 then TOTAL_SALES else null end), -- 4 secs
Sum(case when CalendarMonth = 4 then TOTAL_SALES else null end), -- 4 secs
Sum(case when CalendarMonth = 5 then TOTAL_SALES else null end), -- 5 secs
Sum(case when CalendarMonth = 6 then TOTAL_SALES else null end), -- 5 secs
Sum(case when CalendarMonth = 7 then TOTAL_SALES else null end), -- 6 secs
Sum(case when CalendarMonth = 8 then TOTAL_SALES else null end), -- 6 secs
Sum(case when CalendarMonth = 9 then TOTAL_SALES else null end), -- 7 secs
Sum(case when CalendarMonth = 10 then TOTAL_SALES else null end), -- 7 secs
Sum(case when CalendarMonth = 11 then TOTAL_SALES else null end), -- 11
secs
Sum(case when CalendarMonth = 12 then TOTAL_SALES else null end), -- 17
secs
SUM((TOTAL_SALES))
from DateDim DD JOIN StoreDayFact SDF
on SDF.DateDimKey = DD.DateDimKey
INNER JOIN StoreDim SD
on SDF.StoreDimKey = SD.StoreDimKey
where DD.CalendarYear = 2005
group by REGION_NAME

What do you think could be going on?
Thanks,
-Matt


[quoted text, click to view]
MattODA
5/12/2005 10:48:07 AM
Yeah, my bad, that was a typo. Those two queries do indeed hit the same
tables. I ran a DBCC CheckDB and Shrink Files this morning for fun, and it
changed the overall response times a little bit. But there is still a pretty
dramatic "step" in performance. I also ran Update Stats, with no effect.

Here's that query again. I started by commenting out the lines for months
2-12, then uncommented one line at a time and ran the query a few times. Next
to each line is the fastest response time I was able to get with that line
uncommented. The Execution Plan and Logical Reads were identical every time.

select REGION_NAME,
Sum(case when CalendarMonth = 1 then TOTAL_SALES else null end), -- 3 secs
Sum(case when CalendarMonth = 2 then TOTAL_SALES else null end), -- 3 secs
Sum(case when CalendarMonth = 3 then TOTAL_SALES else null end), -- 4 secs
Sum(case when CalendarMonth = 4 then TOTAL_SALES else null end), -- 4 secs
Sum(case when CalendarMonth = 5 then TOTAL_SALES else null end), -- 5 secs
Sum(case when CalendarMonth = 6 then TOTAL_SALES else null end), -- 5 secs
Sum(case when CalendarMonth = 7 then TOTAL_SALES else null end), -- 6 secs
Sum(case when CalendarMonth = 8 then TOTAL_SALES else null end), -- 6 secs
Sum(case when CalendarMonth = 9 then TOTAL_SALES else null end), -- 7 secs
Sum(case when CalendarMonth = 10 then TOTAL_SALES else null end), -- 7 secs
Sum(case when CalendarMonth = 11 then TOTAL_SALES else null end), -- 11
secs
Sum(case when CalendarMonth = 12 then TOTAL_SALES else null end), -- 17
secs
SUM((TOTAL_SALES))
from DateDim DD JOIN StoreDayFact SDF
on SDF.DateDimKey = DD.DateDimKey
INNER JOIN StoreDim SD
on SDF.StoreDimKey = SD.StoreDimKey
where DD.CalendarYear = 2005
group by REGION_NAME

StoreDayFact: 1,279,713 rows
DateDim: 845 rows
StoreDim: 2816 rows

What do you think could be going on?
Thanks,
-Matt


[quoted text, click to view]
Adam Machanic
5/12/2005 12:54:20 PM
[quoted text, click to view]

Those queries are using different tables. The first query uses tables
DateDim and StoreFactDay, whereas the second uses ODA_DateDim and
ODA_StoreFactDay. Do these tables have the same indexes? Same row counts?
Same data distribution? Are statistics updated for the latter set of
tables?


--
Adam Machanic
SQL Server MVP
http://www.datamanipulation.net
--

AddThis Social Bookmark Button