Hi there,
Need a little help with a certain query that's causing a lot of acid
in my stomach...
Have a table that stores sales measures for a given client. The sales
measures are stored per year and there could be multiple sales
measures every year per client. There is another field called last
update date. If there are multiple sales measures then need to select
the one that's been entered last based on this field. Also, if there
is no sales measure data for current year then I need to return the
last year's data for which sales measure has been entered. For
example: if client #1 has sales measure value of $200 for 1999 and
nothing since, then I need to return $200 for any year following 1999.
So the query would look something like this:
SELECT client_name, sm_dollars FROM <tables>
Based on the DDL at the bottom I would expect to get back: c1, 100;
c2, 200
The way I am doing it now is with correlated subqueries (3 to be
exact) that each do an aggregate and join back to the original table.
It works, but it is notoriously slow. SQL Server is scanning the
index and does a merge join which in a large query takes %95 of the
time. Here is the part of the query plan for it:
| | | | | | |--Merge
Join(Inner Join, MANY-TO-MANY
MERGE:([sales_measure].[client_id])=([sales_measure].[client_id]),RESIDUAL:(([sales_measure].[client_id]=[sales_measure].[client_id]
AND [sales_measure].[tax_year]=[sales_measure].[tax_year]) AND
[Expr1013]=[sales_measure].[last_update_date]))
| | | | | | |--Stream
Aggregate(GROUP BY:([sales_measure].[client_id],
[sales_measure].[tax_year])
DEFINE:([Expr1013]=MAX([sales_measure].[last_update_date])))
| | | | | | | |--Merge
Join(Inner Join, MERGE:([sales_measure].[client_id],
[Expr1010])=([sales_measure].[client_id], [sales_measure].[tax_year]),
RESIDUAL:([sales_measure].[client_id]=[sales_measure].[client_id] AND
[sales_measure].[tax_year]=[Expr1010]))
| | | | | | |
|--Stream Aggregate(GROUP BY:([sales_measure].[client_id])
DEFINE:([Expr1010]=MAX([sales_measure].[tax_year])))
| | | | | | | |
|--Index Scan(OBJECT:([stars_perftest].[dbo].[sales_measure].[sales_measure_idx1]),
ORDERED FORWARD)
| | | | | | |
|--Index Scan(OBJECT:([stars_perftest].[dbo].[sales_measure].[sales_measure_idx1]),
ORDERED FORWARD)
| | | | | | |--Index
Scan(OBJECT:([stars_perftest].[dbo].[sales_measure].[sales_measure_idx1]),
ORDERED FORWARD)
There are two indexes on sales measure table:
sales_measure_pk - sales_measure_id (primary key) clustered
sales_measure_idx1 - client_id, tax_year, last_update_date, sm_dollars
sales_measure table has 800,000 rows in it.
Here is the rest of the DDL:
IF OBJECT_ID('dbo.client') IS NOT NULL
DROP TABLE dbo.client
GO
create table dbo.client (
client_id int identity primary key
, client_name varchar(100) NOT NULL)
GO
IF OBJECT_ID('dbo.sales_measure') IS NOT NULL
DROP TABLE dbo.sales_measure
GO
create table dbo.sales_measure(
sales_measure_id int identity primary key
, client_id int NOT NULL
, tax_year smallint NOT NULL
, sm_dollars money NOT NULL
, last_update_date datetime NOT NULL)
GO
CREATE INDEX sales_measure_idx1 ON sales_measure (client_id, tax_year,
last_update_date, sm_dollars)
GO
INSERT dbo.client(client_name)
SELECT 'c1' UNION SELECT 'c2' UNION SELECT 'c3'
GO
INSERT dbo.sales_measure(client_id, tax_year, sm_dollars,
last_update_date)
SELECT 1, 2004, 100, '1/4/2004'
UNION
SELECT 2, 2003, 100, '1/3/2004'
UNION
SELECT 2, 2004, 150, '1/4/2004'
UNION
SELECT 2, 2004, 200, '1/5/2004'
The view that I use to calculate sales measures:
CREATE VIEW sales_measure_vw AS
SELECT sm.*
FROM sales_measure sm
INNER JOIN (SELECT sm2.client_id, sm2.tax_year,
MAX(sm2.last_update_date) as last_update_date
FROM sales_measure sm2
INNER JOIN (SELECT sm4.client_id, MAX(sm4.tax_year)
as tax_year
FROM sales_measure sm4 GROUP BY
sm4.client_id) sm3
on sm3.client_id = sm2.client_id
and sm3.tax_year = sm2.tax_year
GROUP BY sm2.client_id, sm2.tax_year ) sm1
ON sm.client_id = sm1.client_id AND
sm.tax_year = sm1.tax_year AND
sm.last_update_date = sm1.last_update_date
Any advice on how to tame this would be appreciated. Also, any advice
on the indexes would help as well.
Thanks