all groups > sql server (alternate) > november 2005 >
You're in the

sql server (alternate)

group:

Extracting and joining header from denormalized table


Extracting and joining header from denormalized table Thomas R. Hummel
11/30/2005 1:23:39 PM
sql server (alternate):
Hello,

I am currently working on a monthly load process with a datamart. I
originally designed the tables in a normalized fashion with the idea
that I would denormalize as needed once I got an idea of what the
performance was like. There were some performance problems, so the
decision was made to denormalize. Now the users are happy with the
query performance, but loading the tables is much more difficult.
Specifically...

There were two main tables, a header table and a line item table. These
have been combined into one table. For my loads I still receive them as
separate files though. The problem is that I might receive a line item
for a header that began two months ago. When this happens I don't get a
header record in the current month's file - I just get the record in
the line items file. So now I have to join the header and line item
tables in my staging database to get the denormalized rows, but I also
may have to get header information from my main denormalized table
(~150 million rows). For simplicity I will only include the primary
keys and one other column to represent the rest of the row below. The
tables are actually very wide.

Staging database:

CREATE TABLE dbo.Claims (
CLM_ID BIGINT NOT NULL,
CLM_DATA VARCHAR(100) NULL )

CREATE TABLE dbo.Claim_Lines (
CLM_ID BIGINT NOT NULL,
LN_NO SMALLINT NOT NULL,
CLM_LN_DATA VARCHAR(100) NULL )

Target database:

CREATE TABLE dbo.Target (
CLM_ID BIGINT NOT NULL,
LN_NO SMALLINT NOT NULL,
CLM_DATA VARCHAR(100) NULL,
CLM_LN_DATA VARCHAR(100) NULL )

I can either pull back all of the necessary header rows from the target
table to the claims table and then do one insert using a join between
claims and claim lines into the target table OR I can do one insert
with a join between claims and claim lines and then a second insert
with a join between claim lines and target for those lines that weren't
already added.

Some things that I've tried:

INSERT INTO Staging.dbo.Claims (CLM_ID, CLM_DATA)
SELECT DISTINCT T.CLM_ID, T.CLM_DATA
FROM Staging.dbo.Claim_Lines CL
LEFT OUTER JOIN Staging.dbo.Claims C ON C.CLM_ID = CL.CLM_ID
INNER JOIN Target.dbo.Target T ON T.CLM_ID = CL.CLM_ID
WHERE C.CLM_ID IS NULL

INSERT INTO Staging.dbo.Claims (CLM_ID, CLM_DATA)
SELECT T.CLM_ID, T.CLM_DATA
FROM Staging.dbo.Claim_Lines CL
LEFT OUTER JOIN Staging.dbo.Claims C ON C.CLM_ID = CL.CLM_ID
INNER JOIN Target.dbo.Target T ON T.CLM_ID = CL.CLM_ID
WHERE C.CLM_ID IS NULL
GROUP BY T.CLM_ID, T.CLM_DATA

INSERT INTO Staging.dbo.Claims (CLM_ID, CLM_DATA)
SELECT DISTINCT T.CLM_ID, T.CLM_DATA
FROM Target.dbo.Target T
INNER JOIN (SELECT CL.CLM_ID
FROM Staging.dbo.Claim_Lines CL
LEFT OUTER JOIN Staging.dbo.Claims C ON C.CLM_ID =
CL.CLM_ID
WHERE C.CLM_ID IS NULL) SQ ON SQ.CLM_ID = T.CLM_ID

I've also used EXISTS and IN in various queries. No matter which method
I use, the query plans tend to want to do a clustered index scan on the
target table (actually a partitioned view partitioned by year). The
number of headers that were in the target but not the header file this
month was about 42K out of 1M.

So.... any other ideas on how I can set up a query to get the distinct
headers from the denormalized table? Right now I'm considering using
worktables if I can't figure anything else out, but I don't know if
that will really help me much either.

I'm not looking for a definitive answer here, just some ideas that I
can try.

Thanks,
-Tom.
Re: Extracting and joining header from denormalized table Erland Sommarskog
12/1/2005 10:20:12 PM
Thomas R. Hummel (tom_hummel@hotmail.com) writes:
[quoted text, click to view]

Did you try using an index hint?

But wait, you can't force an index on the view, as the view does not
have any indexes. And maybe there lies the problem. What happens if
you instead join directly to the underlying tables?


--
Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx
Books Online for SQL Server 2000 at
Re: Extracting and joining header from denormalized table Thomas R. Hummel
12/2/2005 9:51:42 AM
Thanks for the suggestion Erland. I would have to join to a union of
the tables or perform multiple insert statements using NOT EXISTS (or
equivalent LEFT JOIN logic) to make sure that I only add each row once.
I will try to set this up and run some tests on it, but in the meantime
I found another solution that is adequate. I've generated a table
using:

SELECT CLM_ID, MIN(LN_NO)
FROM Target.dbo.Target
GROUP BY CLM_ID

Using this table I created a view on the full set of UNION'd tables
joined to this so that it only returns one row per claim (the one with
the minimum LN_NO). At least for what I was doing this gave performance
that was acceptable until I have time to further optimize it.

Thanks,
-Tom.
AddThis Social Bookmark Button