Groups | Blog | Home
all groups > sql server data warehouse > september 2005 >

sql server data warehouse : Header Vs Transaction Table


marcmc
9/21/2005 5:07:02 AM
Hey All,
Please have a quick read and add your opinions.

I work in the insurance industry.
In our datawarehouse there's a policy header table and a policy transaction
header.
The description of each policy number is in the form 'ABC/XYZ/1234567'

My problem is the header table holds this value as a varchar(15) and the
transaction table varchar(19). This means in about a year and a half by my
business estimates we will break the 'ABC/XYZ/9999999' boundary on the header
table. This is seriously bad news as well as bad initial design by a
reputable 3rd party.

I have also found that development over the years has led to alot of
programs been develeped and this issue was overlooked with regard to the
variables used to massage these values into our warehouse.

I am writing the spec for this change and have taken the approach of check
dependant objects across all databases for the policy number, rebuild
tables/views where necessary, alter table alter column for tables and changes
to the program variables that are too small to bring in the 16 character
number after 'ABC/XYZ/9999999'.

Is my approach correct, are there any tips for a job so little but yet so big?
Peter Nolan
9/21/2005 7:17:42 AM
Marcmc,
this is a very, very standard problem......and no...it is not a small
problem...it points directly to one of the toughest problems in data
warehousing....data integration...

The standard problem more generally stated is:

How do I effectively integrate and maintain relationships between data
that provides real keys that may change or be re-used over time?

For example, telephone numbers get longer, and they are constantly
re-used which is not a problem for the operational system because each
number is unique at a point in time but it is a BIG problem for the EDW
because it must understand that these things change over time....

The very standard answer is...if you are building dimensional DW, all
keys are best as generated meaningless integers.....this means that
somewhere there must be translation tables translating real keys coming
in to meaningless integers and you must have some standardised
mechanism to allocate these keys.....

Where multiple pieces of data can arrive with multiple different
contents but represent the same thing you must have another translation
table converting them to an intermediate meaningless intger key which
is then itself converted to yet another meaningless integer key...of
course, the standard example here is customer keys coming from
different source systems.....since you work in Insurance you will be
well familiar that different policy management systems often allocate
their own customer numbers...

I have published the code to maintain keys in dimension tables in C so
that most people can read it...it's on www.peternolan.com....I have not
published the mechanism of using a cross reference table for things
like customer numbers but it's pretty trivial.....

So, a good question.... ;-)

Peter
www.peternolan.com
marcmc
9/21/2005 9:23:01 AM
Thankyou Peter, may I just add the following:

The policy number is not a key, we have done all the donkey work regarding
keys & meaningless integers.
They are constructed on an AS400 system, we merely dts them in and bring
them through to their final resting place as a description along another
generated key from surrogates.
They are stored seprately and a contructor is used to piece them together
with as part of our run.
We get the breakdown and the full version in our staging tables.

Merging Multiple codes is not an option.
I have done an extensive spec
I am not going to update any data, just increase the size of the 100 or so
fields, rebuild views, mod programs etc. I think this is the safest and maybe
the only option?

My biggest issue at the moment is I have no contingency plan worked out
apart from backing up the tables. By the way this could be impossible due to
space restrictions. Our largest table is almost 18Gb.

Peter Nolan
9/24/2005 2:53:22 PM
Hi Marcmc,

"The description of each policy number is in the form 'ABC/XYZ/1234567'
"

Are you saying you already have an integer that links these
descriptions/policy numbers together? I'm not sure I see your problem
then...if you can uniquely identify a policy over time you will be
ok...if this description gets truncated and the integer key is still
accurate and unique over time you can merely have one description
'survive' the other....

For contingency, partitioing you largest tables on time and backing
them up only when they have changes is usual...but 18GB?? Whats' the
problem in backing that up? Just buy more disk, back it up and compress
the backed up files.....sql server will dump 18GB data in no time at
all...

Peter
AddThis Social Bookmark Button