Groups | Blog | Home
all groups > sql server data warehouse > may 2006 >

sql server data warehouse : Newbie Questions About Surrogate Keys and SCD


Gummy
5/19/2006 5:31:42 PM
Hello,

I am new to data warehousing and trying to get my head around two issues:

My understanding is that everything should use a surrogate key, not the
"natural" one that comes from the source system. For example, I have a fact
table of customers and it has a Product Key on it. That Product links to the
Product Dimension table. How do I convert that Product Key to a surrogate
key in both the fact table and the dimension table so they remain linked? Do
I just have a lookup or association table that does the conversion? I assume
that means I would need an association table for every dimension I add to
the database.

Next question:

I am going to have a fact table that uses Slowly Changing Dimensions for
employees. These employees have many actions that can occur - Hired, Fired,
Transfered, Promoted, etc. I will also include, on the fact table the Start
Date, End Date of the action and a flag for the Last Action.
How exactly do I query this fact table to figure out who is currently
active? I have to assume that if your last action was Fired then you are no
longer active. However, I am concerned that in our system you could be
fired, but then another event can take place like you received Cobra money
or some sort of pension action occurred.
In general, I am not clear how you could figure out how to query the data.
An example would be, give me the active headcount at the beginning of the
month, or who terminated between October and December.


Thank you for the help. I think this stuff has some amazing potential (as
soon as I understand it better).







Brian R
6/24/2006 8:39:03 PM
The first bit is easy. You create a single surrogate key in the Product
Dimension. Most people would have this as an identity column. You may or may
not keep the key from the source system (usually you only do this to track
back errors and missing items). The fact table uses this same key for
products, indeed the key will be a primary key in the Product Dimension
table and a foreign key link in the Fact Table. You often provide a
surrogate key for the whole fact table, because combining the other
dimensional keys together simply gives you very long index rows, so a single
integer for each fact table row can be much faster.

--
Brian Reincke
Parity Training
Lonbdon


[quoted text, click to view]

Peter Nolan
7/8/2006 8:24:02 AM
Gummy,
I suggest you take a look at my beginners page on www.peternolan.com.
The Kimbal books are very good for teaching people how to build models.

You specific question relates to a common problem....how do I know the
amount of something at some point in time (number of employees) built
up from the transactions that occurred....and the answer is you
don't....you would typically used a measures fact table or what Ralph
calls a snapshot fact table.

On surrogate keys......on my personal web site you can download the C
source code you need to maintain dimension tables and read it....I
can't give you any better details than the source code required to
build dimensional models.

If you want more information than is on my personal web site feel free
to go to our company web site www.instantbi.com. The user guide for
our software describes in detail how keys are managed in our new
product. The eval version is free......it comes with a small demo
database that shows you how dimension tables are managed and the
documentation describes what happens in detail.....

Best Regards

Peter Nolan
www.peternolan.com



[quoted text, click to view]
AddThis Social Bookmark Button