Groups | Blog | Home
all groups > sql server dts > may 2004 >

sql server dts : Slowly Changing Dimensions - implementing rules in DTS


paulhodgson24 NO[at]SPAM hotmail.com
5/14/2004 6:02:01 AM
Hi all,

I'm currently implementing a Data Warehouse in SQL Server 2000, but
being quite new to the more advanced features of DTS, am struggling to
create packages to support Slowly Changing Dimensions. Three of my
dimensions are hybrid type 1 and type 2 SCDs, so I need to i) check
the incoming source data against the data in the DW to see if the
records already exist - if not, create a new row, and new surrogate
key for that record; ii) if the new record already exists, check to
see if any of the attributes have changed, and if so, either overwrite
the existing data (I've already determined which fields are SCD1 and
SCD2), or create a new record depending on SCD type. I know what I
need to do in principal, but it's just the physical codeing I'm
struggling with, and I'm not sure where to start - Does anyone know of
any code skeletons/templates/resources that could point me in the
right direction, as I'm struggling with the code I've created so far?

I have started some transformation coding for one dimension. So far, I
have a DTS package which takes incoming data from a CSV, and places it
into a staging table for that dimension. I then have a transformation
task between the new source staging table, and the destination
dimension table. The transformation currently only checks for any new
dimension records, and if a new record is present in the incoming
data, a new surrogate key is issued and the new record is inserted
into the dimension table. However, the script currently also tries to
insert a number of completely blank records (initially, it fell over
at this point, since NULL values broke referential integrity, so I'm
allowing all fields in the table to have NULL values for now, while in
the initial stages of development, and for additional debugging),
equivalent to the number of remaining records in the imported data (at
least it's entering the new records, with updated surrogate key, which
is a start). For the script, I'm currently using two Lookups, called
LastKey (retrieves the current highest surrogate key value for the
dimension), and ExistTest (which I'm trying to use to check if the
incoming data already exists in the dimension table), and an ActiveX
script, as below:

Lookup "LastKey" selects the surrogate key value from a lookup table
containing two fields - with the values of the dimension name, and the
current highest surrogate key used in that dimension table:

SELECT Surr_Key
FROM SurrKeys
WHERE (Dimension = ?)

Lookup "ExistTest" (StaffDimLookup is another lookup table in the
database, with two fields recording the Natural Key and the current
Surrogate key applied to it for the dimension):

SELECT Staff_Key
FROM StaffDimLookup
WHERE (Staff_ID = ?)

And the transformation script itself, so far:

Function Main()

Dim HighKey

If IsEmpty(DTSLookups("ExistTest").Execute(DTSSource("Staff_ID")))
Then

HighKey = DTSLookups("LastKey").Execute("StaffDim")
HighKey = HighKey + 1

DTSDestination("Staff_Key") = HighKey 'New Surrogate Key
DTSDestination("Staff_ID") = DTSSource("Staff_ID") 'Natural Key
Other attributes..
....
....
DTSDestination("Curr_Row") = 1 'a flag to indicate the current live
record
DTSDestination("Row_Begin_Date") = Date 'The date the record is active
from, for type 2 SCD changes
DTSDestination("Row_End_Date") = "some value" 'The date the row was
last valid for type 2 SCD changes

End If

Main = DTSTransformStat_OK

End Function

I'm pretty new to using DTS to this extent, so I'm really racking my
brains about how to implement it all. I still need to incorporate
Update sctipts, to check and update rows for type 1 SCD changes, and
to create new records for type 2 changes (for which I've thought about
creating an additional field in the dimension table, for a CRC value,
which could be used to check changes as opposed to checking every
field, although I'm not sure how to create a CRC for a row at the
moment). If I crack it for this dimension, it shouldn't be too much of
an issue to apply the pricipals to the other dimensions.

If anyone has any ideas, I'd be incredibly grateful.

Thanks for any help/information

Paul.

P.S - would it be considered rude to post this request for help on
John Miceli
5/14/2004 11:41:03 AM
Hi Paul

I would offer up Chapter 19 of the Microsoft SQL Server 2000 Resource Kit. I don't know if you have access to this or not, but I have it in Word format if you need it (I think there may be problems with posting it here). If you have access to www.books24x7.com, it can be found there. There are very good examples in this chapter. I would also offer up the following books for information
1. The Data Warehouse Toolkit: The Complete Guide to Dimensional Modeling, Second Edition by Ralph Kimball and Margy Ross ISBN:0471200247,
2. The Data Warehouse Lifecycle Toolkit: Expert Methods for Designing, Developing, and Deploying Data Warehouses by Ralph Kimball, Laura Reeves, Margy Ross and Warren Thornthwaite ISBN:0471255475
3. Microsoft SQL Server 2000 Resource Kit by Microsoft Corporation ISBN:0735612668
4. Professional SQL Server 2000 DTS by Mark Chaffin, Brian Knight and Todd Robinson ISBN:0764543687

Hopefully, this will help

Regards
John Micel
Michael Vardinghus
5/16/2004 10:56:19 AM
Will recommend microsofts sql server accelerator which contains best
practices and as i recall it something about slowly changing dimensions
(free)

http://www.microsoft.com/sql/ssabi/

"Paul" <paulhodgson24@hotmail.com> skrev i en meddelelse
news:afa0c759.0405140502.4181bec7@posting.google.com...
[quoted text, click to view]

AddThis Social Bookmark Button