all groups > sql server data warehouse > october 2003 >
You're in the

sql server data warehouse

group:

Datawharehouse loading methods. Whats best?


Datawharehouse loading methods. Whats best? Michael Polombo
10/27/2003 11:58:00 AM
sql server data warehouse: In the event a you need to view data from an OLTP database with no(or very
little) latency and the aggregations from the normalized structure make it
impractical to do this without loading into a datawarehouse what is the best
load solution?

i.e. replication, job that runs every minute.

I was thinking of applying a timestamp to pertinent tables to do
incrememntal updates to specific altered records as necessary. At this rate
there would be very little, in terms of volume, for the job to process.
However the job contains several cursors which worries me. Any suggestions
would be greatly appreciated. Would replication be a better solution?

Re: Datawharehouse loading methods. Whats best? Robert Harmon
10/31/2003 8:13:37 AM
With a few exceptions, I would generally not suggest using replication to
move data from a OLTP source to a DSS destination. It may look like a good
strategy now, as I'm assuming you are dealing with a single data source,
however, as the DSS environment grows in complexity and datasets are brought
in from other sources, it very quickly becomes impossible to integrate data
via replication and the whole thing gets re-written using batch jobs anyway.
Another huge advantage that batch loading allows, is the ability to add a
batch id to your destination records. If you have a batch id (i.e.. you
know all the records that came in this load) you can back a batch out if
something goes wrong with the load.

In my experience, timestamps are a godsend! Assuming you have the authority
to do so, be very generous with time stamps as it makes things like primary
key surrogation and load selectivity very easy.

Starting in the OLTP world myself, I can understand your nervousness with
cursors. I still shy away from them to this day, however, there are many
things that happen in the Transform stage of loading data which truly
require cursors. Don't get me wrong, don't use them if they aren't truly
needed as unnecessary cursors will definitely slow your loads. Remember,
this isn't a transactional system, you're normally not in anybody's way and
deadlocks aren't a very likely as the only client updating data is the
server.

Hope this helps, and let me know if I wasn't vague enough!

Rob


[quoted text, click to view]

AddThis Social Bookmark Button