all groups > sql server data warehouse > september 2007 >
You're in the

sql server data warehouse

group:

Relational Views as Transformation Method in Data Warehouse


Relational Views as Transformation Method in Data Warehouse DanUp
9/20/2007 12:04:04 PM
sql server data warehouse:
Open Question:
A recent post-reply, one expert (Marco Russo) indicated that he doesn't like
to use relational views as a quasi-ETL method, but rather prefers real
data-moving ETL to de-couple OLTP data from DW Data. I do agree with the
importance of the classic de-coupling approach, but would like to hear more,
from Marco or others, about whether relational views, per se, should always
be avoided as a component to ETL with OLAP downstream.

Specific Scenario:
For a SQL 2005 Data Mart / SSAS BI Application, suppose that, rather than
heavily transforming our large (~500g) operational data (which needs some,
but not tons of transformation) during it's transfer to a Data Mart box,
suppose we bulk-copy it (for speed of job completion), and then use
relational views to transform it (cleanse, filter, perform calcs), in
preparation for relational views' usage in SSAS '05?

Questions:
(1) Does anybody see value in that we now have more flexibility in terms of
change-management during project lifecycle, to manipulate actual table data
without disturbing (if we're careful) the views used for SSAS?
(2) How else might this hurt us, either now or later?
(3) By the way, will Proactive Caching work fully against relational views?

Thanks.

Daniel Upton

Re: Relational Views as Transformation Method in Data Warehouse Joe
9/21/2007 7:39:26 AM
I had posed somewhat similar questions to people working at the Kimball
group. I was basically wanting to speed up the cycle and not go through the
steps of staging the data, building all the data mart schema tables, etc and
just do a connection to a replicated area of our production data. The
response was there was too much great value in having it de-coupled as Marco
has pointed out.

So in developing our next business area need; all the way down to some new
cubes - we followed the typical approach. In the back of mind I have this
burning sensation I still could have done it more quickly and as well with
the direct connection instead.

[quoted text, click to view]
Re: Relational Views as Transformation Method in Data Warehouse Jeje
9/22/2007 12:00:00 AM
if you already load the data, why you don't apply some transformation during
your loading process?
doing this into a view will cause more stress on the database server when
you'll load the cube.
more process you do on the ETL better control you have.
also for maintenance its easier.
you just have to manage an ETL instead of views + ETL. if something is
wrong, where is the problem?

but I agree sometimes its easier to do this in views, I'm doing this for
real simple calculations which not involved large scans of my tables.

and yes, views can be used for the proactive caching. but I think you have
to control the trigger which starts the update yourself. I'm not sure if the
server can detect a table change himself from a view.


[quoted text, click to view]
Re: Relational Views as Transformation Method in Data Warehouse entaroadun
9/25/2007 8:26:35 AM
This is not a functional question, it's an implementation question.
By functional, I mean pertaining to the functional requirement that
your cube simply needs some representation of dimensional and fact
data to serve as a source for its load. That requirement is met,
regardless of whether you use a view, materialized view, or table.

The issue of decoupling is also academic. What's the difference
between the RDBMS updating your cube base tables via view logic and
triggers, and specifying ETL yourself, if the two are functionally
equivalent? They are both coupled; one just allows for greater
customization than the other. This is not a functional question, this
is a management and control question.

When it comes to implementation and management questions, there are no
rules, only guidelines that should lead you to a good architecture to
support your specific situation. How much core DB vs. ETL experience
does your team have? If more DB, then lean towards SPs, T/PL-SQL
scripts, triggers, and views. If more ETL, then do an ETL solution.
Are updates constant enough to kill your DBMS with the overhead of
keeping the view consistent? Then don't use a view. However, if
using views and triggers is a good way to spread out the overhead of
ETL over the day, week, or month, instead of trying to do a big-bang
ETL update at the end of the period, then use views and triggers.
Does using a view for the incremental/full fact load kill cube load
performance? If so, don't use a view. This is very RDBMS dependent
in any case, since different engines instantiate views differently.

I use views because I'm on a team that's heavy on DB skillset. To
conquer the fact load performance issue, we have a well tuned view
with index and join hints to stream fact records to AS. We then swap
it out for a view with the same name that is tuned for index seeks
because we use HOLAP and individual lookups is the access pattern
during cube usage. We don't try to update the cube dynamically every
time the view changes, but when we do kick off cube update (after a
data quality job is run), we are assured that the DBMS will get the
right stuff from the view without having to maintain a separate ETL
solution.

[quoted text, click to view]

Re: Relational Views as Transformation Method in Data Warehouse Marco Russo
9/26/2007 8:14:38 AM
[quoted text, click to view]

I read other replies to this thread and I agree that this is more an
implementation issue than a functional issue.
That said, having consistent guidelines helps a lot if you want to
build a maintainable solution. I don't remember what was my post-reply
(if you can give me a link...) - I read your scenario and your
solution works. You leverages on SQL engine to transformate data. If
you would use SSIS to do data move and/or transformation, then it
would be better having all transformations inside SSIS package.
However, still today I find myself writing JOINs into source query
(that acts on a copy of OLTP data, as in your case) just because
otherwise SSIS would consume too much memory. As we said, it's an
implementation issue.
This topic is very large. In a perfect world, all ETL transformations
are described through consistent meta-data that allows an easy
analysis of field dependencies between source and destination data. In
the real world, I still haven't seen a perfect solution like that :-)

I'm very interesting in this kind of discussions - if there is someone
interested, we could open a separate discussion on that (eventually
moving out from the general newsgroup).

Marco Russo
http://www.sqlbi.eu
http://sqlblog.com/blogs/marco_russo
AddThis Social Bookmark Button