all groups > sql server dts > may 2005 >
You're in the

sql server dts

group:

Architecture


Architecture Michael Vardinghus
5/14/2005 12:00:00 AM
sql server dts:
Getting two tables from one base with 4 mio and 11 mio records.

The 11 mio records are dimension specification on the 4 mio records so I
need to put these 11 mio records alongsid the 4 mio in my fact table.

-- Example table 1
create view trans1 as
select 100 as beløb, 1 as linje
union all
select 200 as beløb, 2 as linje

-- Example table 2
create view trans2 as
select 1 as linje, 'a' as dim1, 'dim1' as type
union all
select 1 as linje, 'b' as dim2, 'dim2' as type
union all
select 2 as linje, 'a' as dim1, 'dim1' as typ
union all
select 2 as linje, 'a' as dim2, 'dim2' as type

-- Example of combining
select a.*, (select dim1 from trans2 b where
b.linje = a.linje and b.type = 'dim1') as dim1,
(select dim1 from trans2 c where c.linje = a.linje and
c.type = 'dim2') as dim2
from
trans1 a

My question is how to go about this with the best performance - here's one
shot:

1) Pulling the 2 tables into my relational warehouse base
2) An index combines the 2 tables and this index fills up my fact table

My concern is what to do about indexing ? Should I have indexes when filling
up from base 1 ? Or should the be removed when doing so ? Do I need to index
the two tables and the view ?

Perhaps I don't need a view - perhaps a could just use the sentence above
and let index optimizer give me a hint to what to make indexes for ?

Thanx in advance

/Michael

Re: Architecture Allan Mitchell
5/15/2005 12:00:00 AM
Michael,

Let me see if I have the question correct. You have 2 source tables(11
million dimension table rows and 4 million fact table rows?). If that
is the case then what dimension has 11 million rows?

What is the source? Some sources need handling differently
Are the two tables in a database on a different server?
Is this a one off feed?

OR

Do you want to merge the two tables together on the destination from
your two sources?

That might become more complicated if your keys clash. Whilst you will
be generating new primary key values (Surrogate Key) you will need a
point of reference when loading the fact table so will need the old key
to refer to.

You could use partitioned views so you would split this 15 million row
table based on an attribute say Date. The view unions the partitions
back together again and the index optimizer is smart enough to know from
where the rows that you want will come.

mk:@MSITStore:C:\Program%20Files\Microsoft%20SQL%20Server\80\Tools\Books\createdb.chm::/cm_8_des_06_17zr.htm




The "combining" idea presented below will hurt you as not only do you
have a lot of rows to play with anyway but you do 2 extra queries for
every row as well.

The index optimizer will use whatever indexes it chooses unless you
override her with an index hint.

Let me know if I have grabbed the wrong meaning from the post and we can
revisit my comments.

Allan


[quoted text, click to view]
Re: Architecture Michael Vardinghus
5/15/2005 12:00:00 AM
You hid it right on the knotch

Yeah - it is strange isn't it ? Table will contatin a tremendous amount of
records ....

See my example in another post


[quoted text, click to view]

Re: Architecture Michael Vardinghus
5/15/2005 12:00:00 AM
Don't know if you can read the other message but

Table 1 has an entry no, table id which links it to table 2
In table 2 the dimension code values are listed per entry no - with a
corresponding dimension type

And as you say I need to flatten this...

So the dimension type values in table 2 need to be columns in a new table
and in these columns the corresponding dimension code values.

[quoted text, click to view]

Re: Architecture Allan Mitchell
5/15/2005 12:00:00 AM
So as Far as I can see what I might do is insert the fact rows into a
dummy fact table as is so NULLs for the dimensionIDs.
You pull all the dimension rows into a working table.
You can now update the dummy fact table with the Dimension IDs by
joining onto the fact table with the ID and the posting date.

You would make multiple passes through the data based on DimensionID in
the dimensions table.

The dimensions table would be indexed so I could retrieve easily and
efficiently the correct rows.

If you do not want to do the processing on the remote server then
bringing all the rows over is certainly an option.

Why are things held like this? Seems strange and inefficient.


Allan



[quoted text, click to view]
Re: Architecture Michael Vardinghus
5/15/2005 12:00:00 AM
thanx again Allan

So this is it

Table 1 on a new base equals table 1 on transaction base
Table 2 on a new base equals table 2 on trasaction base
Table 3 on a new base is a dummy table and this is used like this (in this
table I set 0 instead of null as default for dimension values)
insert records from table 1
update dimension type a from table 2
update dimension type a from table 2

And with indexes on table 1 and 2 ?
Should these indexes be removed when filling up from transaction base and
applied when sending from 1/2 to 3 in new base ?
I'm not that clever at indexing - only used the wizard so far - how would
indexing look like - just index what I'm using in the where clauses ?


[quoted text, click to view]
Re: Architecture Michael Vardinghus
5/15/2005 12:00:00 AM
I think the reason for this design must be this.

Its an erp application and we're pulling out of the general ledgar - in this
module you could have several dimensions (you could use 3 or 10) - and these
dimensions can be named differently from one client to the next - so instead
of having named columns in the normal transaction table they also decided
to make to dimension types a record in the system - instead of a column

I have seen this solved in another way in another system, though.

Just confirms the fact that ERP applications aren't the best at reporting
with such kind of datastructure.

[quoted text, click to view]
message
Re: Architecture Michael Vardinghus
5/15/2005 12:00:00 AM
What does it mean if i bring over all the rows every night and empty the
table each time - just bring them over to put them into a new table

Will my base explode ? Up till now I havent' made logic for just getting new
records and I'm not kean on doing that - I like to be sure that everything
is transferred every time but
if this means an enourmous database I would reconsider.

I'm not worried about the log files - I already truncate these every time -
so they'll be set to a minimum after each transfer.

perhaps the tempdb will explode if the server isn't shut down ? or does the
server automatically reduce this after using it ?

[quoted text, click to view]
message
Re: Architecture Michael Vardinghus
5/15/2005 12:00:00 AM
Its not dimension tables - it's a fact table with the transactions in table
1 and in table 2 there is the dimension specifications for each transaction.
The reason for the larger number in table 2 is that
there is used multiple dimensions and for each dimension a new line is made
but it is referring back to table 1 with only 1 field.

So what I would like to do was to take table a which for instance has 7
columns and then for each dimension TYPE in table 2 I want to add a new
column and in these columns I would like
to write the corresponding dimension value from table 2.

The tables are placed on a different server - and I would like to avoid to
do something on this server - would prefer to do it on my "washing"
machine...

About the surrogate keys that is a different matter - I will be doing that
later - after these two tables are merged. My problem is that I don't have
any dimension values in the first table yet - I need to put these on and
then afterwards replacing these with surrogate keys.

Perhaps I could do something with partitions on table 2 based on dimension
type ? Would that make the sentence below better ?

[quoted text, click to view]
mk:@MSITStore:C:\Program%20Files\Microsoft%20SQL%20Server\80\Tools\Books\cre
atedb.chm::/cm_8_des_06_17zr.htm
[quoted text, click to view]

Re: Architecture Michael Vardinghus
5/15/2005 12:00:00 AM
Real data example

Table 1

Entry no Posting date Description
Amount

1 1999-10-06 00:00:00.000 Rammeaft. udb. a/c
-8895277.00000000000000000000
2 1999-10-06 00:00:00.000 500100496 Ulandsekr.
8895277.00000000000000000000
3 1999-10-06 00:00:00.000 1. rate
-1610400.00000000000000000000


Table 2

Table ID Entry no Dimension Code Dimension value code
Amount Posting date

17 1 DELREGNSKAB
-8895277.00000000000000000000 1999-10-06 00:00:00.000
17 2 DELREGNSKAB 700
8895277.00000000000000000000 1999-10-06 00:00:00.000
17 2 STED 6091
8895277.00000000000000000000 1999-10-06 00:00:00.000
17 3 DELREGNSKAB
-1610400.00000000000000000000 1999-10-06 00:00:00.000

I will alwas only need table id 17 end Entry no joins them ...

So this is what i want

New table

Entry no Posting date Description
Amount Delregnskab
Sted

1 1999-10-06 00:00:00.000 Rammeaft. udb. a/c
-8895277.00000000000000000000 700
null
2 1999-10-06 00:00:00.000 500100496 Ulandsekr.
8895277.00000000000000000000 700
6091
3 1999-10-06 00:00:00.000 1. rate
-1610400.00000000000000000000 700
null

If possible i would like to eliminate nulls at once but that could be washed
afterwards

Besides delregnskab and sted there is 3-4 other dimension that should be
handled the same way.

[quoted text, click to view]
mk:@MSITStore:C:\Program%20Files\Microsoft%20SQL%20Server\80\Tools\Books\cre
atedb.chm::/cm_8_des_06_17zr.htm
[quoted text, click to view]

Re: Architecture Michael Vardinghus
5/15/2005 12:00:00 AM
That looked okay when typing it ... but not anymore...
[quoted text, click to view]

Re: Architecture Allan Mitchell
5/15/2005 12:00:00 AM
Can you post the definitions of the two tables and show how they relate.
Can you also post the definition of the fact table and how this
relates to those tables.

From what I can understand then is that there is a fact table file (4
million )and a dimensions table file (11 million)

The dimensions table file is large because you have multiple rows in
there for each fact table row telling you the dimension key value for
that fact table row.

You must then in the dimensions table file identify the dimension itself
and the value for that dimension

Strange design but without the structure I would have to guess at say
you have a dimensions table file that looks like this

FactRowID, DimensionType, DimensionValue

And your fact table file might look like

FactRowID, val1, Val2....

Your end result is required to be

FactRowID,DimensionName1, DimensionName2.........


You basically need to flatten the dimensions table file.


Am I any nearer?


[quoted text, click to view]
Re: Architecture Allan Mitchell
5/15/2005 12:00:00 AM
Bringing over all rows every night regardless of if they have changed or
not is expensive and not needed. You should find a way to identify rows
in the source that are new or have changed. This way you only need to
bring over what is required.




[quoted text, click to view]
Re: Architecture Allan Mitchell
5/15/2005 12:00:00 AM
I take it then you are looking to refill the Fact Table every night.
Are the dimension tables in place or are you looking to use the one
table as all your dimension tables?







[quoted text, click to view]
Re: Architecture Allan Mitchell
5/15/2005 12:00:00 AM
If you can take the hit then sure reload the fact but I personally would
much rather only take rows that were changed or new.
It will take less time and be less intense.

Allan

[quoted text, click to view]
Re: Architecture Kim Vardinghus-Nielsen
5/15/2005 12:00:00 AM
The dimension tables are in place ... these are placed in other tables

Yep refill the fact - that isn't so bad as transporting everything from
host - or ?


"Allan Mitchell" <allan@no-spam.sqldts.com> skrev i en meddelelse
news:eQQ6VBVWFHA.612@TK2MSFTNGP12.phx.gbl...
[quoted text, click to view]