sql server dts:
Getting two tables from one base with 4 mio and 11 mio records. The 11 mio records are dimension specification on the 4 mio records so I need to put these 11 mio records alongsid the 4 mio in my fact table. -- Example table 1 create view trans1 as select 100 as beløb, 1 as linje union all select 200 as beløb, 2 as linje -- Example table 2 create view trans2 as select 1 as linje, 'a' as dim1, 'dim1' as type union all select 1 as linje, 'b' as dim2, 'dim2' as type union all select 2 as linje, 'a' as dim1, 'dim1' as typ union all select 2 as linje, 'a' as dim2, 'dim2' as type -- Example of combining select a.*, (select dim1 from trans2 b where b.linje = a.linje and b.type = 'dim1') as dim1, (select dim1 from trans2 c where c.linje = a.linje and c.type = 'dim2') as dim2 from trans1 a My question is how to go about this with the best performance - here's one shot: 1) Pulling the 2 tables into my relational warehouse base 2) An index combines the 2 tables and this index fills up my fact table My concern is what to do about indexing ? Should I have indexes when filling up from base 1 ? Or should the be removed when doing so ? Do I need to index the two tables and the view ? Perhaps I don't need a view - perhaps a could just use the sentence above and let index optimizer give me a hint to what to make indexes for ? Thanx in advance /Michael
Michael, Let me see if I have the question correct. You have 2 source tables(11 million dimension table rows and 4 million fact table rows?). If that is the case then what dimension has 11 million rows? What is the source? Some sources need handling differently Are the two tables in a database on a different server? Is this a one off feed? OR Do you want to merge the two tables together on the destination from your two sources? That might become more complicated if your keys clash. Whilst you will be generating new primary key values (Surrogate Key) you will need a point of reference when loading the fact table so will need the old key to refer to. You could use partitioned views so you would split this 15 million row table based on an attribute say Date. The view unions the partitions back together again and the index optimizer is smart enough to know from where the rows that you want will come. mk:@MSITStore:C:\Program%20Files\Microsoft%20SQL%20Server\80\Tools\Books\createdb.chm::/cm_8_des_06_17zr.htm The "combining" idea presented below will hurt you as not only do you have a lot of rows to play with anyway but you do 2 extra queries for every row as well. The index optimizer will use whatever indexes it chooses unless you override her with an index hint. Let me know if I have grabbed the wrong meaning from the post and we can revisit my comments. Allan [quoted text, click to view] "Michael Vardinghus" <michaelvardinghus@hotmail.com> wrote in message news:michaelvardinghus@hotmail.com: > Getting two tables from one base with 4 mio and 11 mio records. > > The 11 mio records are dimension specification on the 4 mio records so I > need to put these 11 mio records alongsid the 4 mio in my fact table. > > -- Example table 1 > create view trans1 as > select 100 as beløb, 1 as linje > union all > select 200 as beløb, 2 as linje > > -- Example table 2 > create view trans2 as > select 1 as linje, 'a' as dim1, 'dim1' as type > union all > select 1 as linje, 'b' as dim2, 'dim2' as type > union all > select 2 as linje, 'a' as dim1, 'dim1' as typ > union all > select 2 as linje, 'a' as dim2, 'dim2' as type > > -- Example of combining > select a.*, (select dim1 from trans2 b where > b.linje = a.linje and b.type = 'dim1') as dim1, > (select dim1 from trans2 c where c.linje = a.linje and > c.type = 'dim2') as dim2 > from > trans1 a > > My question is how to go about this with the best performance - here's one > shot: > > 1) Pulling the 2 tables into my relational warehouse base > 2) An index combines the 2 tables and this index fills up my fact table > > My concern is what to do about indexing ? Should I have indexes when filling > up from base 1 ? Or should the be removed when doing so ? Do I need to index > the two tables and the view ? > > Perhaps I don't need a view - perhaps a could just use the sentence above > and let index optimizer give me a hint to what to make indexes for ? > > Thanx in advance > > /Michael
You hid it right on the knotch Yeah - it is strange isn't it ? Table will contatin a tremendous amount of records .... See my example in another post [quoted text, click to view] "Allan Mitchell" <allan@no-spam.sqldts.com> wrote in message news:OylKE8SWFHA.2664@TK2MSFTNGP15.phx.gbl... > Can you post the definitions of the two tables and show how they relate. > Can you also post the definition of the fact table and how this > relates to those tables. > > From what I can understand then is that there is a fact table file (4 > million )and a dimensions table file (11 million) > > The dimensions table file is large because you have multiple rows in > there for each fact table row telling you the dimension key value for > that fact table row. > > You must then in the dimensions table file identify the dimension itself > and the value for that dimension > > Strange design but without the structure I would have to guess at say > you have a dimensions table file that looks like this > > FactRowID, DimensionType, DimensionValue > > And your fact table file might look like > > FactRowID, val1, Val2.... > > Your end result is required to be > > FactRowID,DimensionName1, DimensionName2......... > > > You basically need to flatten the dimensions table file. > > > Am I any nearer? > > > "Michael Vardinghus" <michaelvardinghus@hotmail.com> wrote in message > news:michaelvardinghus@hotmail.com: > > > Its not dimension tables - it's a fact table with the transactions in table > > 1 and in table 2 there is the dimension specifications for each transaction. > > The reason for the larger number in table 2 is that > > there is used multiple dimensions and for each dimension a new line is made > > but it is referring back to table 1 with only 1 field. > > > > So what I would like to do was to take table a which for instance has 7 > > columns and then for each dimension TYPE in table 2 I want to add a new > > column and in these columns I would like > > to write the corresponding dimension value from table 2. > > > > The tables are placed on a different server - and I would like to avoid to > > do something on this server - would prefer to do it on my "washing" > > machine... > > > > About the surrogate keys that is a different matter - I will be doing that > > later - after these two tables are merged. My problem is that I don't have > > any dimension values in the first table yet - I need to put these on and > > then afterwards replacing these with surrogate keys. > > > > Perhaps I could do something with partitions on table 2 based on dimension > > type ? Would that make the sentence below better ? > > > > "Allan Mitchell" <allan@no-spam.sqldts.com> wrote in message > > news:#Dpm4lRWFHA.616@TK2MSFTNGP12.phx.gbl... > > > Michael, > > > > > > Let me see if I have the question correct. You have 2 source tables(11 > > > million dimension table rows and 4 million fact table rows?). If that > > > is the case then what dimension has 11 million rows? > > > > > > What is the source? Some sources need handling differently > > > Are the two tables in a database on a different server? > > > Is this a one off feed? > > > > > > OR > > > > > > Do you want to merge the two tables together on the destination from > > > your two sources? > > > > > > That might become more complicated if your keys clash. Whilst you will > > > be generating new primary key values (Surrogate Key) you will need a > > > point of reference when loading the fact table so will need the old key > > > to refer to. > > > > > > You could use partitioned views so you would split this 15 million row > > > table based on an attribute say Date. The view unions the partitions > > > back together again and the index optimizer is smart enough to know from > > > where the rows that you want will come. > > > > > > > > mk:@MSITStore:C:\Program%20Files\Microsoft%20SQL%20Server\80\Tools\Books\cre > > atedb.chm::/cm_8_des_06_17zr.htm > > > > > > > > > > > > > > > The "combining" idea presented below will hurt you as not only do you > > > have a lot of rows to play with anyway but you do 2 extra queries for > > > every row as well. > > > > > > The index optimizer will use whatever indexes it chooses unless you > > > override her with an index hint. > > > > > > Let me know if I have grabbed the wrong meaning from the post and we can > > > revisit my comments. > > > > > > Allan > > > > > > > > > "Michael Vardinghus" <michaelvardinghus@hotmail.com> wrote in message > > > news:michaelvardinghus@hotmail.com: > > > > > > > Getting two tables from one base with 4 mio and 11 mio records. > > > > > > > > The 11 mio records are dimension specification on the 4 mio records so I > > > > need to put these 11 mio records alongsid the 4 mio in my fact table. > > > > > > > > -- Example table 1 > > > > create view trans1 as > > > > select 100 as beløb, 1 as linje > > > > union all > > > > select 200 as beløb, 2 as linje > > > > > > > > -- Example table 2 > > > > create view trans2 as > > > > select 1 as linje, 'a' as dim1, 'dim1' as type > > > > union all > > > > select 1 as linje, 'b' as dim2, 'dim2' as type > > > > union all > > > > select 2 as linje, 'a' as dim1, 'dim1' as typ > > > > union all > > > > select 2 as linje, 'a' as dim2, 'dim2' as type > > > > > > > > -- Example of combining > > > > select a.*, (select dim1 from trans2 b where > > > > b.linje = a.linje and b.type = 'dim1') as dim1, > > > > (select dim1 from trans2 c where c.linje = a.linje and > > > > c.type = 'dim2') as dim2 > > > > from > > > > trans1 a > > > > > > > > My question is how to go about this with the best performance - here's > > one > > > > shot: > > > > > > > > 1) Pulling the 2 tables into my relational warehouse base > > > > 2) An index combines the 2 tables and this index fills up my fact table > > > > > > > > My concern is what to do about indexing ? Should I have indexes when > > filling > > > > up from base 1 ? Or should the be removed when doing so ? Do I need to > > index > > > > the two tables and the view ? > > > > > > > > Perhaps I don't need a view - perhaps a could just use the sentence > > above > > > > and let index optimizer give me a hint to what to make indexes for ? > > > > > > > > Thanx in advance > > > > > > > > /Michael > > > >
Don't know if you can read the other message but Table 1 has an entry no, table id which links it to table 2 In table 2 the dimension code values are listed per entry no - with a corresponding dimension type And as you say I need to flatten this... So the dimension type values in table 2 need to be columns in a new table and in these columns the corresponding dimension code values. [quoted text, click to view] "Allan Mitchell" <allan@no-spam.sqldts.com> wrote in message news:OylKE8SWFHA.2664@TK2MSFTNGP15.phx.gbl... > Can you post the definitions of the two tables and show how they relate. > Can you also post the definition of the fact table and how this > relates to those tables. > > From what I can understand then is that there is a fact table file (4 > million )and a dimensions table file (11 million) > > The dimensions table file is large because you have multiple rows in > there for each fact table row telling you the dimension key value for > that fact table row. > > You must then in the dimensions table file identify the dimension itself > and the value for that dimension > > Strange design but without the structure I would have to guess at say > you have a dimensions table file that looks like this > > FactRowID, DimensionType, DimensionValue > > And your fact table file might look like > > FactRowID, val1, Val2.... > > Your end result is required to be > > FactRowID,DimensionName1, DimensionName2......... > > > You basically need to flatten the dimensions table file. > > > Am I any nearer? > > > "Michael Vardinghus" <michaelvardinghus@hotmail.com> wrote in message > news:michaelvardinghus@hotmail.com: > > > Its not dimension tables - it's a fact table with the transactions in table > > 1 and in table 2 there is the dimension specifications for each transaction. > > The reason for the larger number in table 2 is that > > there is used multiple dimensions and for each dimension a new line is made > > but it is referring back to table 1 with only 1 field. > > > > So what I would like to do was to take table a which for instance has 7 > > columns and then for each dimension TYPE in table 2 I want to add a new > > column and in these columns I would like > > to write the corresponding dimension value from table 2. > > > > The tables are placed on a different server - and I would like to avoid to > > do something on this server - would prefer to do it on my "washing" > > machine... > > > > About the surrogate keys that is a different matter - I will be doing that > > later - after these two tables are merged. My problem is that I don't have > > any dimension values in the first table yet - I need to put these on and > > then afterwards replacing these with surrogate keys. > > > > Perhaps I could do something with partitions on table 2 based on dimension > > type ? Would that make the sentence below better ? > > > > "Allan Mitchell" <allan@no-spam.sqldts.com> wrote in message > > news:#Dpm4lRWFHA.616@TK2MSFTNGP12.phx.gbl... > > > Michael, > > > > > > Let me see if I have the question correct. You have 2 source tables(11 > > > million dimension table rows and 4 million fact table rows?). If that > > > is the case then what dimension has 11 million rows? > > > > > > What is the source? Some sources need handling differently > > > Are the two tables in a database on a different server? > > > Is this a one off feed? > > > > > > OR > > > > > > Do you want to merge the two tables together on the destination from > > > your two sources? > > > > > > That might become more complicated if your keys clash. Whilst you will > > > be generating new primary key values (Surrogate Key) you will need a > > > point of reference when loading the fact table so will need the old key > > > to refer to. > > > > > > You could use partitioned views so you would split this 15 million row > > > table based on an attribute say Date. The view unions the partitions > > > back together again and the index optimizer is smart enough to know from > > > where the rows that you want will come. > > > > > > > > mk:@MSITStore:C:\Program%20Files\Microsoft%20SQL%20Server\80\Tools\Books\cre > > atedb.chm::/cm_8_des_06_17zr.htm > > > > > > > > > > > > > > > The "combining" idea presented below will hurt you as not only do you > > > have a lot of rows to play with anyway but you do 2 extra queries for > > > every row as well. > > > > > > The index optimizer will use whatever indexes it chooses unless you > > > override her with an index hint. > > > > > > Let me know if I have grabbed the wrong meaning from the post and we can > > > revisit my comments. > > > > > > Allan > > > > > > > > > "Michael Vardinghus" <michaelvardinghus@hotmail.com> wrote in message > > > news:michaelvardinghus@hotmail.com: > > > > > > > Getting two tables from one base with 4 mio and 11 mio records. > > > > > > > > The 11 mio records are dimension specification on the 4 mio records so I > > > > need to put these 11 mio records alongsid the 4 mio in my fact table. > > > > > > > > -- Example table 1 > > > > create view trans1 as > > > > select 100 as beløb, 1 as linje > > > > union all > > > > select 200 as beløb, 2 as linje > > > > > > > > -- Example table 2 > > > > create view trans2 as > > > > select 1 as linje, 'a' as dim1, 'dim1' as type > > > > union all > > > > select 1 as linje, 'b' as dim2, 'dim2' as type > > > > union all > > > > select 2 as linje, 'a' as dim1, 'dim1' as typ > > > > union all > > > > select 2 as linje, 'a' as dim2, 'dim2' as type > > > > > > > > -- Example of combining > > > > select a.*, (select dim1 from trans2 b where > > > > b.linje = a.linje and b.type = 'dim1') as dim1, > > > > (select dim1 from trans2 c where c.linje = a.linje and > > > > c.type = 'dim2') as dim2 > > > > from > > > > trans1 a > > > > > > > > My question is how to go about this with the best performance - here's > > one > > > > shot: > > > > > > > > 1) Pulling the 2 tables into my relational warehouse base > > > > 2) An index combines the 2 tables and this index fills up my fact table > > > > > > > > My concern is what to do about indexing ? Should I have indexes when > > filling > > > > up from base 1 ? Or should the be removed when doing so ? Do I need to > > index > > > > the two tables and the view ? > > > > > > > > Perhaps I don't need a view - perhaps a could just use the sentence > > above > > > > and let index optimizer give me a hint to what to make indexes for ? > > > > > > > > Thanx in advance > > > > > > > > /Michael > > > >
So as Far as I can see what I might do is insert the fact rows into a dummy fact table as is so NULLs for the dimensionIDs. You pull all the dimension rows into a working table. You can now update the dummy fact table with the Dimension IDs by joining onto the fact table with the ID and the posting date. You would make multiple passes through the data based on DimensionID in the dimensions table. The dimensions table would be indexed so I could retrieve easily and efficiently the correct rows. If you do not want to do the processing on the remote server then bringing all the rows over is certainly an option. Why are things held like this? Seems strange and inefficient. Allan [quoted text, click to view] "Michael Vardinghus" <michaelvardinghus@hotmail.com> wrote in message news:michaelvardinghus@hotmail.com: > Don't know if you can read the other message but > > Table 1 has an entry no, table id which links it to table 2 > In table 2 the dimension code values are listed per entry no - with a > corresponding dimension type > > And as you say I need to flatten this... > > So the dimension type values in table 2 need to be columns in a new table > and in these columns the corresponding dimension code values. > > "Allan Mitchell" <allan@no-spam.sqldts.com> wrote in message > news:OylKE8SWFHA.2664@TK2MSFTNGP15.phx.gbl... > > Can you post the definitions of the two tables and show how they relate. > > Can you also post the definition of the fact table and how this > > relates to those tables. > > > > From what I can understand then is that there is a fact table file (4 > > million )and a dimensions table file (11 million) > > > > The dimensions table file is large because you have multiple rows in > > there for each fact table row telling you the dimension key value for > > that fact table row. > > > > You must then in the dimensions table file identify the dimension itself > > and the value for that dimension > > > > Strange design but without the structure I would have to guess at say > > you have a dimensions table file that looks like this > > > > FactRowID, DimensionType, DimensionValue > > > > And your fact table file might look like > > > > FactRowID, val1, Val2.... > > > > Your end result is required to be > > > > FactRowID,DimensionName1, DimensionName2......... > > > > > > You basically need to flatten the dimensions table file. > > > > > > Am I any nearer? > > > > > > "Michael Vardinghus" <michaelvardinghus@hotmail.com> wrote in message > > news:michaelvardinghus@hotmail.com: > > > > > Its not dimension tables - it's a fact table with the transactions in > table > > > 1 and in table 2 there is the dimension specifications for each > transaction. > > > The reason for the larger number in table 2 is that > > > there is used multiple dimensions and for each dimension a new line is > made > > > but it is referring back to table 1 with only 1 field. > > > > > > So what I would like to do was to take table a which for instance has 7 > > > columns and then for each dimension TYPE in table 2 I want to add a new > > > column and in these columns I would like > > > to write the corresponding dimension value from table 2. > > > > > > The tables are placed on a different server - and I would like to avoid > to > > > do something on this server - would prefer to do it on my "washing" > > > machine... > > > > > > About the surrogate keys that is a different matter - I will be doing > that > > > later - after these two tables are merged. My problem is that I don't > have > > > any dimension values in the first table yet - I need to put these on and > > > then afterwards replacing these with surrogate keys. > > > > > > Perhaps I could do something with partitions on table 2 based on > dimension > > > type ? Would that make the sentence below better ? > > > > > > "Allan Mitchell" <allan@no-spam.sqldts.com> wrote in message > > > news:#Dpm4lRWFHA.616@TK2MSFTNGP12.phx.gbl... > > > > Michael, > > > > > > > > Let me see if I have the question correct. You have 2 source > tables(11 > > > > million dimension table rows and 4 million fact table rows?). If that > > > > is the case then what dimension has 11 million rows? > > > > > > > > What is the source? Some sources need handling differently > > > > Are the two tables in a database on a different server? > > > > Is this a one off feed? > > > > > > > > OR > > > > > > > > Do you want to merge the two tables together on the destination from > > > > your two sources? > > > > > > > > That might become more complicated if your keys clash. Whilst you > will > > > > be generating new primary key values (Surrogate Key) you will need a > > > > point of reference when loading the fact table so will need the old > key > > > > to refer to. > > > > > > > > You could use partitioned views so you would split this 15 million row > > > > table based on an attribute say Date. The view unions the partitions > > > > back together again and the index optimizer is smart enough to know > from > > > > where the rows that you want will come. > > > > > > > > > > > > mk:@MSITStore:C:\Program%20Files\Microsoft%20SQL%20Server\80\Tools\Books\cre > > > atedb.chm::/cm_8_des_06_17zr.htm > > > > > > > > > > > > > > > > > > > > The "combining" idea presented below will hurt you as not only do you > > > > have a lot of rows to play with anyway but you do 2 extra queries for > > > > every row as well. > > > > > > > > The index optimizer will use whatever indexes it chooses unless you > > > > override her with an index hint. > > > > > > > > Let me know if I have grabbed the wrong meaning from the post and we > can > > > > revisit my comments. > > > > > > > > Allan > > > > > > > > > > > > "Michael Vardinghus" <michaelvardinghus@hotmail.com> wrote in message > > > > news:michaelvardinghus@hotmail.com: > > > > > > > > > Getting two tables from one base with 4 mio and 11 mio records. > > > > > > > > > > The 11 mio records are dimension specification on the 4 mio records > so I > > > > > need to put these 11 mio records alongsid the 4 mio in my fact > table. > > > > > > > > > > -- Example table 1 > > > > > create view trans1 as > > > > > select 100 as beløb, 1 as linje > > > > > union all > > > > > select 200 as beløb, 2 as linje > > > > > > > > > > -- Example table 2 > > > > > create view trans2 as > > > > > select 1 as linje, 'a' as dim1, 'dim1' as type > > > > > union all > > > > > select 1 as linje, 'b' as dim2, 'dim2' as type > > > > > union all > > > > > select 2 as linje, 'a' as dim1, 'dim1' as typ > > > > > union all > > > > > select 2 as linje, 'a' as dim2, 'dim2' as type > > > > > > > > > > -- Example of combining > > > > > select a.*, (select dim1 from trans2 b where > > > > > b.linje = a.linje and b.type = 'dim1') as dim1, > > > > > (select dim1 from trans2 c where c.linje = a.linje and
thanx again Allan So this is it Table 1 on a new base equals table 1 on transaction base Table 2 on a new base equals table 2 on trasaction base Table 3 on a new base is a dummy table and this is used like this (in this table I set 0 instead of null as default for dimension values) insert records from table 1 update dimension type a from table 2 update dimension type a from table 2 And with indexes on table 1 and 2 ? Should these indexes be removed when filling up from transaction base and applied when sending from 1/2 to 3 in new base ? I'm not that clever at indexing - only used the wizard so far - how would indexing look like - just index what I'm using in the where clauses ? [quoted text, click to view] "Allan Mitchell" <allan@no-spam.sqldts.com> wrote in message news:eJNUGRUWFHA.4076@TK2MSFTNGP14.phx.gbl... > So as Far as I can see what I might do is insert the fact rows into a > dummy fact table as is so NULLs for the dimensionIDs. > You pull all the dimension rows into a working table. > You can now update the dummy fact table with the Dimension IDs by > joining onto the fact table with the ID and the posting date. > > You would make multiple passes through the data based on DimensionID in > the dimensions table. > > The dimensions table would be indexed so I could retrieve easily and > efficiently the correct rows. > > If you do not want to do the processing on the remote server then > bringing all the rows over is certainly an option. > > Why are things held like this? Seems strange and inefficient. > > > Allan > > > > "Michael Vardinghus" <michaelvardinghus@hotmail.com> wrote in message > news:michaelvardinghus@hotmail.com: > > > Don't know if you can read the other message but > > > > Table 1 has an entry no, table id which links it to table 2 > > In table 2 the dimension code values are listed per entry no - with a > > corresponding dimension type > > > > And as you say I need to flatten this... > > > > So the dimension type values in table 2 need to be columns in a new table > > and in these columns the corresponding dimension code values. > > > > "Allan Mitchell" <allan@no-spam.sqldts.com> wrote in message > > news:OylKE8SWFHA.2664@TK2MSFTNGP15.phx.gbl... > > > Can you post the definitions of the two tables and show how they relate. > > > Can you also post the definition of the fact table and how this > > > relates to those tables. > > > > > > From what I can understand then is that there is a fact table file (4 > > > million )and a dimensions table file (11 million) > > > > > > The dimensions table file is large because you have multiple rows in > > > there for each fact table row telling you the dimension key value for > > > that fact table row. > > > > > > You must then in the dimensions table file identify the dimension itself > > > and the value for that dimension > > > > > > Strange design but without the structure I would have to guess at say > > > you have a dimensions table file that looks like this > > > > > > FactRowID, DimensionType, DimensionValue > > > > > > And your fact table file might look like > > > > > > FactRowID, val1, Val2.... > > > > > > Your end result is required to be > > > > > > FactRowID,DimensionName1, DimensionName2......... > > > > > > > > > You basically need to flatten the dimensions table file. > > > > > > > > > Am I any nearer? > > > > > > > > > "Michael Vardinghus" <michaelvardinghus@hotmail.com> wrote in message > > > news:michaelvardinghus@hotmail.com: > > > > > > > Its not dimension tables - it's a fact table with the transactions in > > table > > > > 1 and in table 2 there is the dimension specifications for each > > transaction. > > > > The reason for the larger number in table 2 is that > > > > there is used multiple dimensions and for each dimension a new line is > > made > > > > but it is referring back to table 1 with only 1 field. > > > > > > > > So what I would like to do was to take table a which for instance has 7 > > > > columns and then for each dimension TYPE in table 2 I want to add a new > > > > column and in these columns I would like > > > > to write the corresponding dimension value from table 2. > > > > > > > > The tables are placed on a different server - and I would like to avoid > > to > > > > do something on this server - would prefer to do it on my "washing" > > > > machine... > > > > > > > > About the surrogate keys that is a different matter - I will be doing > > that > > > > later - after these two tables are merged. My problem is that I don't > > have > > > > any dimension values in the first table yet - I need to put these on and > > > > then afterwards replacing these with surrogate keys. > > > > > > > > Perhaps I could do something with partitions on table 2 based on > > dimension > > > > type ? Would that make the sentence below better ? > > > > > > > > "Allan Mitchell" <allan@no-spam.sqldts.com> wrote in message > > > > news:#Dpm4lRWFHA.616@TK2MSFTNGP12.phx.gbl... > > > > > Michael, > > > > > > > > > > Let me see if I have the question correct. You have 2 source > > tables(11 > > > > > million dimension table rows and 4 million fact table rows?). If that > > > > > is the case then what dimension has 11 million rows? > > > > > > > > > > What is the source? Some sources need handling differently > > > > > Are the two tables in a database on a different server? > > > > > Is this a one off feed? > > > > > > > > > > OR > > > > > > > > > > Do you want to merge the two tables together on the destination from > > > > > your two sources? > > > > > > > > > > That might become more complicated if your keys clash. Whilst you > > will > > > > > be generating new primary key values (Surrogate Key) you will need a > > > > > point of reference when loading the fact table so will need the old > > key > > > > > to refer to. > > > > > > > > > > You could use partitioned views so you would split this 15 million row > > > > > table based on an attribute say Date. The view unions the partitions > > > > > back together again and the index optimizer is smart enough to know > > from > > > > > where the rows that you want will come. > > > > > > > > > > > > > > > > mk:@MSITStore:C:\Program%20Files\Microsoft%20SQL%20Server\80\Tools\Books\cre > > > > atedb.chm::/cm_8_des_06_17zr.htm > > > > > > > > > > > > > > > > > > > > > > > > > The "combining" idea presented below will hurt you as not only do you > > > > > have a lot of rows to play with anyway but you do 2 extra queries for > > > > > every row as well. > > > > > > > > > > The index optimizer will use whatever indexes it chooses unless you > > > > > override her with an index hint. > > > > > > > > > > Let me know if I have grabbed the wrong meaning from the post and we > > can > > > > > revisit my comments. > > > > > > > > > > Allan > > > > > > > > > >
I think the reason for this design must be this. Its an erp application and we're pulling out of the general ledgar - in this module you could have several dimensions (you could use 3 or 10) - and these dimensions can be named differently from one client to the next - so instead of having named columns in the normal transaction table they also decided to make to dimension types a record in the system - instead of a column I have seen this solved in another way in another system, though. Just confirms the fact that ERP applications aren't the best at reporting with such kind of datastructure. [quoted text, click to view] "Allan Mitchell" <allan@no-spam.sqldts.com> wrote in message news:eJNUGRUWFHA.4076@TK2MSFTNGP14.phx.gbl... > So as Far as I can see what I might do is insert the fact rows into a > dummy fact table as is so NULLs for the dimensionIDs. > You pull all the dimension rows into a working table. > You can now update the dummy fact table with the Dimension IDs by > joining onto the fact table with the ID and the posting date. > > You would make multiple passes through the data based on DimensionID in > the dimensions table. > > The dimensions table would be indexed so I could retrieve easily and > efficiently the correct rows. > > If you do not want to do the processing on the remote server then > bringing all the rows over is certainly an option. > > Why are things held like this? Seems strange and inefficient. > > > Allan > > > > "Michael Vardinghus" <michaelvardinghus@hotmail.com> wrote in message > news:michaelvardinghus@hotmail.com: > > > Don't know if you can read the other message but > > > > Table 1 has an entry no, table id which links it to table 2 > > In table 2 the dimension code values are listed per entry no - with a > > corresponding dimension type > > > > And as you say I need to flatten this... > > > > So the dimension type values in table 2 need to be columns in a new table > > and in these columns the corresponding dimension code values. > > > > "Allan Mitchell" <allan@no-spam.sqldts.com> wrote in message > > news:OylKE8SWFHA.2664@TK2MSFTNGP15.phx.gbl... > > > Can you post the definitions of the two tables and show how they relate. > > > Can you also post the definition of the fact table and how this > > > relates to those tables. > > > > > > From what I can understand then is that there is a fact table file (4 > > > million )and a dimensions table file (11 million) > > > > > > The dimensions table file is large because you have multiple rows in > > > there for each fact table row telling you the dimension key value for > > > that fact table row. > > > > > > You must then in the dimensions table file identify the dimension itself > > > and the value for that dimension > > > > > > Strange design but without the structure I would have to guess at say > > > you have a dimensions table file that looks like this > > > > > > FactRowID, DimensionType, DimensionValue > > > > > > And your fact table file might look like > > > > > > FactRowID, val1, Val2.... > > > > > > Your end result is required to be > > > > > > FactRowID,DimensionName1, DimensionName2......... > > > > > > > > > You basically need to flatten the dimensions table file. > > > > > > > > > Am I any nearer? > > > > > > > > > "Michael Vardinghus" <michaelvardinghus@hotmail.com> wrote in message > > > news:michaelvardinghus@hotmail.com: > > > > > > > Its not dimension tables - it's a fact table with the transactions in > > table > > > > 1 and in table 2 there is the dimension specifications for each > > transaction. > > > > The reason for the larger number in table 2 is that > > > > there is used multiple dimensions and for each dimension a new line is > > made > > > > but it is referring back to table 1 with only 1 field. > > > > > > > > So what I would like to do was to take table a which for instance has 7 > > > > columns and then for each dimension TYPE in table 2 I want to add a new > > > > column and in these columns I would like > > > > to write the corresponding dimension value from table 2. > > > > > > > > The tables are placed on a different server - and I would like to avoid > > to > > > > do something on this server - would prefer to do it on my "washing" > > > > machine... > > > > > > > > About the surrogate keys that is a different matter - I will be doing > > that > > > > later - after these two tables are merged. My problem is that I don't > > have > > > > any dimension values in the first table yet - I need to put these on and > > > > then afterwards replacing these with surrogate keys. > > > > > > > > Perhaps I could do something with partitions on table 2 based on > > dimension > > > > type ? Would that make the sentence below better ? > > > > > > > > "Allan Mitchell" <allan@no-spam.sqldts.com> wrote in message > > > > news:#Dpm4lRWFHA.616@TK2MSFTNGP12.phx.gbl... > > > > > Michael, > > > > > > > > > > Let me see if I have the question correct. You have 2 source > > tables(11 > > > > > million dimension table rows and 4 million fact table rows?). If that > > > > > is the case then what dimension has 11 million rows? > > > > > > > > > > What is the source? Some sources need handling differently > > > > > Are the two tables in a database on a different server? > > > > > Is this a one off feed? > > > > > > > > > > OR > > > > > > > > > > Do you want to merge the two tables together on the destination from > > > > > your two sources? > > > > > > > > > > That might become more complicated if your keys clash. Whilst you > > will > > > > > be generating new primary key values (Surrogate Key) you will need a > > > > > point of reference when loading the fact table so will need the old > > key > > > > > to refer to. > > > > > > > > > > You could use partitioned views so you would split this 15 million row > > > > > table based on an attribute say Date. The view unions the partitions > > > > > back together again and the index optimizer is smart enough to know > > from > > > > > where the rows that you want will come. > > > > > > > > > > > > > > > > mk:@MSITStore:C:\Program%20Files\Microsoft%20SQL%20Server\80\Tools\Books\cre > > > > atedb.chm::/cm_8_des_06_17zr.htm > > > > > > > > > > > > > > > > > > > > > > > > > The "combining" idea presented below will hurt you as not only do you > > > > > have a lot of rows to play with anyway but you do 2 extra queries for > > > > > every row as well. > > > > > > > > > > The index optimizer will use whatever indexes it chooses unless you > > > > > override her with an index hint. > > > > > > > > > > Let me know if I have grabbed the wrong meaning from the post and we > > can > > > > > revisit my comments. > > > > > > > > > > Allan > > > > > > > > > > > > > > > "Michael Vardinghus" <michaelvardinghus@hotmail.com> wrote in
message
What does it mean if i bring over all the rows every night and empty the table each time - just bring them over to put them into a new table Will my base explode ? Up till now I havent' made logic for just getting new records and I'm not kean on doing that - I like to be sure that everything is transferred every time but if this means an enourmous database I would reconsider. I'm not worried about the log files - I already truncate these every time - so they'll be set to a minimum after each transfer. perhaps the tempdb will explode if the server isn't shut down ? or does the server automatically reduce this after using it ? [quoted text, click to view] "Allan Mitchell" <allan@no-spam.sqldts.com> wrote in message news:eJNUGRUWFHA.4076@TK2MSFTNGP14.phx.gbl... > So as Far as I can see what I might do is insert the fact rows into a > dummy fact table as is so NULLs for the dimensionIDs. > You pull all the dimension rows into a working table. > You can now update the dummy fact table with the Dimension IDs by > joining onto the fact table with the ID and the posting date. > > You would make multiple passes through the data based on DimensionID in > the dimensions table. > > The dimensions table would be indexed so I could retrieve easily and > efficiently the correct rows. > > If you do not want to do the processing on the remote server then > bringing all the rows over is certainly an option. > > Why are things held like this? Seems strange and inefficient. > > > Allan > > > > "Michael Vardinghus" <michaelvardinghus@hotmail.com> wrote in message > news:michaelvardinghus@hotmail.com: > > > Don't know if you can read the other message but > > > > Table 1 has an entry no, table id which links it to table 2 > > In table 2 the dimension code values are listed per entry no - with a > > corresponding dimension type > > > > And as you say I need to flatten this... > > > > So the dimension type values in table 2 need to be columns in a new table > > and in these columns the corresponding dimension code values. > > > > "Allan Mitchell" <allan@no-spam.sqldts.com> wrote in message > > news:OylKE8SWFHA.2664@TK2MSFTNGP15.phx.gbl... > > > Can you post the definitions of the two tables and show how they relate. > > > Can you also post the definition of the fact table and how this > > > relates to those tables. > > > > > > From what I can understand then is that there is a fact table file (4 > > > million )and a dimensions table file (11 million) > > > > > > The dimensions table file is large because you have multiple rows in > > > there for each fact table row telling you the dimension key value for > > > that fact table row. > > > > > > You must then in the dimensions table file identify the dimension itself > > > and the value for that dimension > > > > > > Strange design but without the structure I would have to guess at say > > > you have a dimensions table file that looks like this > > > > > > FactRowID, DimensionType, DimensionValue > > > > > > And your fact table file might look like > > > > > > FactRowID, val1, Val2.... > > > > > > Your end result is required to be > > > > > > FactRowID,DimensionName1, DimensionName2......... > > > > > > > > > You basically need to flatten the dimensions table file. > > > > > > > > > Am I any nearer? > > > > > > > > > "Michael Vardinghus" <michaelvardinghus@hotmail.com> wrote in message > > > news:michaelvardinghus@hotmail.com: > > > > > > > Its not dimension tables - it's a fact table with the transactions in > > table > > > > 1 and in table 2 there is the dimension specifications for each > > transaction. > > > > The reason for the larger number in table 2 is that > > > > there is used multiple dimensions and for each dimension a new line is > > made > > > > but it is referring back to table 1 with only 1 field. > > > > > > > > So what I would like to do was to take table a which for instance has 7 > > > > columns and then for each dimension TYPE in table 2 I want to add a new > > > > column and in these columns I would like > > > > to write the corresponding dimension value from table 2. > > > > > > > > The tables are placed on a different server - and I would like to avoid > > to > > > > do something on this server - would prefer to do it on my "washing" > > > > machine... > > > > > > > > About the surrogate keys that is a different matter - I will be doing > > that > > > > later - after these two tables are merged. My problem is that I don't > > have > > > > any dimension values in the first table yet - I need to put these on and > > > > then afterwards replacing these with surrogate keys. > > > > > > > > Perhaps I could do something with partitions on table 2 based on > > dimension > > > > type ? Would that make the sentence below better ? > > > > > > > > "Allan Mitchell" <allan@no-spam.sqldts.com> wrote in message > > > > news:#Dpm4lRWFHA.616@TK2MSFTNGP12.phx.gbl... > > > > > Michael, > > > > > > > > > > Let me see if I have the question correct. You have 2 source > > tables(11 > > > > > million dimension table rows and 4 million fact table rows?). If that > > > > > is the case then what dimension has 11 million rows? > > > > > > > > > > What is the source? Some sources need handling differently > > > > > Are the two tables in a database on a different server? > > > > > Is this a one off feed? > > > > > > > > > > OR > > > > > > > > > > Do you want to merge the two tables together on the destination from > > > > > your two sources? > > > > > > > > > > That might become more complicated if your keys clash. Whilst you > > will > > > > > be generating new primary key values (Surrogate Key) you will need a > > > > > point of reference when loading the fact table so will need the old > > key > > > > > to refer to. > > > > > > > > > > You could use partitioned views so you would split this 15 million row > > > > > table based on an attribute say Date. The view unions the partitions > > > > > back together again and the index optimizer is smart enough to know > > from > > > > > where the rows that you want will come. > > > > > > > > > > > > > > > > mk:@MSITStore:C:\Program%20Files\Microsoft%20SQL%20Server\80\Tools\Books\cre > > > > atedb.chm::/cm_8_des_06_17zr.htm > > > > > > > > > > > > > > > > > > > > > > > > > The "combining" idea presented below will hurt you as not only do you > > > > > have a lot of rows to play with anyway but you do 2 extra queries for > > > > > every row as well. > > > > > > > > > > The index optimizer will use whatever indexes it chooses unless you > > > > > override her with an index hint. > > > > > > > > > > Let me know if I have grabbed the wrong meaning from the post and we > > can > > > > > revisit my comments. > > > > > > > > > > Allan > > > > > > > > > > > > > > > "Michael Vardinghus" <michaelvardinghus@hotmail.com> wrote in
message
Its not dimension tables - it's a fact table with the transactions in table 1 and in table 2 there is the dimension specifications for each transaction. The reason for the larger number in table 2 is that there is used multiple dimensions and for each dimension a new line is made but it is referring back to table 1 with only 1 field. So what I would like to do was to take table a which for instance has 7 columns and then for each dimension TYPE in table 2 I want to add a new column and in these columns I would like to write the corresponding dimension value from table 2. The tables are placed on a different server - and I would like to avoid to do something on this server - would prefer to do it on my "washing" machine... About the surrogate keys that is a different matter - I will be doing that later - after these two tables are merged. My problem is that I don't have any dimension values in the first table yet - I need to put these on and then afterwards replacing these with surrogate keys. Perhaps I could do something with partitions on table 2 based on dimension type ? Would that make the sentence below better ? [quoted text, click to view] "Allan Mitchell" <allan@no-spam.sqldts.com> wrote in message news:#Dpm4lRWFHA.616@TK2MSFTNGP12.phx.gbl... > Michael, > > Let me see if I have the question correct. You have 2 source tables(11 > million dimension table rows and 4 million fact table rows?). If that > is the case then what dimension has 11 million rows? > > What is the source? Some sources need handling differently > Are the two tables in a database on a different server? > Is this a one off feed? > > OR > > Do you want to merge the two tables together on the destination from > your two sources? > > That might become more complicated if your keys clash. Whilst you will > be generating new primary key values (Surrogate Key) you will need a > point of reference when loading the fact table so will need the old key > to refer to. > > You could use partitioned views so you would split this 15 million row > table based on an attribute say Date. The view unions the partitions > back together again and the index optimizer is smart enough to know from > where the rows that you want will come. > >
mk:@MSITStore:C:\Program%20Files\Microsoft%20SQL%20Server\80\Tools\Books\cre atedb.chm::/cm_8_des_06_17zr.htm [quoted text, click to view] > > > > > The "combining" idea presented below will hurt you as not only do you > have a lot of rows to play with anyway but you do 2 extra queries for > every row as well. > > The index optimizer will use whatever indexes it chooses unless you > override her with an index hint. > > Let me know if I have grabbed the wrong meaning from the post and we can > revisit my comments. > > Allan > > > "Michael Vardinghus" <michaelvardinghus@hotmail.com> wrote in message > news:michaelvardinghus@hotmail.com: > > > Getting two tables from one base with 4 mio and 11 mio records. > > > > The 11 mio records are dimension specification on the 4 mio records so I > > need to put these 11 mio records alongsid the 4 mio in my fact table. > > > > -- Example table 1 > > create view trans1 as > > select 100 as beløb, 1 as linje > > union all > > select 200 as beløb, 2 as linje > > > > -- Example table 2 > > create view trans2 as > > select 1 as linje, 'a' as dim1, 'dim1' as type > > union all > > select 1 as linje, 'b' as dim2, 'dim2' as type > > union all > > select 2 as linje, 'a' as dim1, 'dim1' as typ > > union all > > select 2 as linje, 'a' as dim2, 'dim2' as type > > > > -- Example of combining > > select a.*, (select dim1 from trans2 b where > > b.linje = a.linje and b.type = 'dim1') as dim1, > > (select dim1 from trans2 c where c.linje = a.linje and > > c.type = 'dim2') as dim2 > > from > > trans1 a > > > > My question is how to go about this with the best performance - here's one > > shot: > > > > 1) Pulling the 2 tables into my relational warehouse base > > 2) An index combines the 2 tables and this index fills up my fact table > > > > My concern is what to do about indexing ? Should I have indexes when filling > > up from base 1 ? Or should the be removed when doing so ? Do I need to index > > the two tables and the view ? > > > > Perhaps I don't need a view - perhaps a could just use the sentence above > > and let index optimizer give me a hint to what to make indexes for ? > > > > Thanx in advance > > > > /Michael >
Real data example Table 1 Entry no Posting date Description Amount 1 1999-10-06 00:00:00.000 Rammeaft. udb. a/c -8895277.00000000000000000000 2 1999-10-06 00:00:00.000 500100496 Ulandsekr. 8895277.00000000000000000000 3 1999-10-06 00:00:00.000 1. rate -1610400.00000000000000000000 Table 2 Table ID Entry no Dimension Code Dimension value code Amount Posting date 17 1 DELREGNSKAB -8895277.00000000000000000000 1999-10-06 00:00:00.000 17 2 DELREGNSKAB 700 8895277.00000000000000000000 1999-10-06 00:00:00.000 17 2 STED 6091 8895277.00000000000000000000 1999-10-06 00:00:00.000 17 3 DELREGNSKAB -1610400.00000000000000000000 1999-10-06 00:00:00.000 I will alwas only need table id 17 end Entry no joins them ... So this is what i want New table Entry no Posting date Description Amount Delregnskab Sted 1 1999-10-06 00:00:00.000 Rammeaft. udb. a/c -8895277.00000000000000000000 700 null 2 1999-10-06 00:00:00.000 500100496 Ulandsekr. 8895277.00000000000000000000 700 6091 3 1999-10-06 00:00:00.000 1. rate -1610400.00000000000000000000 700 null If possible i would like to eliminate nulls at once but that could be washed afterwards Besides delregnskab and sted there is 3-4 other dimension that should be handled the same way. [quoted text, click to view] "Allan Mitchell" <allan@no-spam.sqldts.com> wrote in message news:#Dpm4lRWFHA.616@TK2MSFTNGP12.phx.gbl... > Michael, > > Let me see if I have the question correct. You have 2 source tables(11 > million dimension table rows and 4 million fact table rows?). If that > is the case then what dimension has 11 million rows? > > What is the source? Some sources need handling differently > Are the two tables in a database on a different server? > Is this a one off feed? > > OR > > Do you want to merge the two tables together on the destination from > your two sources? > > That might become more complicated if your keys clash. Whilst you will > be generating new primary key values (Surrogate Key) you will need a > point of reference when loading the fact table so will need the old key > to refer to. > > You could use partitioned views so you would split this 15 million row > table based on an attribute say Date. The view unions the partitions > back together again and the index optimizer is smart enough to know from > where the rows that you want will come. > >
mk:@MSITStore:C:\Program%20Files\Microsoft%20SQL%20Server\80\Tools\Books\cre atedb.chm::/cm_8_des_06_17zr.htm [quoted text, click to view] > > > > > The "combining" idea presented below will hurt you as not only do you > have a lot of rows to play with anyway but you do 2 extra queries for > every row as well. > > The index optimizer will use whatever indexes it chooses unless you > override her with an index hint. > > Let me know if I have grabbed the wrong meaning from the post and we can > revisit my comments. > > Allan > > > "Michael Vardinghus" <michaelvardinghus@hotmail.com> wrote in message > news:michaelvardinghus@hotmail.com: > > > Getting two tables from one base with 4 mio and 11 mio records. > > > > The 11 mio records are dimension specification on the 4 mio records so I > > need to put these 11 mio records alongsid the 4 mio in my fact table. > > > > -- Example table 1 > > create view trans1 as > > select 100 as beløb, 1 as linje > > union all > > select 200 as beløb, 2 as linje > > > > -- Example table 2 > > create view trans2 as > > select 1 as linje, 'a' as dim1, 'dim1' as type > > union all > > select 1 as linje, 'b' as dim2, 'dim2' as type > > union all > > select 2 as linje, 'a' as dim1, 'dim1' as typ > > union all > > select 2 as linje, 'a' as dim2, 'dim2' as type > > > > -- Example of combining > > select a.*, (select dim1 from trans2 b where > > b.linje = a.linje and b.type = 'dim1') as dim1, > > (select dim1 from trans2 c where c.linje = a.linje and > > c.type = 'dim2') as dim2 > > from > > trans1 a > > > > My question is how to go about this with the best performance - here's one > > shot: > > > > 1) Pulling the 2 tables into my relational warehouse base > > 2) An index combines the 2 tables and this index fills up my fact table > > > > My concern is what to do about indexing ? Should I have indexes when filling > > up from base 1 ? Or should the be removed when doing so ? Do I need to index > > the two tables and the view ? > > > > Perhaps I don't need a view - perhaps a could just use the sentence above > > and let index optimizer give me a hint to what to make indexes for ? > > > > Thanx in advance > > > > /Michael >
That looked okay when typing it ... but not anymore... [quoted text, click to view] "Michael Vardinghus" <michaelvardinghus@hotmail.com> wrote in message news:OERyj0SWFHA.1224@TK2MSFTNGP10.phx.gbl... > Real data example > > Table 1 > > Entry no Posting date Description > Amount > > 1 1999-10-06 00:00:00.000 Rammeaft. udb. a/c > -8895277.00000000000000000000 > 2 1999-10-06 00:00:00.000 500100496 Ulandsekr. > 8895277.00000000000000000000 > 3 1999-10-06 00:00:00.000 1. rate > -1610400.00000000000000000000 > > > Table 2 > > Table ID Entry no Dimension Code Dimension value code > Amount Posting date > > 17 1 DELREGNSKAB > -8895277.00000000000000000000 1999-10-06 00:00:00.000 > 17 2 DELREGNSKAB 700 > 8895277.00000000000000000000 1999-10-06 00:00:00.000 > 17 2 STED 6091 > 8895277.00000000000000000000 1999-10-06 00:00:00.000 > 17 3 DELREGNSKAB > -1610400.00000000000000000000 1999-10-06 00:00:00.000 > > I will alwas only need table id 17 end Entry no joins them ... > > So this is what i want > > New table > > Entry no Posting date Description > Amount Delregnskab > Sted > > 1 1999-10-06 00:00:00.000 Rammeaft. udb. a/c > -8895277.00000000000000000000 700 > null > 2 1999-10-06 00:00:00.000 500100496 Ulandsekr. > 8895277.00000000000000000000 700 > 6091 > 3 1999-10-06 00:00:00.000 1. rate > -1610400.00000000000000000000 700 > null > > If possible i would like to eliminate nulls at once but that could be washed > afterwards > > Besides delregnskab and sted there is 3-4 other dimension that should be > handled the same way. > > "Allan Mitchell" <allan@no-spam.sqldts.com> wrote in message > news:#Dpm4lRWFHA.616@TK2MSFTNGP12.phx.gbl... > > Michael, > > > > Let me see if I have the question correct. You have 2 source tables(11 > > million dimension table rows and 4 million fact table rows?). If that > > is the case then what dimension has 11 million rows? > > > > What is the source? Some sources need handling differently > > Are the two tables in a database on a different server? > > Is this a one off feed? > > > > OR > > > > Do you want to merge the two tables together on the destination from > > your two sources? > > > > That might become more complicated if your keys clash. Whilst you will > > be generating new primary key values (Surrogate Key) you will need a > > point of reference when loading the fact table so will need the old key > > to refer to. > > > > You could use partitioned views so you would split this 15 million row > > table based on an attribute say Date. The view unions the partitions > > back together again and the index optimizer is smart enough to know from > > where the rows that you want will come. > > > > > mk:@MSITStore:C:\Program%20Files\Microsoft%20SQL%20Server\80\Tools\Books\cre > atedb.chm::/cm_8_des_06_17zr.htm > > > > > > > > > > The "combining" idea presented below will hurt you as not only do you > > have a lot of rows to play with anyway but you do 2 extra queries for > > every row as well. > > > > The index optimizer will use whatever indexes it chooses unless you > > override her with an index hint. > > > > Let me know if I have grabbed the wrong meaning from the post and we can > > revisit my comments. > > > > Allan > > > > > > "Michael Vardinghus" <michaelvardinghus@hotmail.com> wrote in message > > news:michaelvardinghus@hotmail.com: > > > > > Getting two tables from one base with 4 mio and 11 mio records. > > > > > > The 11 mio records are dimension specification on the 4 mio records so I > > > need to put these 11 mio records alongsid the 4 mio in my fact table. > > > > > > -- Example table 1 > > > create view trans1 as > > > select 100 as beløb, 1 as linje > > > union all > > > select 200 as beløb, 2 as linje > > > > > > -- Example table 2 > > > create view trans2 as > > > select 1 as linje, 'a' as dim1, 'dim1' as type > > > union all > > > select 1 as linje, 'b' as dim2, 'dim2' as type > > > union all > > > select 2 as linje, 'a' as dim1, 'dim1' as typ > > > union all > > > select 2 as linje, 'a' as dim2, 'dim2' as type > > > > > > -- Example of combining > > > select a.*, (select dim1 from trans2 b where > > > b.linje = a.linje and b.type = 'dim1') as dim1, > > > (select dim1 from trans2 c where c.linje = a.linje and > > > c.type = 'dim2') as dim2 > > > from > > > trans1 a > > > > > > My question is how to go about this with the best performance - here's > one > > > shot: > > > > > > 1) Pulling the 2 tables into my relational warehouse base > > > 2) An index combines the 2 tables and this index fills up my fact table > > > > > > My concern is what to do about indexing ? Should I have indexes when > filling > > > up from base 1 ? Or should the be removed when doing so ? Do I need to > index > > > the two tables and the view ? > > > > > > Perhaps I don't need a view - perhaps a could just use the sentence > above > > > and let index optimizer give me a hint to what to make indexes for ? > > > > > > Thanx in advance > > > > > > /Michael > > > >
Can you post the definitions of the two tables and show how they relate. Can you also post the definition of the fact table and how this relates to those tables. From what I can understand then is that there is a fact table file (4 million )and a dimensions table file (11 million) The dimensions table file is large because you have multiple rows in there for each fact table row telling you the dimension key value for that fact table row. You must then in the dimensions table file identify the dimension itself and the value for that dimension Strange design but without the structure I would have to guess at say you have a dimensions table file that looks like this FactRowID, DimensionType, DimensionValue And your fact table file might look like FactRowID, val1, Val2.... Your end result is required to be FactRowID,DimensionName1, DimensionName2......... You basically need to flatten the dimensions table file. Am I any nearer? [quoted text, click to view] "Michael Vardinghus" <michaelvardinghus@hotmail.com> wrote in message news:michaelvardinghus@hotmail.com: > Its not dimension tables - it's a fact table with the transactions in table > 1 and in table 2 there is the dimension specifications for each transaction. > The reason for the larger number in table 2 is that > there is used multiple dimensions and for each dimension a new line is made > but it is referring back to table 1 with only 1 field. > > So what I would like to do was to take table a which for instance has 7 > columns and then for each dimension TYPE in table 2 I want to add a new > column and in these columns I would like > to write the corresponding dimension value from table 2. > > The tables are placed on a different server - and I would like to avoid to > do something on this server - would prefer to do it on my "washing" > machine... > > About the surrogate keys that is a different matter - I will be doing that > later - after these two tables are merged. My problem is that I don't have > any dimension values in the first table yet - I need to put these on and > then afterwards replacing these with surrogate keys. > > Perhaps I could do something with partitions on table 2 based on dimension > type ? Would that make the sentence below better ? > > "Allan Mitchell" <allan@no-spam.sqldts.com> wrote in message > news:#Dpm4lRWFHA.616@TK2MSFTNGP12.phx.gbl... > > Michael, > > > > Let me see if I have the question correct. You have 2 source tables(11 > > million dimension table rows and 4 million fact table rows?). If that > > is the case then what dimension has 11 million rows? > > > > What is the source? Some sources need handling differently > > Are the two tables in a database on a different server? > > Is this a one off feed? > > > > OR > > > > Do you want to merge the two tables together on the destination from > > your two sources? > > > > That might become more complicated if your keys clash. Whilst you will > > be generating new primary key values (Surrogate Key) you will need a > > point of reference when loading the fact table so will need the old key > > to refer to. > > > > You could use partitioned views so you would split this 15 million row > > table based on an attribute say Date. The view unions the partitions > > back together again and the index optimizer is smart enough to know from > > where the rows that you want will come. > > > > > mk:@MSITStore:C:\Program%20Files\Microsoft%20SQL%20Server\80\Tools\Books\cre > atedb.chm::/cm_8_des_06_17zr.htm > > > > > > > > > > The "combining" idea presented below will hurt you as not only do you > > have a lot of rows to play with anyway but you do 2 extra queries for > > every row as well. > > > > The index optimizer will use whatever indexes it chooses unless you > > override her with an index hint. > > > > Let me know if I have grabbed the wrong meaning from the post and we can > > revisit my comments. > > > > Allan > > > > > > "Michael Vardinghus" <michaelvardinghus@hotmail.com> wrote in message > > news:michaelvardinghus@hotmail.com: > > > > > Getting two tables from one base with 4 mio and 11 mio records. > > > > > > The 11 mio records are dimension specification on the 4 mio records so I > > > need to put these 11 mio records alongsid the 4 mio in my fact table. > > > > > > -- Example table 1 > > > create view trans1 as > > > select 100 as beløb, 1 as linje > > > union all > > > select 200 as beløb, 2 as linje > > > > > > -- Example table 2 > > > create view trans2 as > > > select 1 as linje, 'a' as dim1, 'dim1' as type > > > union all > > > select 1 as linje, 'b' as dim2, 'dim2' as type > > > union all > > > select 2 as linje, 'a' as dim1, 'dim1' as typ > > > union all > > > select 2 as linje, 'a' as dim2, 'dim2' as type > > > > > > -- Example of combining > > > select a.*, (select dim1 from trans2 b where > > > b.linje = a.linje and b.type = 'dim1') as dim1, > > > (select dim1 from trans2 c where c.linje = a.linje and > > > c.type = 'dim2') as dim2 > > > from > > > trans1 a > > > > > > My question is how to go about this with the best performance - here's > one > > > shot: > > > > > > 1) Pulling the 2 tables into my relational warehouse base > > > 2) An index combines the 2 tables and this index fills up my fact table > > > > > > My concern is what to do about indexing ? Should I have indexes when > filling > > > up from base 1 ? Or should the be removed when doing so ? Do I need to > index > > > the two tables and the view ? > > > > > > Perhaps I don't need a view - perhaps a could just use the sentence > above > > > and let index optimizer give me a hint to what to make indexes for ? > > > > > > Thanx in advance > > > > > > /Michael > >
Bringing over all rows every night regardless of if they have changed or not is expensive and not needed. You should find a way to identify rows in the source that are new or have changed. This way you only need to bring over what is required. [quoted text, click to view] "Michael Vardinghus" <michaelvardinghus@hotmail.com> wrote in message news:michaelvardinghus@hotmail.com: > What does it mean if i bring over all the rows every night and empty the > table each time - just bring them over to put them into a new table > > Will my base explode ? Up till now I havent' made logic for just getting new > records and I'm not kean on doing that - I like to be sure that everything > is transferred every time but > if this means an enourmous database I would reconsider. > > I'm not worried about the log files - I already truncate these every time - > so they'll be set to a minimum after each transfer. > > perhaps the tempdb will explode if the server isn't shut down ? or does the > server automatically reduce this after using it ? > > "Allan Mitchell" <allan@no-spam.sqldts.com> wrote in message > news:eJNUGRUWFHA.4076@TK2MSFTNGP14.phx.gbl... > > So as Far as I can see what I might do is insert the fact rows into a > > dummy fact table as is so NULLs for the dimensionIDs. > > You pull all the dimension rows into a working table. > > You can now update the dummy fact table with the Dimension IDs by > > joining onto the fact table with the ID and the posting date. > > > > You would make multiple passes through the data based on DimensionID in > > the dimensions table. > > > > The dimensions table would be indexed so I could retrieve easily and > > efficiently the correct rows. > > > > If you do not want to do the processing on the remote server then > > bringing all the rows over is certainly an option. > > > > Why are things held like this? Seems strange and inefficient. > > > > > > Allan > > > > > > > > "Michael Vardinghus" <michaelvardinghus@hotmail.com> wrote in message > > news:michaelvardinghus@hotmail.com: > > > > > Don't know if you can read the other message but > > > > > > Table 1 has an entry no, table id which links it to table 2 > > > In table 2 the dimension code values are listed per entry no - with a > > > corresponding dimension type > > > > > > And as you say I need to flatten this... > > > > > > So the dimension type values in table 2 need to be columns in a new > table > > > and in these columns the corresponding dimension code values. > > > > > > "Allan Mitchell" <allan@no-spam.sqldts.com> wrote in message > > > news:OylKE8SWFHA.2664@TK2MSFTNGP15.phx.gbl... > > > > Can you post the definitions of the two tables and show how they > relate. > > > > Can you also post the definition of the fact table and how this > > > > relates to those tables. > > > > > > > > From what I can understand then is that there is a fact table file (4 > > > > million )and a dimensions table file (11 million) > > > > > > > > The dimensions table file is large because you have multiple rows in > > > > there for each fact table row telling you the dimension key value for > > > > that fact table row. > > > > > > > > You must then in the dimensions table file identify the dimension > itself > > > > and the value for that dimension > > > > > > > > Strange design but without the structure I would have to guess at say > > > > you have a dimensions table file that looks like this > > > > > > > > FactRowID, DimensionType, DimensionValue > > > > > > > > And your fact table file might look like > > > > > > > > FactRowID, val1, Val2.... > > > > > > > > Your end result is required to be > > > > > > > > FactRowID,DimensionName1, DimensionName2......... > > > > > > > > > > > > You basically need to flatten the dimensions table file. > > > > > > > > > > > > Am I any nearer? > > > > > > > > > > > > "Michael Vardinghus" <michaelvardinghus@hotmail.com> wrote in message > > > > news:michaelvardinghus@hotmail.com: > > > > > > > > > Its not dimension tables - it's a fact table with the transactions > in > > > table > > > > > 1 and in table 2 there is the dimension specifications for each > > > transaction. > > > > > The reason for the larger number in table 2 is that > > > > > there is used multiple dimensions and for each dimension a new line > is > > > made > > > > > but it is referring back to table 1 with only 1 field. > > > > > > > > > > So what I would like to do was to take table a which for instance > has 7 > > > > > columns and then for each dimension TYPE in table 2 I want to add a > new > > > > > column and in these columns I would like > > > > > to write the corresponding dimension value from table 2. > > > > > > > > > > The tables are placed on a different server - and I would like to > avoid > > > to > > > > > do something on this server - would prefer to do it on my "washing" > > > > > machine... > > > > > > > > > > About the surrogate keys that is a different matter - I will be > doing > > > that > > > > > later - after these two tables are merged. My problem is that I > don't > > > have > > > > > any dimension values in the first table yet - I need to put these on > and > > > > > then afterwards replacing these with surrogate keys. > > > > > > > > > > Perhaps I could do something with partitions on table 2 based on > > > dimension > > > > > type ? Would that make the sentence below better ? > > > > > > > > > > "Allan Mitchell" <allan@no-spam.sqldts.com> wrote in message > > > > > news:#Dpm4lRWFHA.616@TK2MSFTNGP12.phx.gbl... > > > > > > Michael, > > > > > > > > > > > > Let me see if I have the question correct. You have 2 source > > > tables(11 > > > > > > million dimension table rows and 4 million fact table rows?). If > that > > > > > > is the case then what dimension has 11 million rows? > > > > > > > > > > > > What is the source? Some sources need handling differently > > > > > > Are the two tables in a database on a different server? > > > > > > Is this a one off feed? > > > > > > > > > > > > OR > > > > > > > > > > > > Do you want to merge the two tables together on the destination > from > > > > > > your two sources? > > > > > > > > > > > > That might become more complicated if your keys clash. Whilst you > > > will > > > > > > be generating new primary key values (Surrogate Key) you will need > a > > > > > > point of reference when loading the fact table so will need the > old > > > key > > > > > > to refer to. > > > > > > > > > > > > You could use partitioned views so you would split this 15 million > row > > > > > > table based on an attribute say Date. The view unions the > partitions > > > > > > back together again and the index optimizer is smart enough to > know > > > from > > > > > > where the rows that you want will come. > > > > > > > > > > > > > > > > > > > >
I take it then you are looking to refill the Fact Table every night. Are the dimension tables in place or are you looking to use the one table as all your dimension tables? [quoted text, click to view] "Michael Vardinghus" <michaelvardinghus@hotmail.com> wrote in message news:michaelvardinghus@hotmail.com: > thanx again Allan > > So this is it > > Table 1 on a new base equals table 1 on transaction base > Table 2 on a new base equals table 2 on trasaction base > Table 3 on a new base is a dummy table and this is used like this (in this > table I set 0 instead of null as default for dimension values) > insert records from table 1 > update dimension type a from table 2 > update dimension type a from table 2 > > And with indexes on table 1 and 2 ? > Should these indexes be removed when filling up from transaction base and > applied when sending from 1/2 to 3 in new base ? > I'm not that clever at indexing - only used the wizard so far - how would > indexing look like - just index what I'm using in the where clauses ? > > > "Allan Mitchell" <allan@no-spam.sqldts.com> wrote in message > news:eJNUGRUWFHA.4076@TK2MSFTNGP14.phx.gbl... > > So as Far as I can see what I might do is insert the fact rows into a > > dummy fact table as is so NULLs for the dimensionIDs. > > You pull all the dimension rows into a working table. > > You can now update the dummy fact table with the Dimension IDs by > > joining onto the fact table with the ID and the posting date. > > > > You would make multiple passes through the data based on DimensionID in > > the dimensions table. > > > > The dimensions table would be indexed so I could retrieve easily and > > efficiently the correct rows. > > > > If you do not want to do the processing on the remote server then > > bringing all the rows over is certainly an option. > > > > Why are things held like this? Seems strange and inefficient. > > > > > > Allan > > > > > > > > "Michael Vardinghus" <michaelvardinghus@hotmail.com> wrote in message > > news:michaelvardinghus@hotmail.com: > > > > > Don't know if you can read the other message but > > > > > > Table 1 has an entry no, table id which links it to table 2 > > > In table 2 the dimension code values are listed per entry no - with a > > > corresponding dimension type > > > > > > And as you say I need to flatten this... > > > > > > So the dimension type values in table 2 need to be columns in a new > table > > > and in these columns the corresponding dimension code values. > > > > > > "Allan Mitchell" <allan@no-spam.sqldts.com> wrote in message > > > news:OylKE8SWFHA.2664@TK2MSFTNGP15.phx.gbl... > > > > Can you post the definitions of the two tables and show how they > relate. > > > > Can you also post the definition of the fact table and how this > > > > relates to those tables. > > > > > > > > From what I can understand then is that there is a fact table file (4 > > > > million )and a dimensions table file (11 million) > > > > > > > > The dimensions table file is large because you have multiple rows in > > > > there for each fact table row telling you the dimension key value for > > > > that fact table row. > > > > > > > > You must then in the dimensions table file identify the dimension > itself > > > > and the value for that dimension > > > > > > > > Strange design but without the structure I would have to guess at say > > > > you have a dimensions table file that looks like this > > > > > > > > FactRowID, DimensionType, DimensionValue > > > > > > > > And your fact table file might look like > > > > > > > > FactRowID, val1, Val2.... > > > > > > > > Your end result is required to be > > > > > > > > FactRowID,DimensionName1, DimensionName2......... > > > > > > > > > > > > You basically need to flatten the dimensions table file. > > > > > > > > > > > > Am I any nearer? > > > > > > > > > > > > "Michael Vardinghus" <michaelvardinghus@hotmail.com> wrote in message > > > > news:michaelvardinghus@hotmail.com: > > > > > > > > > Its not dimension tables - it's a fact table with the transactions > in > > > table > > > > > 1 and in table 2 there is the dimension specifications for each > > > transaction. > > > > > The reason for the larger number in table 2 is that > > > > > there is used multiple dimensions and for each dimension a new line > is > > > made > > > > > but it is referring back to table 1 with only 1 field. > > > > > > > > > > So what I would like to do was to take table a which for instance > has 7 > > > > > columns and then for each dimension TYPE in table 2 I want to add a > new > > > > > column and in these columns I would like > > > > > to write the corresponding dimension value from table 2. > > > > > > > > > > The tables are placed on a different server - and I would like to > avoid > > > to > > > > > do something on this server - would prefer to do it on my "washing" > > > > > machine... > > > > > > > > > > About the surrogate keys that is a different matter - I will be > doing > > > that > > > > > later - after these two tables are merged. My problem is that I > don't > > > have > > > > > any dimension values in the first table yet - I need to put these on > and > > > > > then afterwards replacing these with surrogate keys. > > > > > > > > > > Perhaps I could do something with partitions on table 2 based on > > > dimension > > > > > type ? Would that make the sentence below better ? > > > > > > > > > > "Allan Mitchell" <allan@no-spam.sqldts.com> wrote in message > > > > > news:#Dpm4lRWFHA.616@TK2MSFTNGP12.phx.gbl... > > > > > > Michael, > > > > > > > > > > > > Let me see if I have the question correct. You have 2 source > > > tables(11 > > > > > > million dimension table rows and 4 million fact table rows?). If > that > > > > > > is the case then what dimension has 11 million rows? > > > > > > > > > > > > What is the source? Some sources need handling differently > > > > > > Are the two tables in a database on a different server? > > > > > > Is this a one off feed? > > > > > > > > > > > > OR > > > > > > > > > > > > Do you want to merge the two tables together on the destination > from > > > > > > your two sources? > > > > > > > > > > > > That might become more complicated if your keys clash. Whilst you > > > will > > > > > > be generating new primary key values (Surrogate Key) you will need > a > > > > > > point of reference when loading the fact table so will need the > old > > > key > > > > > > to refer to. > > > > > > > > > > > > You could use partitioned views so you would split this 15 million > row > > > > > > table based on an attribute say Date. The view unions the > partitions > > > > > > back together again and the index optimizer is smart enough to > know > > > from > > > > > > where the rows that you want will come. > > > > > > > > > > > > > > > > > > > >
If you can take the hit then sure reload the fact but I personally would much rather only take rows that were changed or new. It will take less time and be less intense. Allan [quoted text, click to view] "Kim Vardinghus-Nielsen" <johndoe@test.com> wrote in message news:johndoe@test.com: > The dimension tables are in place ... these are placed in other tables > > Yep refill the fact - that isn't so bad as transporting everything from > host - or ? > > > "Allan Mitchell" <allan@no-spam.sqldts.com> skrev i en meddelelse > news:eQQ6VBVWFHA.612@TK2MSFTNGP12.phx.gbl... > >I take it then you are looking to refill the Fact Table every night. > > Are the dimension tables in place or are you looking to use the one table > > as all your dimension tables? > > > > > > > > > > > > > > > > "Michael Vardinghus" <michaelvardinghus@hotmail.com> wrote in message > > news:michaelvardinghus@hotmail.com: > > > >> thanx again Allan > >> > >> So this is it > >> > >> Table 1 on a new base equals table 1 on transaction base > >> Table 2 on a new base equals table 2 on trasaction base > >> Table 3 on a new base is a dummy table and this is used like this (in > >> this > >> table I set 0 instead of null as default for dimension values) > >> insert records from table 1 > >> update dimension type a from table 2 > >> update dimension type a from table 2 > >> > >> And with indexes on table 1 and 2 ? > >> Should these indexes be removed when filling up from transaction base and > >> applied when sending from 1/2 to 3 in new base ? > >> I'm not that clever at indexing - only used the wizard so far - how would > >> indexing look like - just index what I'm using in the where clauses ? > >> > >> > >> "Allan Mitchell" <allan@no-spam.sqldts.com> wrote in message > >> news:eJNUGRUWFHA.4076@TK2MSFTNGP14.phx.gbl... > >> > So as Far as I can see what I might do is insert the fact rows into a > >> > dummy fact table as is so NULLs for the dimensionIDs. > >> > You pull all the dimension rows into a working table. > >> > You can now update the dummy fact table with the Dimension IDs by > >> > joining onto the fact table with the ID and the posting date. > >> > > >> > You would make multiple passes through the data based on DimensionID in > >> > the dimensions table. > >> > > >> > The dimensions table would be indexed so I could retrieve easily and > >> > efficiently the correct rows. > >> > > >> > If you do not want to do the processing on the remote server then > >> > bringing all the rows over is certainly an option. > >> > > >> > Why are things held like this? Seems strange and inefficient. > >> > > >> > > >> > Allan > >> > > >> > > >> > > >> > "Michael Vardinghus" <michaelvardinghus@hotmail.com> wrote in message > >> > news:michaelvardinghus@hotmail.com: > >> > > >> > > Don't know if you can read the other message but > >> > > > >> > > Table 1 has an entry no, table id which links it to table 2 > >> > > In table 2 the dimension code values are listed per entry no - with a > >> > > corresponding dimension type > >> > > > >> > > And as you say I need to flatten this... > >> > > > >> > > So the dimension type values in table 2 need to be columns in a new > >> table > >> > > and in these columns the corresponding dimension code values. > >> > > > >> > > "Allan Mitchell" <allan@no-spam.sqldts.com> wrote in message > >> > > news:OylKE8SWFHA.2664@TK2MSFTNGP15.phx.gbl... > >> > > > Can you post the definitions of the two tables and show how they > >> relate. > >> > > > Can you also post the definition of the fact table and how this > >> > > > relates to those tables. > >> > > > > >> > > > From what I can understand then is that there is a fact table file > >> > > > (4 > >> > > > million )and a dimensions table file (11 million) > >> > > > > >> > > > The dimensions table file is large because you have multiple rows > >> > > > in > >> > > > there for each fact table row telling you the dimension key value > >> > > > for > >> > > > that fact table row. > >> > > > > >> > > > You must then in the dimensions table file identify the dimension > >> itself > >> > > > and the value for that dimension > >> > > > > >> > > > Strange design but without the structure I would have to guess at > >> > > > say > >> > > > you have a dimensions table file that looks like this > >> > > > > >> > > > FactRowID, DimensionType, DimensionValue > >> > > > > >> > > > And your fact table file might look like > >> > > > > >> > > > FactRowID, val1, Val2.... > >> > > > > >> > > > Your end result is required to be > >> > > > > >> > > > FactRowID,DimensionName1, DimensionName2......... > >> > > > > >> > > > > >> > > > You basically need to flatten the dimensions table file. > >> > > > > >> > > > > >> > > > Am I any nearer? > >> > > > > >> > > > > >> > > > "Michael Vardinghus" <michaelvardinghus@hotmail.com> wrote in > >> > > > message > >> > > > news:michaelvardinghus@hotmail.com: > >> > > > > >> > > > > Its not dimension tables - it's a fact table with the > >> > > > > transactions > >> in > >> > > table > >> > > > > 1 and in table 2 there is the dimension specifications for each > >> > > transaction. > >> > > > > The reason for the larger number in table 2 is that > >> > > > > there is used multiple dimensions and for each dimension a new > >> > > > > line > >> is > >> > > made > >> > > > > but it is referring back to table 1 with only 1 field. > >> > > > > > >> > > > > So what I would like to do was to take table a which for instance > >> has 7 > >> > > > > columns and then for each dimension TYPE in table 2 I want to add > >> > > > > a > >> new > >> > > > > column and in these columns I would like > >> > > > > to write the corresponding dimension value from table 2. > >> > > > > > >> > > > > The tables are placed on a different server - and I would like to > >> avoid > >> > > to > >> > > > > do something on this server - would prefer to do it on my > >> > > > > "washing" > >> > > > > machine... > >> > > > > > >> > > > > About the surrogate keys that is a different matter - I will be > >> doing > >> > > that > >> > > > > later - after these two tables are merged. My problem is that I > >> don't > >> > > have > >> > > > > any dimension values in the first table yet - I need to put these > >> > > > > on > >> and > >> > > > > then afterwards replacing these with surrogate keys. > >> > > > > > >> > > > > Perhaps I could do something with partitions on table 2 based on > >> > > dimension > >> > > > > type ? Would that make the sentence below better ? > >> > > > > > >> > > > > "Allan Mitchell" <allan@no-spam.sqldts.com> wrote in message > >> > > > > news:#Dpm4lRWFHA.616@TK2MSFTNGP12.phx.gbl... > >> > > > > > Michael, > >> > > > > > > >> > > > > > Let me see if I have the question correct. You have 2 source > >> > > tables(11
The dimension tables are in place ... these are placed in other tables Yep refill the fact - that isn't so bad as transporting everything from host - or ? "Allan Mitchell" <allan@no-spam.sqldts.com> skrev i en meddelelse news:eQQ6VBVWFHA.612@TK2MSFTNGP12.phx.gbl... [quoted text, click to view] >I take it then you are looking to refill the Fact Table every night. > Are the dimension tables in place or are you looking to use the one table > as all your dimension tables? > > > > > > > > "Michael Vardinghus" <michaelvardinghus@hotmail.com> wrote in message > news:michaelvardinghus@hotmail.com: > >> thanx again Allan >> >> So this is it >> >> Table 1 on a new base equals table 1 on transaction base >> Table 2 on a new base equals table 2 on trasaction base >> Table 3 on a new base is a dummy table and this is used like this (in >> this >> table I set 0 instead of null as default for dimension values) >> insert records from table 1 >> update dimension type a from table 2 >> update dimension type a from table 2 >> >> And with indexes on table 1 and 2 ? >> Should these indexes be removed when filling up from transaction base and >> applied when sending from 1/2 to 3 in new base ? >> I'm not that clever at indexing - only used the wizard so far - how would >> indexing look like - just index what I'm using in the where clauses ? >> >> >> "Allan Mitchell" <allan@no-spam.sqldts.com> wrote in message >> news:eJNUGRUWFHA.4076@TK2MSFTNGP14.phx.gbl... >> > So as Far as I can see what I might do is insert the fact rows into a >> > dummy fact table as is so NULLs for the dimensionIDs. >> > You pull all the dimension rows into a working table. >> > You can now update the dummy fact table with the Dimension IDs by >> > joining onto the fact table with the ID and the posting date. >> > >> > You would make multiple passes through the data based on DimensionID in >> > the dimensions table. >> > >> > The dimensions table would be indexed so I could retrieve easily and >> > efficiently the correct rows. >> > >> > If you do not want to do the processing on the remote server then >> > bringing all the rows over is certainly an option. >> > >> > Why are things held like this? Seems strange and inefficient. >> > >> > >> > Allan >> > >> > >> > >> > "Michael Vardinghus" <michaelvardinghus@hotmail.com> wrote in message >> > news:michaelvardinghus@hotmail.com: >> > >> > > Don't know if you can read the other message but >> > > >> > > Table 1 has an entry no, table id which links it to table 2 >> > > In table 2 the dimension code values are listed per entry no - with a >> > > corresponding dimension type >> > > >> > > And as you say I need to flatten this... >> > > >> > > So the dimension type values in table 2 need to be columns in a new >> table >> > > and in these columns the corresponding dimension code values. >> > > >> > > "Allan Mitchell" <allan@no-spam.sqldts.com> wrote in message >> > > news:OylKE8SWFHA.2664@TK2MSFTNGP15.phx.gbl... >> > > > Can you post the definitions of the two tables and show how they >> relate. >> > > > Can you also post the definition of the fact table and how this >> > > > relates to those tables. >> > > > >> > > > From what I can understand then is that there is a fact table file >> > > > (4 >> |