sql server dts:
Personally I would first load it into a staging table, e.g. STG_SOURCE with 1 columns called SOURCEFIELD of NVARCHAR(1000) So, only using the datapump's Copy Column function. Or even better, do a BULK INSERT. Then you would have one table with one field with 56,171,938 records. After that, do your cleansing, e.g. with a massive SQL insert statement into a second table with the proper columns. Let that statement perform your calculations, etc.
I am importing a flat file consisting of variable length records terminated by {CR}{LF}. The record sizes vary between 65 bytes and 79 bytes (the last field is a variable length name field). I am reading in the record as a single field and then using a modification to the Transform VBScript to load the database. The DTS process has been running about 12 hours and has processed 4,521,000 rows (according to the DTS process status) resulting in a database mdf having a size of about 378,496 KB. The source file is a text file whose size is 3,651,176 KB. If each source record averages about 65 bytes, the file contains 56,171,938 records. At this rate it will take 6 days to load the database which will be about 4GB in size. I have looked at the initial data being loaded and it looks correct (compared to the source records). Can anyone please suggest anything I can do to speed up this load? Thanks. '********************************************************************** ' Visual Basic Transformation Script ' Copy each source column to the ' destination column '************************************************************************ 'Put in Debug Loop Public Counter Function Main() Dim Rec, Amt, Dig, NDig, Neg, Amt2 If isNull(Counter) Then Counter = 1 Rec = DTSSource("Col001") DTSDestination("exp_big_RecordType") = Mid(Rec,1,1) DTSDestination("exp_big_BatchNum") = Mid(Rec,2,2) DTSDestination("exp_big_SupplierNum") = Mid(Rec,4,5) DTSDestination("exp_big_VouchNum") = Mid(Rec,9,6) DTSDestination("exp_big_InvoiceNum") = Mid(Rec,15,8) DTSDestination("exp_big_AcctNum") = Mid(Rec,23,4) DTSDestination("exp_big_StoreNum") = Mid(Rec,27,3) DTSDestination("exp_big_Date") = Mid(Rec,50,8) DTSDestination("exp_big_SupplierName") = Mid(Rec,58,23) Amt = Mid(Rec,30,8) Dig = Right(Amt,1) Neg = 1 Select Case Dig Case "@" NDig = "0" Case "A" NDig = "1" Case "B" NDig = "2" Case "C" NDig = "3" Case "D" NDig = "4" Case "E" NDig = "5" Case "F" NDig = "6" Case "G" NDig = "7" Case "H" NDig = "8" Case "I" NDig = "9" Case "P" NDig = "0" Neg = -1 Case "Q" NDig = "1" Neg = -1 Case "R" NDig = "2" Neg = -1 Case "S" NDig = "3" Neg = -1 Case "T" NDig = "4" Neg = -1 Case "U" NDig = "5" Neg = -1 Case "V" NDig = "6" Neg = -1 Case "W" NDig = "7" Neg = -1 Case "X" NDig = "8" Neg = -1 Case "Y" NDig = "9" Neg = -1 End Select Amt = Left(Amt,7) & NDig Amt2 = Round(((Amt * Neg) /100),2) DTSDestination("exp_big_Amount") = Amt2 Main = DTSTransformStat_OK
[quoted text, click to view] > First, unless the input is unicode I would use VARCHAR rather than > NVARCHAR. NVARCHAR would take twice the space, rather costly on a > table of this size if not required.
I don't know his source, so this is the safest. [quoted text, click to view] > would VERY much want to be able to do that in batches, and having an > identity column would make working in batches trivial.
Batches would indeed be better. If a batch fails, just pickup where you left off. You're right, just wanted to keep this as simple as possible. [quoted text, click to view] > And on the subject of working in batches, if the entire table were > transformed in one pass it would require double storage space. If > done in batches - say 100,000 rows at a time, to pick a number out of > the air - the processed rows for a batch could be deleted from the > staging table after the batch is loaded. Space released from the > staging table would become available for the real table.
Space is not released to the OS if these actions are logged. It is not recommended to turn on "auto-shrink" for a database, so the db would no shrink/remain the same, but still increase in your scenario. In the same order as in my scenario. Ofcourse, in my scenario you could truncate the staging after you're finished. > 1000x faster than deleting 100000 rows after each batch. I even expect the deletes would turn him back to his original scenario, that slow it is. Thanks for your addition to the tread. Greetings,
That sounds like a workable alternative to me, though I would like to add a couple of minor points. First, unless the input is unicode I would use VARCHAR rather than NVARCHAR. NVARCHAR would take twice the space, rather costly on a table of this size if not required. Second, I suggest that it be a two-column table, an INT identity column and the (N)VARCHAR column. This could be a major advantage when it comes to loading the real table from the staging table. I would VERY much want to be able to do that in batches, and having an identity column would make working in batches trivial. And on the subject of working in batches, if the entire table were transformed in one pass it would require double storage space. If done in batches - say 100,000 rows at a time, to pick a number out of the air - the processed rows for a batch could be deleted from the staging table after the batch is loaded. Space released from the staging table would become available for the real table. Roy Harvey Beacon Falls, CT On Thu, 13 Jul 2006 07:43:02 -0700, Frans van Bree [quoted text, click to view] <FransvanBree@discussions.microsoft.com> wrote: >Personally I would first load it into a staging table, e.g. STG_SOURCE with 1 >columns called SOURCEFIELD of NVARCHAR(1000) So, only using the datapump's >Copy Column function. Or even better, do a BULK INSERT. Then you would have >one table with one field with 56,171,938 records. > >After that, do your cleansing, e.g. with a massive SQL insert statement into >a second table with the proper columns. Let that statement perform your >calculations, etc. >
[quoted text, click to view] > referring to releasing space back to the OS, just freeing up space > within the database for the other table.
You're right. I was thinking of that after I sent the post. I am still thinking on doing a test to see it the database file (MDF) will or will not increase in your scenario if auto-shrink is off. Because I am not so sure after all that it will increase (that much). {Depending of course on the how much bytes there are extra in the fields of the second table, because the
Thanks to both of you for your ideas. I will go give it a whirl and see what I can make of it. [quoted text, click to view] On Thu, 13 Jul 2006 11:00:09 -0400, Roy Harvey wrote: > That sounds like a workable alternative to me, though I would like to > add a couple of minor points. > > First, unless the input is unicode I would use VARCHAR rather than > NVARCHAR. NVARCHAR would take twice the space, rather costly on a > table of this size if not required. > > Second, I suggest that it be a two-column table, an INT identity > column and the (N)VARCHAR column. This could be a major advantage > when it comes to loading the real table from the staging table. I > would VERY much want to be able to do that in batches, and having an > identity column would make working in batches trivial. > > And on the subject of working in batches, if the entire table were > transformed in one pass it would require double storage space. If > done in batches - say 100,000 rows at a time, to pick a number out of > the air - the processed rows for a batch could be deleted from the > staging table after the batch is loaded. Space released from the > staging table would become available for the real table. > > Roy Harvey > Beacon Falls, CT > > On Thu, 13 Jul 2006 07:43:02 -0700, Frans van Bree > <FransvanBree@discussions.microsoft.com> wrote: > >>Personally I would first load it into a staging table, e.g. STG_SOURCE with 1 >>columns called SOURCEFIELD of NVARCHAR(1000) So, only using the datapump's >>Copy Column function. Or even better, do a BULK INSERT. Then you would have >>one table with one field with 56,171,938 records. >> >>After that, do your cleansing, e.g. with a massive SQL insert statement into >>a second table with the proper columns. Let that statement perform your >>calculations, etc. >>
On Thu, 13 Jul 2006 10:13:01 -0700, Frans van Bree [quoted text, click to view] <FransvanBree@discussions.microsoft.com> wrote: >> And on the subject of working in batches, if the entire table were >> transformed in one pass it would require double storage space. If >> done in batches - say 100,000 rows at a time, to pick a number out of >> the air - the processed rows for a batch could be deleted from the >> staging table after the batch is loaded. Space released from the >> staging table would become available for the real table. > >Space is not released to the OS if these actions are logged. It is not >recommended to turn on "auto-shrink" for a database, so the db would no >shrink/remain the same, but still increase in your scenario. In the same >order as in my scenario. Ofcourse, in my scenario you could truncate the >staging after you're finished. > 1000x faster than deleting 100000 rows after >each batch. I even expect the deletes would turn him back to his original >scenario, that slow it is.
Good point on the performance of the DELETEs. However, I was not referring to releasing space back to the OS, just freeing up space within the database for the other table.
On Thu, 13 Jul 2006 11:15:02 -0700, Frans van Bree [quoted text, click to view] <FransvanBree@discussions.microsoft.com> wrote: >> referring to releasing space back to the OS, just freeing up space >> within the database for the other table. > >You're right. I was thinking of that after I sent the post. I am still >thinking on doing a test to see it the database file (MDF) will or will not >increase in your scenario if auto-shrink is off. Because I am not so sure >after all that it will increase (that much). {Depending of course on the how >much bytes there are extra in the fields of the second table, because the >transformations can create additional data.}
There is another factor to consider about SQL Server re-using freed pages. It takes time for the pages to be returned to the pool. That process is done asyncronously, as the system has time for it, so it might not actually be back in the pool and available if needed right away.
[quoted text, click to view] On Thu, 13 Jul 2006 13:17:43 -0400, Roy Harvey wrote: > On Thu, 13 Jul 2006 10:13:01 -0700, Frans van Bree > <FransvanBree@discussions.microsoft.com> wrote: > >>> And on the subject of working in batches, if the entire table were >>> transformed in one pass it would require double storage space. If >>> done in batches - say 100,000 rows at a time, to pick a number out of >>> the air - the processed rows for a batch could be deleted from the >>> staging table after the batch is loaded. Space released from the >>> staging table would become available for the real table. >> >>Space is not released to the OS if these actions are logged. It is not >>recommended to turn on "auto-shrink" for a database, so the db would no >>shrink/remain the same, but still increase in your scenario. In the same >>order as in my scenario. Ofcourse, in my scenario you could truncate the >>staging after you're finished. > 1000x faster than deleting 100000 rows after >>each batch. I even expect the deletes would turn him back to his original >>scenario, that slow it is. > > Good point on the performance of the DELETEs. However, I was not > referring to releasing space back to the OS, just freeing up space > within the database for the other table. > > Roy
A couple of update comments about my big file processing. CREATE TABLE Stage_Big ( -- stg_big_Id_Num int Identity (1,1), stg_big_Record varchar (200) Null ) GO Doing the identity column seems to add additional seconds to the process. Further, another problem exists which I will mention below. This BULK INSERT takes 3.24 minutes (1 M of rows = 3.24 minutes). That means I can insert my 57 Meg of records in about 3 hours. A definite improvement, but I have not yet worked on the script to convert it to useable columns. BULK INSERT Exp_Big.dbo.[Stage_Big] FROM 'C:\Work\Exp_Big.asc' WITH ( -- FORMATFILE = 'c:\work\exp_big.fmt', BATCHSIZE = 10000, FIRSTROW = 1, LASTROW = 1000000) Go One must use a format file if the Identity column is defined since the source data does not have an identity field. 8.0 2 1 SQLCHAR 0 0 "" 0 stg_big_Id_Num SQL_Latin1_General_Cp437_BIN 2 SQLCHAR 0 200 "\r\n" 2 stg_big_Record SQL_Latin1_General_Cp437_BIN Question: The log file starts to get really big. I looked for a way to compact the data/log file but did not find anything. Does a way exist to
[quoted text, click to view] On Sat, 15 Jul 2006 10:20:58 -0500, MikeV06 <me@privacy.net> wrote: >A couple of update comments about my big file processing. >One must use a format file if the Identity column is defined since the >source data does not have an identity field.
Or, create a view that excludes the identity column and load to the view name. [quoted text, click to view] >Question: The log file starts to get really big. I looked for a way to >compact the data/log file but did not find anything. Does a way exist to >force an empty of the log file and then a compression?
For a process like this I would generally want the recovery model for the database set to Simple. This will allow the log to be truncated after each batch - each 10,000 rows in your sample code. That assumes that the database is not being used for anything else at the time which requires maintaining logs, and given what you are doing that would be a really good idea in any case. It would also be a good idea, before the mass load, to make sure the database size is set large enough, or that the increment of growth is something large. Roy Harvey
[quoted text, click to view] On Sat, 15 Jul 2006 11:33:35 -0400, Roy Harvey wrote: > On Sat, 15 Jul 2006 10:20:58 -0500, MikeV06 <me@privacy.net> wrote: >>A couple of update comments about my big file processing. > >>One must use a format file if the Identity column is defined since the >>source data does not have an identity field. > > Or, create a view that excludes the identity column and load to the > view name.
Yes, I shoulda thought of that; thank you. [quoted text, click to view] >>Question: The log file starts to get really big. I looked for a way to >>compact the data/log file but did not find anything. Does a way exist to >>force an empty of the log file and then a compression? > > For a process like this I would generally want the recovery model for > the database set to Simple. This will allow the log to be truncated > after each batch - each 10,000 rows in your sample code. That assumes > that the database is not being used for anything else at the time > which requires maintaining logs, and given what you are doing that > would be a really good idea in any case.
Simple is exactly what I needed. Again, thank you. [quoted text, click to view] > It would also be a good idea, before the mass load, to make sure the > database size is set large enough, or that the increment of growth is > something large.
Have it set to unlimited. [quoted text, click to view] > Roy Harvey > Beacon Falls, CT
[quoted text, click to view] On Sat, 15 Jul 2006 11:50:25 -0500, MikeV06 <me@privacy.net> wrote: >> It would also be a good idea, before the mass load, to make sure the >> database size is set large enough, or that the increment of growth is >> something large. > >Have it set to unlimited.
The max size can be unlimited, but the initial allocation might not be large, and the unit of growth must be finite. It is set for each file, either as a percent or a number of megabytes.
[quoted text, click to view] On Sat, 15 Jul 2006 15:15:24 -0400, Roy Harvey wrote: > On Sat, 15 Jul 2006 11:50:25 -0500, MikeV06 <me@privacy.net> wrote: > >>> It would also be a good idea, before the mass load, to make sure the >>> database size is set large enough, or that the increment of growth is >>> something large. >> >>Have it set to unlimited. > > The max size can be unlimited, but the initial allocation might not be > large, and the unit of growth must be finite. It is set for each > file, either as a percent or a number of megabytes. > > Roy
Gotcha. Mine was set to 10%. I have it set to 25% now. Based on my last set of computations, it will now take about *24 hours* rather than 6-7 days to load the final table. If anyone sees anything else I should do to speed things up, please tell me. Thanks. To summarize what I have done thus far: To load ~47M records into the stage table using a view required a little over an hour: CREATE TABLE Stage_Big ( stg_big_Id_Num int Identity (1,1), stg_big_Record varchar (200) Null ) I then indexed stg_big_Id_Num which took about 15 minutes. This required 3 minutes INSERT INTO Expenses_Big (exp_big_RecordType, exp_big_BatchNum, exp_big_SupplierNum, exp_big_VouchNum, exp_big_InvoiceNum, exp_big_AcctNum, exp_big_StoreNum, exp_big_Date, exp_big_SupplierName, exp_big_Amount) SELECT SUBSTRING(stg_big_Record, 1, 1), SUBSTRING(stg_big_Record, 2, 2), SUBSTRING(stg_big_Record, 4, 5), SUBSTRING(stg_big_Record, 9, 6), SUBSTRING(stg_big_Record, 15, 8), SUBSTRING(stg_big_Record, 23, 4), SUBSTRING(stg_big_Record, 27, 3), SUBSTRING(stg_big_Record, 50, 8), SUBSTRING(stg_big_Record, 58, 23), dbo.fnComputeAmt(SUBSTRING(stg_big_Record, 30, 8)) FROM Stage_Big WHERE stg_big_Id_Num < 100000 Hence, it will take *24* hours to load the entire expense table. Better than 6-7 days, but still a long time. The function is CREATE FUNCTION fnComputeAmt (@Amt_string AS char (8)) RETURNS money AS BEGIN DECLARE @Neg INT, @DIG char (1), @Ndig char (1), @AmtMoney money SET @Dig = Right(@Amt_string,1) SET @Neg = 1 IF @Dig BETWEEN 'P' and 'Y' BEGIN SET @Neg = -1 END SELECT @Ndig = CASE @DIG WHEN '@' THEN '0' WHEN 'A' THEN '1' WHEN 'B' THEN '2' WHEN 'C' THEN '3' WHEN 'D' THEN '4' WHEN 'E' THEN '5' WHEN 'F' THEN '6' WHEN 'G' THEN '7' WHEN 'H' THEN '8' WHEN 'I' THEN '9' WHEN 'P' THEN '0' WHEN 'Q' THEN '1' WHEN 'R' THEN '2' WHEN 'S' THEN '3' WHEN 'T' THEN '4' WHEN 'U' THEN '5' WHEN 'V' THEN '6' WHEN 'W' THEN '7' WHEN 'X' THEN '8' WHEN 'Y' THEN '9' END SET @Amt_string = LEFT(@Amt_string,7) + @Ndig SET @AmtMoney = Round( CONVERT(Money, @Amt_string) * @Neg /100, 2) RETURN @AmtMoney
You might get a significant improvement if you can eliminate the function call. Maybe add a simple table to map the @DIG to @Ndig and then make the rest as inline sql? Paul Shapiro [quoted text, click to view] "MikeV06" <me@privacy.net> wrote in message news:1lgrcd3kmxqqw.dlg@mycomputer06.invalid.com... > On Sat, 15 Jul 2006 15:15:24 -0400, Roy Harvey wrote: > >> On Sat, 15 Jul 2006 11:50:25 -0500, MikeV06 <me@privacy.net> wrote: >> >>>> It would also be a good idea, before the mass load, to make sure the >>>> database size is set large enough, or that the increment of growth is >>>> something large. >>> >>>Have it set to unlimited. >> >> The max size can be unlimited, but the initial allocation might not be >> large, and the unit of growth must be finite. It is set for each >> file, either as a percent or a number of megabytes. >> >> Roy > > Gotcha. Mine was set to 10%. I have it set to 25% now. > > Based on my last set of computations, it will now take about *24 hours* > rather than 6-7 days to load the final table. If anyone sees anything else > I should do to speed things up, please tell me. > > Thanks. > > To summarize what I have done thus far: > > To load ~47M records into the stage table using a view required a little > over an hour: > > CREATE TABLE Stage_Big ( > stg_big_Id_Num int Identity (1,1), > stg_big_Record varchar (200) Null > ) > > I then indexed stg_big_Id_Num which took about 15 minutes. > > This required 3 minutes > > INSERT INTO Expenses_Big > (exp_big_RecordType, exp_big_BatchNum, exp_big_SupplierNum, > exp_big_VouchNum, > exp_big_InvoiceNum, exp_big_AcctNum, exp_big_StoreNum, > exp_big_Date, exp_big_SupplierName, > exp_big_Amount) > SELECT SUBSTRING(stg_big_Record, 1, 1), > SUBSTRING(stg_big_Record, 2, 2), > SUBSTRING(stg_big_Record, 4, 5), > SUBSTRING(stg_big_Record, 9, 6), > SUBSTRING(stg_big_Record, 15, 8), > SUBSTRING(stg_big_Record, 23, 4), > SUBSTRING(stg_big_Record, 27, 3), > SUBSTRING(stg_big_Record, 50, 8), > SUBSTRING(stg_big_Record, 58, 23), > dbo.fnComputeAmt(SUBSTRING(stg_big_Record, 30, 8)) > FROM Stage_Big > WHERE stg_big_Id_Num < 100000 > > Hence, it will take *24* hours to load the entire expense table. Better > than 6-7 days, but still a long time. > > The function is > > CREATE FUNCTION fnComputeAmt (@Amt_string AS char (8)) > RETURNS money > AS > BEGIN > DECLARE @Neg INT, > @DIG char (1), > @Ndig char (1), > @AmtMoney money > > SET @Dig = Right(@Amt_string,1) > SET @Neg = 1 > IF @Dig BETWEEN 'P' and 'Y' > BEGIN > SET @Neg = -1 > END > SELECT @Ndig = > CASE @DIG > WHEN '@' THEN '0' > WHEN 'A' THEN '1' > WHEN 'B' THEN '2' > WHEN 'C' THEN '3' > WHEN 'D' THEN '4' > WHEN 'E' THEN '5' > WHEN 'F' THEN '6' > WHEN 'G' THEN '7' > WHEN 'H' THEN '8' > WHEN 'I' THEN '9' > WHEN 'P' THEN '0' > WHEN 'Q' THEN '1' > WHEN 'R' THEN '2' > WHEN 'S' THEN '3' > WHEN 'T' THEN '4' > WHEN 'U' THEN '5' > WHEN 'V' THEN '6' > WHEN 'W' THEN '7' > WHEN 'X' THEN '8' > WHEN 'Y' THEN '9' > END > SET @Amt_string = LEFT(@Amt_string,7) + @Ndig > SET @AmtMoney = Round( CONVERT(Money, @Amt_string) * @Neg /100, 2) > > RETURN @AmtMoney > END
On Sun, 16 Jul 2006 11:09:46 -0400, "Paul Shapiro" [quoted text, click to view] <paul@hideme.broadwayData.com> wrote: >You might get a significant improvement if you can eliminate the function >call.
I agree it is worth trying. I think thes expression below does the job to resolve to the string expression that needs to be converted/rounded. CASE WHEN SUBSTRING(stg_big_Record, 37, 1) BETWEEN 'P' and 'Y' THEN '-' ELSE '0' END + SUBSTRING(stg_big_Record, 30, 7) + CASE SUBSTRING(stg_big_Record, 37, 1) WHEN '@' THEN '0' WHEN 'A' THEN '1' WHEN 'B' THEN '2' WHEN 'C' THEN '3' WHEN 'D' THEN '4' WHEN 'E' THEN '5' WHEN 'F' THEN '6' WHEN 'G' THEN '7' WHEN 'H' THEN '8' WHEN 'I' THEN '9' WHEN 'P' THEN '0' WHEN 'Q' THEN '1' WHEN 'R' THEN '2' WHEN 'S' THEN '3' WHEN 'T' THEN '4' WHEN 'U' THEN '5' WHEN 'V' THEN '6' WHEN 'W' THEN '7' WHEN 'X' THEN '8' WHEN 'Y' THEN '9' END Another thought is to create a view on Stage_Big that handles all the substrings and just leaves you with strings to convert. The expression above could be used in the view. And it is worth mentioning that the MONEY data type is generally to be avoided. DECIMAL is much preferred. Roy Harvey Beacon Falls, CT [quoted text, click to view] On Sat, 15 Jul 2006 23:29:40 -0500, MikeV06 <me@privacy.net> wrote: >On Sat, 15 Jul 2006 15:15:24 -0400, Roy Harvey wrote: > >> On Sat, 15 Jul 2006 11:50:25 -0500, MikeV06 <me@privacy.net> wrote: >> >>>> It would also be a good idea, before the mass load, to make sure the >>>> database size is set large enough, or that the increment of growth is >>>> something large. >>> >>>Have it set to unlimited. >> >> The max size can be unlimited, but the initial allocation might not be >> large, and the unit of growth must be finite. It is set for each >> file, either as a percent or a number of megabytes. >> >> Roy > >Gotcha. Mine was set to 10%. I have it set to 25% now. > >Based on my last set of computations, it will now take about *24 hours* >rather than 6-7 days to load the final table. If anyone sees anything else >I should do to speed things up, please tell me. > >Thanks. > >To summarize what I have done thus far: > >To load ~47M records into the stage table using a view required a little >over an hour: > >CREATE TABLE Stage_Big ( > stg_big_Id_Num int Identity (1,1), > stg_big_Record varchar (200) Null >) > >I then indexed stg_big_Id_Num which took about 15 minutes. > >This required 3 minutes > >INSERT INTO Expenses_Big > (exp_big_RecordType, exp_big_BatchNum, exp_big_SupplierNum, >exp_big_VouchNum, > exp_big_InvoiceNum, exp_big_AcctNum, exp_big_StoreNum, > exp_big_Date, exp_big_SupplierName, > exp_big_Amount) >SELECT SUBSTRING(stg_big_Record, 1, 1), > SUBSTRING(stg_big_Record, 2, 2), > SUBSTRING(stg_big_Record, 4, 5), > SUBSTRING(stg_big_Record, 9, 6), > SUBSTRING(stg_big_Record, 15, 8), > SUBSTRING(stg_big_Record, 23, 4), > SUBSTRING(stg_big_Record, 27, 3), > SUBSTRING(stg_big_Record, 50, 8), > SUBSTRING(stg_big_Record, 58, 23), > dbo.fnComputeAmt(SUBSTRING(stg_big_Record, 30, 8)) >FROM Stage_Big >WHERE stg_big_Id_Num < 100000 > >Hence, it will take *24* hours to load the entire expense table. Better >than 6-7 days, but still a long time. > >The function is > >CREATE FUNCTION fnComputeAmt (@Amt_string AS char (8)) >RETURNS money >AS >BEGIN >DECLARE @Neg INT, > @DIG char (1), > @Ndig char (1), > @AmtMoney money > >SET @Dig = Right(@Amt_string,1) >SET @Neg = 1 >IF @Dig BETWEEN 'P' and 'Y' > BEGIN > SET @Neg = -1 > END >SELECT @Ndig = > CASE @DIG > WHEN '@' THEN '0' > WHEN 'A' THEN '1' > WHEN 'B' THEN '2' > WHEN 'C' THEN '3' > WHEN 'D' THEN '4' > WHEN 'E' THEN '5' > WHEN 'F' THEN '6' > WHEN 'G' THEN '7' > WHEN 'H' THEN '8' > WHEN 'I' THEN '9' > WHEN 'P' THEN '0' > WHEN 'Q' THEN '1' > WHEN 'R' THEN '2' > WHEN 'S' THEN '3' > WHEN 'T' THEN '4' > WHEN 'U' THEN '5' > WHEN 'V' THEN '6' > WHEN 'W' THEN '7' > WHEN 'X' THEN '8' > WHEN 'Y' THEN '9' > END >SET @Amt_string = LEFT(@Amt_string,7) + @Ndig >SET @AmtMoney = Round( CONVERT(Money, @Amt_string) * @Neg /100, 2) > >RETURN @AmtMoney
Mike, Good to see there is so much response to your question! Still, the major part of the performance impact in your proces is in the BULK INSERT step. I think you can optimize it considerable using the TABLOCK hint in the WITH-clause. Then, there are no more row-level locks AND the process will be minimally logged in your transaction log file. From BOL: "To help prevent the transaction log from running out of space, a minimally logged bulk copy can be performed if all of these conditions are met: The recovery model is simple or bulk-logged. The target table is not being replicated. The target table does not have any triggers. The target table has either 0 rows or no indexes.
[quoted text, click to view] On Sun, 16 Jul 2006 11:09:46 -0400, Paul Shapiro wrote: > You might get a significant improvement if you can eliminate the function > call. Maybe add a simple table to map the @DIG to @Ndig and then make the > rest as inline sql? > Paul Shapiro
Always hard to know if a function is faster/slower than a lookup table. I will try some timing runs to see if I can determine which is the case for this data.
[quoted text, click to view] On Sun, 16 Jul 2006 12:46:38 -0400, Roy Harvey wrote: > On Sun, 16 Jul 2006 11:09:46 -0400, "Paul Shapiro" > <paul@hideme.broadwayData.com> wrote: > >>You might get a significant improvement if you can eliminate the function >>call. > > I agree it is worth trying. I think thes expression below does the > job to resolve to the string expression that needs to be > converted/rounded. > > CASE WHEN SUBSTRING(stg_big_Record, 37, 1) BETWEEN 'P' and 'Y' > THEN '-' > ELSE '0' > END +
Very nice, thank you. [quoted text, click to view] > SUBSTRING(stg_big_Record, 30, 7) + > CASE SUBSTRING(stg_big_Record, 37, 1) > WHEN '@' THEN '0' > WHEN 'A' THEN '1' > WHEN 'B' THEN '2' > WHEN 'C' THEN '3' > WHEN 'D' THEN '4' > WHEN 'E' THEN '5' > WHEN 'F' THEN '6' > WHEN 'G' THEN '7' > WHEN 'H' THEN '8' > WHEN 'I' THEN '9' > WHEN 'P' THEN '0' > WHEN 'Q' THEN '1' > WHEN 'R' THEN '2' > WHEN 'S' THEN '3' > WHEN 'T' THEN '4' > WHEN 'U' THEN '5' > WHEN 'V' THEN '6' > WHEN 'W' THEN '7' > WHEN 'X' THEN '8' > WHEN 'Y' THEN '9' > END > > Another thought is to create a view on Stage_Big that handles all the > substrings and just leaves you with strings to convert. The > expression above could be used in the view.
Uhm, that is useful to try as well. [quoted text, click to view] > And it is worth mentioning that the MONEY data type is generally to be > avoided. DECIMAL is much preferred.
I had been trying both. Why is this so?
[quoted text, click to view] On Mon, 17 Jul 2006 01:31:01 -0700, Frans van Bree wrote: > Mike, > > Good to see there is so much response to your question! Still, the major > part of the performance impact in your proces is in the BULK INSERT step. I > think you can optimize it considerable using the TABLOCK hint in the > WITH-clause. Then, there are no more row-level locks AND the process will be > minimally logged in your transaction log file. > > From BOL: > "To help prevent the transaction log from running out of space, a minimally > logged bulk copy can be performed if all of these conditions are met: > The recovery model is simple or bulk-logged. > The target table is not being replicated. > The target table does not have any triggers. > The target table has either 0 rows or no indexes. > The TABLOCK hint is specified. "
Off to BOL I must go.
[quoted text, click to view] On Thu, 20 Jul 2006 19:54:53 -0500, MikeV06 <me@privacy.net> wrote: >> And it is worth mentioning that the MONEY data type is generally to be >> avoided. DECIMAL is much preferred. > >I had been trying both. Why is this so?
MONEY is not a standard SQL datatype, and I have heard there are problems with rounding. Use of the MONEY datatype goes back to the days before SQL Server had DECIMAL or NUMERIC, the behavior of which follows industry standards.
Don't see what you're looking for? Try a search.
|