Groups | Blog | Home
all groups > sql server dts > march 2005 >

sql server dts : Re: global variable needs to be set on each row


RayAll
3/6/2005 1:03:36 PM
Allan,

I have 33 fields in each record and I have to validate each field anyways
and log every single validation error in a seperate table(**NO DOUBT ABOUT
IT**).in my senario the only thing which is added is to check if the
status_code is not set before and if not set it to 1(which mean critical
error).I'd like to use this field for further processing because output of
this step would be input to another DTS package which dose the business
rules on the records which are ok (I mean there is no validation error
occured on them).I know there might be performance issue regarding this
scenario.But this proecess is supposed to be launched at midnigh and for
client what is important is not the performance of the system ,but the
ability to tell them what was wrong (in very detail) with their CSV file

Thanks and sorry for asking lots of questions.Now I have a very bright
underesanding of what I m going to accomplish.

I so appreciate your help.

Ray

shriop
3/6/2005 6:30:57 PM
I really think you should take this outside of dts. Normally when
people handles things like this from my experience and need to validate
data, they write some code, depending on your programming language of
choice, to read through the file, record by record, do the validation,
and then save the result to 1 or more other files, but probably just
one in your case. That way, you're not trying to do too complicated of
things inside dts, you have full control over what happens when you
encounter a problem, and you can clean everything up to just let dts
get your data in the database fast, which in my opinion is really what
it's ideal for.
RayAll
3/7/2005 9:27:05 AM
Yes,you are right up to a point .But my case is a little bit different with
just simply parsing the CSV file and making a Dataset.

I have 16 Lookup Tables that need to be accessible through the
Transformation. You are not saying that I shouldn't use DTS for
Transformation at ll,do you?
I need to do validation + Business rules on each record which is very
difficult to be done through ADO for insatcne ,I am doing the business rules
simply by using a Trigger or a Trnasactional stored procedure and I don't
transfer any of lookup tables or tables are used for business rules to the
client ,everything is checked on the server.Don't you think that it has a
big perfomance advantage to load a table with 2 thousand records to Dataset
(or whatever) to check the business rules? On the other hand ,if you decide
to do everything outside DTS when your validation is done ,your business
rules is done ,I need another mechanism to split records into different
Tables.Is there a better mechanism tham using lookups in a Data pump task to
do so? I do agree that I don't have that much flexibility working with DTS
but it saves me from thousands lines of codes and the time.

Anyways thanks for your hint.it was the idea that I came up with at the
beginning but ,my businedd rules are very complicated and transformation is
also very tricky ,so I decided to use DTS and so far ,I'm pretty satisfied
with tthe overall progress (except some little issues line whay I posted
below) which I think could be solved by some tricks which I am not aware of.

Thanks for your help,

Ray
[quoted text, click to view]

shriop
3/7/2005 10:09:27 AM
That does complicate things. Can you give me rough estimates of the row
counts of the 16 tables plus the row count of the data you're
importing? Also, the 16 tables are solely being used for lookups to
find corresponding foreign key values that relate to passed in string
values in the csv correct?

Also, you're saying that you need a mechnism to split records into
different tables. Are you just referring to the valid table plus the
errors table?

Making A LOT of assumptions here, and very well may be revised based on
answers to the above questions, here's what I would do personally,
keeping in mind that I'm a programmer, so code doesn't bother me, but
might bother you. I think it would result in the best control, the best
functionality, and the best efficiency/speed.

Get a fast csv parser, which I sell a fast cheap one, if you need.
Assuming the 16 tables are hardly ever edited, and reasonably small
enough to sit in RAM, I'd load those tables into static HashTable's in
C#, using the name for the key, and primary key value as the value.
You'd only have a one time hit for the load, and fast lookups after
loaded. I'd then loop over the source file. Find the corresponding
values from the 16 lookups using the HashTables. Then, run your
business logic. If any of this fails, send the row to a bad data csv
file. If the row passes all checks, send the row to a good data csv
file. When you're done processing, run a dts package with one data pump
task of the good data csv file into the final table, or a loading
table. There'd also be another data pump task of the bad data csv file
into your errors table. Obviously, this is a very specific solution,
but it's how I'd go about it.

If this is something horribly important and going to be run a lot in
your company, I'd try multiple ways of doing it, to see the
differences. Triggers and Stored Procedure calls and Lookup calls in
the middle of your DTS data pump task is going to make this process
crawl in comparison to what I suggested above, but you don't have to
take my word for it. I'd really recommend very simple test scenarios so
you get comfortable with your tradeoffs and limitations.
RayAll
3/7/2005 11:57:09 AM

[quoted text, click to view]

Fortunately for my lookups (in validation process) ,tables have maximum of 6
records(it's not a big deal). When the validation is done I need to make
sure that that record dosn't exist in two tables(Table1 and Table2.this part
is my business ruke then) with about 300,000 records each(600,000 records in
total).You don't tell me that I should reconnect to database from client or
load that table into client as a lookup?:-)

[quoted text, click to view]
Look,I use the error Table only for validation process,if there is any
problem during the validation for each record ,I won't check the business
rules.after I have done validation and bizrules I have to send good records
to two different table (some fields of each record gose to one Table and
some fields of each reocrd gose to another Table)

[quoted text, click to view]

I'm programmer too,as I told you I implemented the same scenario using a CSV
parser and finally I ended up myself in a situation that I have to use DTS
in order to transfer CSV file records into Database,so I thought if I'm
supposed to use DTS anyways ,why not having everything in DTS ,since it's
much easier to be deployed than having one peice as a webservice for
instance and a DTS package besides that.


[quoted text, click to view]

It's exactly what I did before,my problem is Business rules.How are yuo
going to enforce them ??using an stored procedure??? using direct connection
to Database,you won't probably load 600,000 records to your dataset to check
the business rules? so you kind of have to check the business rules in
server,right? then you end up yuorself having designed your suystem with a
different components ,one client ,one good cvs file,one bad CSV file,one
stored procedure for business rules and one DTS package.Don't you think it
makes a it really deficult for the tester and operation department to follow
all these rules.but in my scenarion,I have only one DTS package (AS I said
before,there is no flexibility(which I don't care because I got what I
wanted from DTS,except a few thing ,there might be performance issue (which
I don't think is worse than client appraoch)


[quoted text, click to view]

oh,I so appreciate your time,you have spent at least five minutes to solve
my problem where you didn't have to :-),I really appreciate it ,we want to
disguss this issue and it gives me the opportunity to make a good decision
and for the others with the same problem a good underestanding ;-) I still
didn't underestand how you want to enforce business rules..you may answer
based on the information I gave you above.

Thanks very much for your nice reply,

Ray

AddThis Social Bookmark Button