Details of the system:
The leagcy system sends us records flagged with "Add", "modify" or
"delete".
The purpose of these flags is self-explnatory. But the fun began when
we noticed that within same file , legacy system sends us "Add" and
then "Modify". Thus, we were left with no other option except to do
row-by-row processing.
We came up with the following logic:
a) If record‘s StatusFlag is ‘A' and record‘s key does not exist in
DataWareHouse's Table, then the record is inserted into
DataWareHouse's Table.
b) If record‘s StatusFlag is ‘A', but record‘s key exists in
DataWareHouse's Table, then the record is marked as invalid and will
be inserted into InvalidTable..
c) If record‘s StatusFlag is ‘M' and record‘s key exists in
DataWareHouse's Table and record is active, then the corresponding
record in DataWareHouse's Table will be updated.
d) If record‘s StatusFlag is ‘M' and record‘s key exists in
DataWareHouse's Table but record is inactive, then the record is
marked as invalid and will be inserted into InvalidTable.
e) If record‘s StatusFlag is ‘M' and record‘s key does not exist in
DataWareHouse's Table, then the record is marked as invalid and will
be inserted into InvalidTable.
f) If record‘s StatusFlag is ‘D' and record‘s key exists in
DataWareHouse's Table and record is active, then the corresponding
record in DataWareHouse's Table will be updated as inactive.
g) If record‘s StatusFlag is ‘D' and record‘s key exists in
DataWareHouse's Table but record is inactive, then the record is
marked as invalid and will be inserted into InvalidTable.
h) If record‘s StatusFlag is ‘D' and record‘s key does not exist in
DataWareHouse's Table, then the record is marked as invalid and will
be inserted into InvalidTable.
This logic takes care of ALL the anomalies we were facing before but
at the cost of long processing time.
I await your comments.
Thanks
[quoted text, click to view] Erland Sommarskog <sommar@algonet.se> wrote in message news:<Xns94F53BF51111Yazorman@127.0.0.1>...
> Muzamil (muzamil@hotmail.com) writes:
> > The row-by-row update is mandatory becuase the leagacy system is
> > sending us the information such as "Add", "Modify" or "delete" and
> > this information HAS to be processed in the same order otherwise we'll
> > get the erroneous data.
>
> Ouch. Life is cruel, sometimes.
>
> I wonder what possibilities there could be to find parallel streams,
> that is updates that could be performed independently. Maybe you
> can modify 10 rows at a time then. But it does not sound like a very
> easy thing to do.
>
> Without knowing the details of the system, it is difficult to give
> much advice. But any sort of pre-aggregation you can do, is probably
Muzamil (muzamil@hotmail.com) writes:
[quoted text, click to view] > Details of the system:
> The leagcy system sends us records flagged with "Add", "modify" or
> "delete".
> The purpose of these flags is self-explnatory. But the fun began when
> we noticed that within same file , legacy system sends us "Add" and
> then "Modify". Thus, we were left with no other option except to do
> row-by-row processing.
> We came up with the following logic:
Hm, you might be missing a few cases. What if you get an Add, and record
exists in DW, but is marked inactive? With your current logic, the
input record moved to the Invalid table.
And could that feediug system be as weird as to send Add, Modify, Delete,
and Add again? Well, for a robust solution this is what we should assume.
It's a tricky problem, and I was about to defer the problem, when I
recalled a solution that colleague did for one of our stored procedures.
The secret word for tonight is bucketing! Assuming that there are
only a couple of input records for each key value, this should be
an excellent solution. You create buckets, so that each bucket has
at most one row per key value. Here is an example on how to do it:
UPDATE inputtbl
SET bucket = (SELECT count(*)
FROM inputtbl b
WHERE a.keyval = b.keyval
AND a.rownumber < b.rownumber) + 1
FROM inputtbl a
input.keyval is the keys for the records in the DW table. Rownumber
is a column which as describes the processing order. I assume that
you have such a column.
So now you can iterate over the buckets, and for each bucket, you can do
set- based processing. You still have to iterate, but instead over 60000
rows, only over a couple of buckets.
--
Erland Sommarskog, SQL Server MVP, sommar@algonet.se
Books Online for SQL Server SP3 at