Groups | Blog | Home
all groups > sql server (microsoft) > june 2005 >

sql server (microsoft) : An interesting set-based problem - pairing lines together based on count


David Portas
6/8/2005 12:00:00 AM
Is this an example of a real problem? If this is a realistic representation
of your data then my solution would be to fix the design. Delete the
duplicated data and add a natural key. Of course, it may be that your
example isn't realistic in which case it would help us answer you better if
you could correct it.

--
David Portas
SQL Server MVP
--

Brett Gerhardi
6/8/2005 12:00:00 AM
Thanks for the help Ross

I came up with the similar idea, but instead of self joining I used the
following technique:

select *
from
(
select b.CustomerID, b.ID, 'B' as BID,
(
select count(*)
from TestCountJoin b2
where b2.CustomerID = b.CustomerID
and b2.ItemCode = 'B'
and b2.ID <= b.ID
) as seq
from
(
-- get all Bs
select distinct b.CustomerID, b.ID
from TestCountJoin b
where b.ItemCode = 'B'
) b
) b
inner join
(
select a.CustomerID, a.ID, 'A' as BID,
(
select count(*)
from TestCountJoin a2
where a2.CustomerID = a.CustomerID
and a2.ItemCode = 'A'
and a2.ID <= a.ID
) as seq
from
(
-- get all Bs
select distinct b.CustomerID, b.ID
from TestCountJoin b
where b.ItemCode = 'A'
) a
) a on a.CustomerID = b.CustomerID
and a.Seq = b.Seq


do you think there would be any performance benifit between either method?

Thanks again
-=- Brett

[quoted text, click to view]

Ross Presser
6/8/2005 9:55:33 AM
[quoted text, click to view]

Thanks for the DDL and INSERTs.

You'll have to use a self-join to number the rows, then another self-join
to filter them. Put the first self-join in a view for convenience:

CREATE View TestView AS

SELECT
x.CustomerID, x.ID, x.ItemCode, count(y.ID) RowNum
FROM
TestCountJoin x
INNER JOIN TestCountJoin y
ON x.CustomerID = y.CustomerID AND x.ItemCode=y.ItemCode
AND x.ID >= y.ID
GROUP BY
x.CustomerID, x.ID, x.ItemCode

GO

Now, the filtering is easy:


SELECT
a.CustomerID,
b.ID B_ID, b.ItemCode B_ItemCode,
a.ID A_ID, a.ItemCode A_ItemCode
FROM
TestView a, TestView b
WHERE
a.ItemCode = 'A' AND b.ItemCode = 'B'
AND a.CustomerID = b.CustomerID
AND a.RowNum = b.RowNum

CustomerID B_ID B_ItemCode A_ID A_ItemCode
----------- ----------- ---------- ----------- ----------
1 4 B 1 A
1 5 B 2 A
1 6 B 3 A
2 9 B 8 A

Brett Gerhardi
6/8/2005 10:52:50 AM
Can anyone work out how I can do this in a set-based way? I am totally
stumped on this one.

Given the following:

create table TestCountJoin
(
ID int identity(1,1) primary key,
CustomerID int,
ItemCode varchar(1)
)

insert into TestCountJoin (CustomerID, ItemCode) values (1, 'A')
insert into TestCountJoin (CustomerID, ItemCode) values (1, 'A')
insert into TestCountJoin (CustomerID, ItemCode) values (1, 'A')
insert into TestCountJoin (CustomerID, ItemCode) values (1, 'B')
insert into TestCountJoin (CustomerID, ItemCode) values (1, 'B')
insert into TestCountJoin (CustomerID, ItemCode) values (1, 'B')
insert into TestCountJoin (CustomerID, ItemCode) values (1, 'B')
insert into TestCountJoin (CustomerID, ItemCode) values (2, 'A')
insert into TestCountJoin (CustomerID, ItemCode) values (2, 'B')
insert into TestCountJoin (CustomerID, ItemCode) values (3, 'A')
insert into TestCountJoin (CustomerID, ItemCode) values (3, 'A')

I want to return all instances where ItemCode 'B' can be paired with an
ItemCode 'A'
for the same CustomerID (can be in PK order)

By "paired" I mean that each ItemCode B can only match a line with ItemCode
'A' once, then both lines are excluded from any further matching.

So if a single customer has 4 Bs and 3 A's, only the first 3 Bs should be
returned
from the query, as there is no 4th A to match

So the results from the above should be

CustomerID, B_ID, B_ItemCode, A_ID, A_ItemCode
1, 4, B, 1, A
1, 5, B, 2, A
1, 6, B, 3, A
2, 9, B, 8, A

3 pairs for customer 1, 1 pair for customer 2 and none from customer 3.

Thanks for any help
-=- Brett

Brett Gerhardi
6/9/2005 12:00:00 AM
I have tried it myself and Ross's joining technique is much more efficient
than my calculated field method. Thinking about it I'm not surprised.

In my real world scenario I also found putting the output from the
pre-rownum'ed into a table variable (with unique constraints for indexes),
then doing the self join on the table variable improved the performance
significantly.

Thanks again for the help.

Regards
-=- Brett

[quoted text, click to view]

Brett Gerhardi
6/9/2005 12:00:00 AM
My real world problem is far more complex than just this - this is the only
part of the problem that I didn't have a good solution to and was interested
in other peoples approaches to it.

Briefly, the real-world scenario is that I have the task of building part of
a discount system. The discount system has is able to automatically provide
a discount on service type sales lines based on other service type sales
lines.

So essentially if a customer has a service A we want to provide him a
discount if he has service B. However if the customer has 2 As but only 1 B,
we only want to give 1 A the discount.

So this is an secondary requirement away from the 'natural' purpose of the
service 'sales' system, which is to record orders for services and provision
them. There are further complexities such as start and end dates which
remove lines from being eligable, which makes the run-time match calculation
desirable.

Regards
-=- Brett

[quoted text, click to view]

AddThis Social Bookmark Button