all groups > sql server programming > april 2006 >
You're in the

sql server programming

group:

Please help on this matching problem


Please help on this matching problem auro88
4/27/2006 9:11:08 PM
sql server programming: I have these two tables:

Leads
===============
id date account id
1 1/1/2005 1-abc
2 7/1/2005 1-abc
3 8/1/2005 1-xyz
4 8/15/2005 1-xyz

Sales
=================
id date account id
1 8/1/2005 1-abc
2 9/1/2005 1-abc
3 9/1/2005 1-xyz


Result:
======================================
Lead id Sales id account id
1 1 1-abc
2 2 1-abc
3 3 1-xyz

I need a store procedure to match the sales to leads (1-1 match). The
matching is done by matching up the account id and the dates. The
result only contains matched records.

For example, Sales id 1 is matched to Lead id 1 because the account ids
are the same and Sales 1 is the earliest. Sales id 2 is matched to lead
id 2 because the account ids are the same and since Sales 1 is taken
already, Sales id 2 is the earliest available one. Lead 4 does not have
a match because sales 3 is taken already. In the result, sales id
should be distinct.

I need some help please. I am not sure how to do the "earliest
available" matching. Can this be done without using cursor? Thanks in
advance.

auro88
RE: Please help on this matching problem Omnibuzz
4/27/2006 10:34:02 PM
Can you give the ddls, and insert scripts
--
-Omnibuzz
---------------------------------
Please post ddls and sample data for your queries and close the thread if
you got the answer for your question.


[quoted text, click to view]
Re: Please help on this matching problem Alexander Kuznetsov
4/28/2006 7:13:02 AM
is it SQL 2005 or an earlier version?
Re: Please help on this matching problem auro88
4/28/2006 7:47:10 AM
I do not have the ddl. Here is the insert script (Thanks for Ryan):

set dateformat mdy
declare @leads table (id int, date datetime, account_id varchar(10))
insert @leads (id, date, account_id)
select 1, '1/1/2005', '1-abc'
union all select 2, '7/1/2005', '1-abc'
union all select 3, '8/1/2005', '1-xyz'
union all select 4, '8/15/2005', '1-xyz'

declare @sales table (id int, date datetime, account_id varchar(10))
insert @sales (id, date, account_id)
select 1, '8/1/2005', '1-abc'
union all select 2, '9/1/2005', '1-abc'
union all select 3, '9/1/2005', '1-xyz'
Re: Please help on this matching problem auro88
4/28/2006 7:48:16 AM
Hi Alexander,

Thanks for your reply. It's SQL 2000.

[quoted text, click to view]
Re: Please help on this matching problem auro88
4/28/2006 9:40:03 AM
Sorry, new updates on the requirement. The dates between the leads and
sales have to be within 6 months. Here is the updated sample of data
and result. Thanks in advance

Leads
===============
id date account id
1 1/1/2005 1-abc
2 7/1/2005 1-abc
3 7/15/2005 1-abc
4 8/1/2005 1-xyz
5 8/15/2005 1-xyz

Sales
=================
id date account id
1 8/1/2005 1-abc
2 9/1/2005 1-abc
3 9/1/2005 1-xyz


Result:
======================================
Lead id Sales id account id
2 1 1-abc
3 2 1-abc
3 3 1-xyz
Re: Please help on this matching problem Hugo Kornelis
4/29/2006 12:30:29 AM
[quoted text, click to view]

Hi auro88,

I think the last row shoould read 4 3 1-xyz.

Here's the solution for SQL Server 2005:

SELECT l.id AS lead_id, s.id AS sales_id, l.account_id
FROM (SELECT id, date, account_id,
RANK() OVER (PARTITION BY account_id
ORDER BY date) AS r
FROM @leads AS l1
WHERE EXISTS
(SELECT *
FROM @sales AS s1
WHERE s1.date < DATEADD(month, 6, l1.date))) AS l
INNER JOIN (SELECT id, date, account_id,
RANK() OVER (PARTITION BY account_id
ORDER BY date) AS r
FROM @sales) AS s
ON l.account_id = s.account_id
AND l.r = s.r

And here's the equivalent for SQL Server 2000:

SELECT l.id AS lead_id, s.id AS sales_id, l.account_id
FROM (SELECT id, date, account_id,
(SELECT COUNT(*)
FROM @leads AS l2
WHERE EXISTS
(SELECT *
FROM @sales AS s2
WHERE s2.date < DATEADD(month, 6, l2.date))
AND l2.account_id = l1.account_id
AND l2.date <= l1.date) AS r
FROM @leads AS l1
WHERE EXISTS
(SELECT *
FROM @sales AS s1
WHERE s1.date < DATEADD(month, 6, l1.date))) AS l
INNER JOIN (SELECT id, date, account_id,
(SELECT COUNT(*)
FROM @sales AS s4
WHERE s4.account_id = s3.account_id
AND s4.date <= s3.date) AS r
FROM @sales AS s3) AS s
ON l.account_id = s.account_id
AND l.r = s.r


--
Re: Please help on this matching problem auro88
4/29/2006 11:14:25 AM
Thanks Hugo. Yes you were right about the last line.

Could you explain why the EXISTS part exists in the l only and not in
s? I found a bug and I think that's the reason. The bug is that a
lead could map to a sales which occurs before the lead date.

Try this sample data:
set dateformat mdy
declare @leads table (id int, date datetime, account_id varchar(10))
insert @leads (id, date, account_id)
select 1, '1/1/2005', '1-abc'
union all select 2, '7/1/2005', '1-abc'
union all select 3, '7/15/2005', '1-abc'
union all select 4, '8/1/2005', '1-xyz'

declare @sales table (id int, date datetime, account_id varchar(10))
insert @sales (id, date, account_id)
select 1, '8/1/2005', '1-abc'
union all select 2, '9/1/2005', '1-abc'
union all select 3, '7/1/2005', '1-xyz'
union all select 4, '9/2/2005', '1-xyz'

Thanks,
auro88

[quoted text, click to view]
Re: Please help on this matching problem auro88
4/29/2006 7:47:37 PM
Hi Hugo,

Thanks for your reply. I know the "EXISTS" part is for the requirement
that leads and sales have to be within 6 months. So I was asking how
come the "EXISTS" part exists in the table l only and not in the table
s.

Yeah I'm sorry about the changing of the requirement. I did not state
my new requirement clearly. When I said the date between leads and
sales has to be within 6 months, I actually mean lead date must be
earlier than sales date and have to be within 6 months (Sales date -
Lead date <= 6 months).

Thanks again Hugo :)

auro88
Re: Please help on this matching problem Hugo Kornelis
4/29/2006 10:32:19 PM
[quoted text, click to view]

Hi auro88,

That's becuase of the requirement you added after posting the original
problem. Leads and sales have to be within 6 months, so any lead for
which no sale exists in the 6 months following the lead date is
excluded.

[quoted text, click to view]

This isn't even a bug - you have just changed the requirements a second
time.

Since I now really no longer know what the exact requirements are, I
suggest that you first take some time to sit down and work out the exact
requirements for matching. Consider all possible (and a few impossible)
combinations of sales and leads, and how you'd like the output to be in
those cases. Right now, we're all just shooting on a moving target. I'm
afraind that if I come up with a query to fix the sample data you posted
this time, you'll find a new issue as soon as you test it on your real
database. So instead of going on like this, lets just do it right.

Oh, and another thing. I have the feeling that your database design is
broken. It appears as if you're trying to repair a relationship that
soould have been explicitly stored in the database to begin with (the
relationship between a lead and a sale: "sale so and so is a result of
lead such and such"). If you're wise, the very next thing you do after
sorting out the mess is improving the design. (First mop the floor, then
fix the leak).

BTW, I'll be without Internet access for the next days. I'll check back
on this thread when I'm at my computer again, but maybe someone else
will have helped you first.

--
Re: Please help on this matching problem Alexander Kuznetsov
5/8/2006 3:13:24 PM
Hi Hugo,

I think I modified the query as you suggested, ran it against slightly
different data, and got confusing results:

set dateformat mdy
declare @leads table (id int, date datetime, account_id varchar(10))
insert @leads (id, date, account_id)
select 1, '1/1/2005', '1-abc'
union all select 2, '7/1/2005', '1-abc'
union all select 3, '7/15/2005', '1-abc'
union all select 4, '8/1/2005', '1-xyz'


declare @sales table (id int, date datetime, account_id varchar(10))
insert @sales (id, date, account_id)
select 1, '7/10/2005', '1-abc'
union all select 2, '7/11/2005', '1-abc'
union all select 3, '7/12/2005', '1-abc'
union all select 4, '8/1/2005', '1-abc'
union all select 5, '9/1/2005', '1-abc'
union all select 6, '7/1/2005', '1-xyz'
union all select 7, '9/2/2005', '1-xyz'

SELECT l.id AS lead_id, l.date as lead_date, s.id AS sales_id,
s.date as sale_date, l.account_id
FROM (SELECT id, date, account_id,
(SELECT COUNT(*)
FROM @leads AS l2
WHERE EXISTS
(SELECT *
FROM @sales AS s2
WHERE s2.date < DATEADD(month, 6, l2.date) and
s2.date>=l2.date)
AND l2.account_id = l1.account_id
AND l2.date <= l1.date) AS r
FROM @leads AS l1
WHERE EXISTS
(SELECT *
FROM @sales AS s1
WHERE s1.date < DATEADD(month, 6, l1.date) and
s1.date>=l1.date)) AS l
INNER JOIN (SELECT id, date, account_id,
(SELECT COUNT(*)
FROM @sales AS s4
WHERE s4.account_id = s3.account_id
AND s4.date <= s3.date) AS r
FROM @sales AS s3) AS s
ON l.account_id = s.account_id
AND l.r = s.r

lead_id lead_date
sales_id sale_date
account_id
----------- ------------------------------------------------------
----------- ------------------------------------------------------
----------
2 2005-07-01 00:00:00.000 1
2005-07-10 00:00:00.000 1-abc
3 2005-07-15 00:00:00.000 2
2005-07-11 00:00:00.000 1-abc
4 2005-08-01 00:00:00.000 6
2005-07-01 00:00:00.000 1-xyz

(3 row(s) affected)

In fact, I think that this problem requires a recursive solution.

What do you think?
Re: Please help on this matching problem Hugo Kornelis
5/8/2006 10:50:01 PM
[quoted text, click to view]

Hi auro88,

For leads, the additional requirement is that they are only listed if
there's a sale within 6 months. The EXISTS is used to remove leads that
don't fit this requirement.

There is no such requirement for sales, hence no EXISTS is needed to
filter the sales table.

[quoted text, click to view]

In that case, change

WHERE s2.date < DATEADD(month, 6, l2.date)

to

WHERE s2.date >= l2.date
AND s2.date < DATEADD(month, 6, l2.date)

(And similar for s1.date vs l1.date)

[quoted text, click to view]

You're welcome! :-)

--
Re: Please help on this matching problem Alexander Kuznetsov
5/9/2006 5:40:31 PM
Hi Hugo,

The query that I posted associated the lead on 7/15/2005 (lead id = 3)
with the sale on 7/11/2005 (sale id = 2). Basicallly I took your
original query and tried to modify it as you suggested: " change


WHERE s2.date < DATEADD(month, 6, l2.date)


to


WHERE s2.date >= l2.date
AND s2.date < DATEADD(month, 6, l2.date)


(And similar for s1.date vs l1.date)"

That's how I came up with the query.

[quoted text, click to view]
"The bug is that a
lead could map to a sales which occurs before the lead date". From my
own experience and from auro88's posting I am assuming that the sale
date cannot be earlier that the corresponding lead date. Yet the
query's output failed me meet this requirement.

Please correct me if I am wrong
Re: Please help on this matching problem Hugo Kornelis
5/10/2006 12:27:43 AM
[quoted text, click to view]
(snip)

Hi Alexander,

Why do you think that these results are confusing? They satisfy exactly
the requirements that auro88 has thus far put forth in this thread:

- Discard leads if there's no sale for the same account-id within 6
months;
- Don't discard sales;
- Match earliest lead to earliest sale, next to earliest lead to next to
earliest sale, etc.

Maybe you have a different understanding of auro88's requirements?

[quoted text, click to view]

If my understanding of auro88's problem is correct, then it doesn't need
a recursive solution.

If your understanding of the problem is correct, than I first need to
know what part of the requirements I'm misunderstanding before I can
comment.

--
Re: Please help on this matching problem Hugo Kornelis
5/10/2006 11:23:25 PM
[quoted text, click to view]

Hi Alexander,

Okay, I have apparently missed that requirement. This changes things.

I don't think that this problem can be solved with a good performing,
scalable set-based query. Unless someone more versed in SQL Server 2005
than me comes up with a smashingly clever recursive CTE-based idea, the
best option is probably to set up two cursors (one for each of the
tables) and use the well-proven balanced line algorithm to procss the
data. Just like in the old days <g>.

--
AddThis Social Bookmark Button