sql server programming:
I have these two tables: Leads =============== id date account id 1 1/1/2005 1-abc 2 7/1/2005 1-abc 3 8/1/2005 1-xyz 4 8/15/2005 1-xyz Sales ================= id date account id 1 8/1/2005 1-abc 2 9/1/2005 1-abc 3 9/1/2005 1-xyz Result: ====================================== Lead id Sales id account id 1 1 1-abc 2 2 1-abc 3 3 1-xyz I need a store procedure to match the sales to leads (1-1 match). The matching is done by matching up the account id and the dates. The result only contains matched records. For example, Sales id 1 is matched to Lead id 1 because the account ids are the same and Sales 1 is the earliest. Sales id 2 is matched to lead id 2 because the account ids are the same and since Sales 1 is taken already, Sales id 2 is the earliest available one. Lead 4 does not have a match because sales 3 is taken already. In the result, sales id should be distinct. I need some help please. I am not sure how to do the "earliest available" matching. Can this be done without using cursor? Thanks in advance. auro88
Can you give the ddls, and insert scripts -- -Omnibuzz --------------------------------- Please post ddls and sample data for your queries and close the thread if you got the answer for your question. [quoted text, click to view] "auro88" wrote: > I have these two tables: > > Leads > =============== > id date account id > 1 1/1/2005 1-abc > 2 7/1/2005 1-abc > 3 8/1/2005 1-xyz > 4 8/15/2005 1-xyz > > Sales > ================= > id date account id > 1 8/1/2005 1-abc > 2 9/1/2005 1-abc > 3 9/1/2005 1-xyz > > > Result: > ====================================== > Lead id Sales id account id > 1 1 1-abc > 2 2 1-abc > 3 3 1-xyz > > I need a store procedure to match the sales to leads (1-1 match). The > matching is done by matching up the account id and the dates. The > result only contains matched records. > > For example, Sales id 1 is matched to Lead id 1 because the account ids > are the same and Sales 1 is the earliest. Sales id 2 is matched to lead > id 2 because the account ids are the same and since Sales 1 is taken > already, Sales id 2 is the earliest available one. Lead 4 does not have > a match because sales 3 is taken already. In the result, sales id > should be distinct. > > I need some help please. I am not sure how to do the "earliest > available" matching. Can this be done without using cursor? Thanks in > advance. > > auro88 >
is it SQL 2005 or an earlier version?
I do not have the ddl. Here is the insert script (Thanks for Ryan): set dateformat mdy declare @leads table (id int, date datetime, account_id varchar(10)) insert @leads (id, date, account_id) select 1, '1/1/2005', '1-abc' union all select 2, '7/1/2005', '1-abc' union all select 3, '8/1/2005', '1-xyz' union all select 4, '8/15/2005', '1-xyz' declare @sales table (id int, date datetime, account_id varchar(10)) insert @sales (id, date, account_id) select 1, '8/1/2005', '1-abc' union all select 2, '9/1/2005', '1-abc' union all select 3, '9/1/2005', '1-xyz'
Hi Alexander, Thanks for your reply. It's SQL 2000. [quoted text, click to view] Alexander Kuznetsov wrote: > is it SQL 2005 or an earlier version?
Sorry, new updates on the requirement. The dates between the leads and sales have to be within 6 months. Here is the updated sample of data and result. Thanks in advance Leads =============== id date account id 1 1/1/2005 1-abc 2 7/1/2005 1-abc 3 7/15/2005 1-abc 4 8/1/2005 1-xyz 5 8/15/2005 1-xyz Sales ================= id date account id 1 8/1/2005 1-abc 2 9/1/2005 1-abc 3 9/1/2005 1-xyz Result: ====================================== Lead id Sales id account id 2 1 1-abc 3 2 1-abc 3 3 1-xyz
[quoted text, click to view] On 28 Apr 2006 09:40:03 -0700, auro88 wrote: >Sorry, new updates on the requirement. The dates between the leads and >sales have to be within 6 months. Here is the updated sample of data >and result. Thanks in advance (snip) >Result: >====================================== >Lead id Sales id account id >2 1 1-abc >3 2 1-abc >3 3 1-xyz
Hi auro88, I think the last row shoould read 4 3 1-xyz. Here's the solution for SQL Server 2005: SELECT l.id AS lead_id, s.id AS sales_id, l.account_id FROM (SELECT id, date, account_id, RANK() OVER (PARTITION BY account_id ORDER BY date) AS r FROM @leads AS l1 WHERE EXISTS (SELECT * FROM @sales AS s1 WHERE s1.date < DATEADD(month, 6, l1.date))) AS l INNER JOIN (SELECT id, date, account_id, RANK() OVER (PARTITION BY account_id ORDER BY date) AS r FROM @sales) AS s ON l.account_id = s.account_id AND l.r = s.r And here's the equivalent for SQL Server 2000: SELECT l.id AS lead_id, s.id AS sales_id, l.account_id FROM (SELECT id, date, account_id, (SELECT COUNT(*) FROM @leads AS l2 WHERE EXISTS (SELECT * FROM @sales AS s2 WHERE s2.date < DATEADD(month, 6, l2.date)) AND l2.account_id = l1.account_id AND l2.date <= l1.date) AS r FROM @leads AS l1 WHERE EXISTS (SELECT * FROM @sales AS s1 WHERE s1.date < DATEADD(month, 6, l1.date))) AS l INNER JOIN (SELECT id, date, account_id, (SELECT COUNT(*) FROM @sales AS s4 WHERE s4.account_id = s3.account_id AND s4.date <= s3.date) AS r FROM @sales AS s3) AS s ON l.account_id = s.account_id AND l.r = s.r --
Thanks Hugo. Yes you were right about the last line. Could you explain why the EXISTS part exists in the l only and not in s? I found a bug and I think that's the reason. The bug is that a lead could map to a sales which occurs before the lead date. Try this sample data: set dateformat mdy declare @leads table (id int, date datetime, account_id varchar(10)) insert @leads (id, date, account_id) select 1, '1/1/2005', '1-abc' union all select 2, '7/1/2005', '1-abc' union all select 3, '7/15/2005', '1-abc' union all select 4, '8/1/2005', '1-xyz' declare @sales table (id int, date datetime, account_id varchar(10)) insert @sales (id, date, account_id) select 1, '8/1/2005', '1-abc' union all select 2, '9/1/2005', '1-abc' union all select 3, '7/1/2005', '1-xyz' union all select 4, '9/2/2005', '1-xyz' Thanks, auro88 [quoted text, click to view] Hugo Kornelis wrote: > On 28 Apr 2006 09:40:03 -0700, auro88 wrote: > > >Sorry, new updates on the requirement. The dates between the leads and > >sales have to be within 6 months. Here is the updated sample of data > >and result. Thanks in advance > (snip) > >Result: > >====================================== > >Lead id Sales id account id > >2 1 1-abc > >3 2 1-abc > >3 3 1-xyz > > Hi auro88, > > I think the last row shoould read 4 3 1-xyz. > > Here's the solution for SQL Server 2005: > > SELECT l.id AS lead_id, s.id AS sales_id, l.account_id > FROM (SELECT id, date, account_id, > RANK() OVER (PARTITION BY account_id > ORDER BY date) AS r > FROM @leads AS l1 > WHERE EXISTS > (SELECT * > FROM @sales AS s1 > WHERE s1.date < DATEADD(month, 6, l1.date))) AS l > INNER JOIN (SELECT id, date, account_id, > RANK() OVER (PARTITION BY account_id > ORDER BY date) AS r > FROM @sales) AS s > ON l.account_id = s.account_id > AND l.r = s.r > > And here's the equivalent for SQL Server 2000: > > SELECT l.id AS lead_id, s.id AS sales_id, l.account_id > FROM (SELECT id, date, account_id, > (SELECT COUNT(*) > FROM @leads AS l2 > WHERE EXISTS > (SELECT * > FROM @sales AS s2 > WHERE s2.date < DATEADD(month, 6, l2.date)) > AND l2.account_id = l1.account_id > AND l2.date <= l1.date) AS r > FROM @leads AS l1 > WHERE EXISTS > (SELECT * > FROM @sales AS s1 > WHERE s1.date < DATEADD(month, 6, l1.date))) AS l > INNER JOIN (SELECT id, date, account_id, > (SELECT COUNT(*) > FROM @sales AS s4 > WHERE s4.account_id = s3.account_id > AND s4.date <= s3.date) AS r > FROM @sales AS s3) AS s > ON l.account_id = s.account_id > AND l.r = s.r > > > -- > Hugo Kornelis, SQL Server MVP
Hi Hugo, Thanks for your reply. I know the "EXISTS" part is for the requirement that leads and sales have to be within 6 months. So I was asking how come the "EXISTS" part exists in the table l only and not in the table s. Yeah I'm sorry about the changing of the requirement. I did not state my new requirement clearly. When I said the date between leads and sales has to be within 6 months, I actually mean lead date must be earlier than sales date and have to be within 6 months (Sales date - Lead date <= 6 months). Thanks again Hugo :) auro88
[quoted text, click to view] On 29 Apr 2006 11:14:25 -0700, auro88 wrote: >Thanks Hugo. Yes you were right about the last line. > >Could you explain why the EXISTS part exists in the l only and not in >s?
Hi auro88, That's becuase of the requirement you added after posting the original problem. Leads and sales have to be within 6 months, so any lead for which no sale exists in the 6 months following the lead date is excluded. [quoted text, click to view] > I found a bug and I think that's the reason. The bug is that a >lead could map to a sales which occurs before the lead date.
This isn't even a bug - you have just changed the requirements a second time. Since I now really no longer know what the exact requirements are, I suggest that you first take some time to sit down and work out the exact requirements for matching. Consider all possible (and a few impossible) combinations of sales and leads, and how you'd like the output to be in those cases. Right now, we're all just shooting on a moving target. I'm afraind that if I come up with a query to fix the sample data you posted this time, you'll find a new issue as soon as you test it on your real database. So instead of going on like this, lets just do it right. Oh, and another thing. I have the feeling that your database design is broken. It appears as if you're trying to repair a relationship that soould have been explicitly stored in the database to begin with (the relationship between a lead and a sale: "sale so and so is a result of lead such and such"). If you're wise, the very next thing you do after sorting out the mess is improving the design. (First mop the floor, then fix the leak). BTW, I'll be without Internet access for the next days. I'll check back on this thread when I'm at my computer again, but maybe someone else will have helped you first. --
Hi Hugo, I think I modified the query as you suggested, ran it against slightly different data, and got confusing results: set dateformat mdy declare @leads table (id int, date datetime, account_id varchar(10)) insert @leads (id, date, account_id) select 1, '1/1/2005', '1-abc' union all select 2, '7/1/2005', '1-abc' union all select 3, '7/15/2005', '1-abc' union all select 4, '8/1/2005', '1-xyz' declare @sales table (id int, date datetime, account_id varchar(10)) insert @sales (id, date, account_id) select 1, '7/10/2005', '1-abc' union all select 2, '7/11/2005', '1-abc' union all select 3, '7/12/2005', '1-abc' union all select 4, '8/1/2005', '1-abc' union all select 5, '9/1/2005', '1-abc' union all select 6, '7/1/2005', '1-xyz' union all select 7, '9/2/2005', '1-xyz' SELECT l.id AS lead_id, l.date as lead_date, s.id AS sales_id, s.date as sale_date, l.account_id FROM (SELECT id, date, account_id, (SELECT COUNT(*) FROM @leads AS l2 WHERE EXISTS (SELECT * FROM @sales AS s2 WHERE s2.date < DATEADD(month, 6, l2.date) and s2.date>=l2.date) AND l2.account_id = l1.account_id AND l2.date <= l1.date) AS r FROM @leads AS l1 WHERE EXISTS (SELECT * FROM @sales AS s1 WHERE s1.date < DATEADD(month, 6, l1.date) and s1.date>=l1.date)) AS l INNER JOIN (SELECT id, date, account_id, (SELECT COUNT(*) FROM @sales AS s4 WHERE s4.account_id = s3.account_id AND s4.date <= s3.date) AS r FROM @sales AS s3) AS s ON l.account_id = s.account_id AND l.r = s.r lead_id lead_date sales_id sale_date account_id ----------- ------------------------------------------------------ ----------- ------------------------------------------------------ ---------- 2 2005-07-01 00:00:00.000 1 2005-07-10 00:00:00.000 1-abc 3 2005-07-15 00:00:00.000 2 2005-07-11 00:00:00.000 1-abc 4 2005-08-01 00:00:00.000 6 2005-07-01 00:00:00.000 1-xyz (3 row(s) affected) In fact, I think that this problem requires a recursive solution. What do you think?
[quoted text, click to view] On 29 Apr 2006 19:47:37 -0700, auro88 wrote: >Hi Hugo, > >Thanks for your reply. I know the "EXISTS" part is for the requirement >that leads and sales have to be within 6 months. So I was asking how >come the "EXISTS" part exists in the table l only and not in the table >s.
Hi auro88, For leads, the additional requirement is that they are only listed if there's a sale within 6 months. The EXISTS is used to remove leads that don't fit this requirement. There is no such requirement for sales, hence no EXISTS is needed to filter the sales table. [quoted text, click to view] > >Yeah I'm sorry about the changing of the requirement. I did not state >my new requirement clearly. When I said the date between leads and >sales has to be within 6 months, I actually mean lead date must be >earlier than sales date and have to be within 6 months (Sales date - >Lead date <= 6 months).
In that case, change WHERE s2.date < DATEADD(month, 6, l2.date) to WHERE s2.date >= l2.date AND s2.date < DATEADD(month, 6, l2.date) (And similar for s1.date vs l1.date) [quoted text, click to view] > >Thanks again Hugo :)
You're welcome! :-) --
Hi Hugo, The query that I posted associated the lead on 7/15/2005 (lead id = 3) with the sale on 7/11/2005 (sale id = 2). Basicallly I took your original query and tried to modify it as you suggested: " change WHERE s2.date < DATEADD(month, 6, l2.date) to WHERE s2.date >= l2.date AND s2.date < DATEADD(month, 6, l2.date) (And similar for s1.date vs l1.date)" That's how I came up with the query. [quoted text, click to view] Here is my concern: auro88 wrote:
"The bug is that a lead could map to a sales which occurs before the lead date". From my own experience and from auro88's posting I am assuming that the sale date cannot be earlier that the corresponding lead date. Yet the query's output failed me meet this requirement. Please correct me if I am wrong
[quoted text, click to view] On 8 May 2006 15:13:24 -0700, Alexander Kuznetsov wrote: >Hi Hugo, > >I think I modified the query as you suggested, ran it against slightly >different data, and got confusing results:
(snip) Hi Alexander, Why do you think that these results are confusing? They satisfy exactly the requirements that auro88 has thus far put forth in this thread: - Discard leads if there's no sale for the same account-id within 6 months; - Don't discard sales; - Match earliest lead to earliest sale, next to earliest lead to next to earliest sale, etc. Maybe you have a different understanding of auro88's requirements? [quoted text, click to view] >In fact, I think that this problem requires a recursive solution. > >What do you think?
If my understanding of auro88's problem is correct, then it doesn't need a recursive solution. If your understanding of the problem is correct, than I first need to know what part of the requirements I'm misunderstanding before I can comment. --
[quoted text, click to view] On 9 May 2006 17:40:31 -0700, Alexander Kuznetsov wrote: >Hi Hugo, (snip) >Here is my concern: auro88 wrote: >"The bug is that a >lead could map to a sales which occurs before the lead date". From my >own experience and from auro88's posting I am assuming that the sale >date cannot be earlier that the corresponding lead date. Yet the >query's output failed me meet this requirement.
Hi Alexander, Okay, I have apparently missed that requirement. This changes things. I don't think that this problem can be solved with a good performing, scalable set-based query. Unless someone more versed in SQL Server 2005 than me comes up with a smashingly clever recursive CTE-based idea, the best option is probably to set up two cursors (one for each of the tables) and use the well-proven balanced line algorithm to procss the data. Just like in the old days <g>. --
Don't see what you're looking for? Try a search.
|