all groups > sql server data mining > march 2005 >
You're in the

sql server data mining

group:

SQL: Return the intersection of n-n relationship



SQL: Return the intersection of n-n relationship Flip
3/22/2005 9:23:05 AM
sql server data mining: Hi,

This looks simple but I can't figure it out.

If there's a n-n relationship between two tables, how can I find all records
in the first table that are linked to all records of the second table.

For Example:

TableComputers

ComputerID, Name
1, PC1
2, PC2
3, Server3

TableSoftware

SoftwareID, Name
1, Outlook
2, The Sims
3, Notepad

TableSoftwareOnComputers

ComputerID, SoftwareID
1, 1
1, 3
2, 1
2, 2,
2, 3
3, 3

What I'm looking for is the query that will return ComputerID 2: the only
computer that is running all available software. Or is something wrong with
my db structure if I need to look for this?

Many thnx in advance,
flip
Re: Return the intersection of n-n relationship Flip
3/22/2005 12:27:03 PM
It get "This site is under construction and coming soon" when surfing to
www.dbazine.com.

Any chance you can explain in short what the article said?

Thanks anyway for your reply,
Filip


[quoted text, click to view]
Re: Return the intersection of n-n relationship Adam Machanic
3/22/2005 12:39:33 PM
See if this article helps you:

http://www.dbazine.com/celko1.html

--
Adam Machanic
SQL Server MVP
http://www.datamanipulation.net
--


[quoted text, click to view]

Re: Return the intersection of n-n relationship Adam Machanic
3/22/2005 3:33:22 PM
[quoted text, click to view]

Odd -- it works for me... Anyway, I'll just paste it:


Relational Division
by Joe Celko

Dr. Codd defined a set of eight basic operators for his relational model.
This series of articles looks at those basic operators in Standard SQL. Some
are implemented directly, some require particular programming tricks and all
of them have to be slightly modified to fit into the SQL language model.

Relational division is one of the eight basic operations in Codd's
relational algebra. The idea is that a divisor table is used to partition a
dividend table and produce a quotient or results table. The quotient table
is made up of those values of one column for which a second column had all
of the values in the divisor.

This is easier to explain with an example. We have a table of pilots and the
planes they can fly (dividend); we have a table of planes in the hangar
(divisor); we want the names of the pilots who can fly every plane
(quotient) in the hangar. To get this result, we divide the PilotSkills
table by the planes in the hangar.

CREATE TABLE PilotSkills
(pilot CHAR(15) NOT NULL,
plane CHAR(15) NOT NULL,
PRIMARY KEY (pilot, plane));

PilotSkills
pilot plane
=========================
'Celko' 'Piper Cub'
'Higgins' 'B-52 Bomber'
'Higgins' 'F-14 Fighter'
'Higgins' 'Piper Cub'
'Jones' 'B-52 Bomber'
'Jones' 'F-14 Fighter'
'Smith' 'B-1 Bomber'
'Smith' 'B-52 Bomber'
'Smith' 'F-14 Fighter'
'Wilson' 'B-1 Bomber'
'Wilson' 'B-52 Bomber'
'Wilson' 'F-14 Fighter'
'Wilson' 'F-17 Fighter'

CREATE TABLE Hangar
(plane CHAR(15) NOT NULL PRIMARY KEY);

Hangar
plane
=============
'B-1 Bomber'
'B-52 Bomber'
'F-14 Fighter'

PilotSkills DIVIDED BY Hangar
pilot
=============================
'Smith'
'Wilson'
In this example, Smith and Wilson are the two pilots who can fly everything
in the hangar. Notice that Higgins and Celko know how to fly a Piper Cub,
but we don't have one right now. In Codd's original definition of relational
division, having more rows than are called for is not a problem.

The important characteristic of a relational division is that the CROSS JOIN
(Cartesian product) of the divisor and the quotient produces a valid subset
of rows from the dividend. This is where the name comes from, since the
CROSS JOIN acts like a multiplication operator.

Relational division can be written as a single query, thus:

SELECT DISTINCT pilot
FROM PilotSkills AS PS1
WHERE NOT EXISTS
(SELECT *
FROM Hangar
WHERE NOT EXISTS
(SELECT *
FROM PilotSkills AS PS2
WHERE (PS1.pilot = PS2.pilot)
AND (PS2.plane = Hangar.plane)));
The quickest way to explain what is happening in this query is to imagine an
old World War II movie where a cocky pilot has just walked into the hangar,
looked over the fleet, and announced, "There ain't no plane in this hangar
that I can't fly!", which is good logic, but horrible English.

We are finding the pilots for whom there does not exist a plane in the
hangar for which they have no skills. The use of the NOT EXISTS() predicates
is for speed. Most SQL systems will look up a value in an index rather than
scan the whole table. This query for relational division was made popular by
Chris Date in his textbooks, but it is not the only method, nor always the
fastest. Another version of the division can be written so as to avoid three
levels of nesting. While it is not original with me, I have made it popular
in my books.

SELECT PS1.pilot
FROM PilotSkills AS PS1, Hangar AS H1
WHERE PS1.plane = H1.plane
GROUP BY PS1.pilot
HAVING COUNT(PS1.plane) = (SELECT COUNT(plane) FROM Hangar);There is a
serious difference in the two methods. Burn down the hangar, so that the
divisor is empty. Because of the NOT EXISTS() predicates in Date's query,
all pilots are returned from a division by an empty set. Because of the
COUNT() functions in my query, no pilots are returned from a division by
an empty set.

In the sixth edition of his book, Introduction to Database Systems, Chris
Date defined another operator (DIVIDEBY ... PER) which produces the same
results as my query, but with more complexity.

Another kind of relational division is exact relational division. The
dividend table must match exactly to the values of the divisor without any
extra values.

SELECT PS1.pilot
FROM PilotSkills AS PS1
LEFT OUTER JOIN
Hangar AS H1
ON PS1.plane = H1.plane
GROUP BY PS1.pilot
HAVING COUNT(PS1.plane) = (SELECT COUNT(plane) FROM Hangar)
AND COUNT(H1.plane) = (SELECT COUNT(plane) FROM Hangar);
This says that a pilot must have the same number of certificates as there
planes in the hangar and these certificates all match to a plane in the
hangar, not something else. The "something else" is shown by a created NULL
from the LEFT OUTER JOIN.

Please do not make the mistake of trying to reduce the HAVING clause with a
little algebra to:

HAVING COUNT(PS1.plane) = COUNT(H1.plane) because it does not work; it will
tell you that the hangar has (n) planes in it and the pilot is certified for
(n) planes, but not that those two sets of planes are equal to each other.

The Winter 1996 edition of DB2 On-Line Magazine had an article entitled
"Powerful SQL: Beyond the Basics" by Sheryl Larsen that gave the results of
testing both methods. Her conclusion for DB2 was that the nested EXISTS()
version is better when the quotient has less than 25% of the dividend
table's rows and the COUNT(*) version is better when the quotient is more
than 25% of the dividend table.

---

Joe Celko was a member of the ANSI X3H2 Database Standards Committee and
helped write the SQL-92 standards. He is the author of over 450 magazine col
umns and four books, the best known of which is SQL for Smarties
(Morgan-Kaufmann Publishers, 1999). He is the Vice President of RDBMS at
North Face Learning in Salt Lake City.





--
Adam Machanic
SQL Server MVP
http://www.datamanipulation.net
--

Re: Return the intersection of n-n relationship Flip
3/22/2005 5:33:02 PM
Hmmm. Looks like it wasn't that simple after all.

For what it's worth, I figured out a solution based on COUNT() myself:

SELECT DISTINCT ps.Pilot
FROM PilotSkills ps
WHERE (SELECT COUNT(*) FROM #PilotSkills WHERE Pilot = ps.Pilot) >= (SELECT
COUNT(Plane) FROM #Hangar)

This only works, though, when referential integrity is enforced between the
two tables. (In the case of the example in this article, the pilots aren't
allowed to fly any planes that aren't in the hangar ;) My query returns
Higgins as well as Smith and Wilson.

Thank you very much for your reply. This has put me on the right track again.

flip


[quoted text, click to view]
AddThis Social Bookmark Button