Groups | Blog | Home
all groups > sql server (alternate) > march 2004 >

sql server (alternate) : Simple SELECT question


Anita
3/12/2004 1:27:59 PM
Hi All,

I have a table this :

CREATE TABLE Tbl
( OrderId INT,
Product CHAR(1),
Price INT )

INSERT INTO Tbl VALUES(1,'A', 2)
INSERT INTO Tbl VALUES(1,'B', 4)
INSERT INTO Tbl VALUES(1,'D', 2)
INSERT INTO Tbl VALUES(2,'A', 1)
INSERT INTO Tbl VALUES(2,'B', 1)
INSERT INTO Tbl VALUES(2,'C', 1)
INSERT INTO Tbl VALUES(3,'B', 1)
INSERT INTO Tbl VALUES(3,'D', 1)

From this table, I want to make a list of minimum product price. The
table may contain duplicate values and all
columns should be displayed.

For example :
OrderId Product Price
2 A 1
2 B 1
2 C 1
3 D 1

I have tried using 2 commands below, but I am not sure if these are good
or not. I am very limited in understanding
the execution plan.

Could any one show me the weakness of my commands
and give me other similar commands that will run faster ?
Note : I will use the command on a very large table.

Thanks in advance
Anita Hery

------------
SELECT Tbl.OrderId, Tbl.Product, Tbl.Price
FROM Tbl
INNER JOIN
( SELECT Product, MIN(Price) AS P
FROM Tbl
GROUP BY Product
) AS T
ON Tbl.Product = T.Product and
Tbl.Price = P
WHERE Tbl.OrderId =
( SELECT TOP 1 OrderId
FROM Tbl
WHERE Product = T.Product and
Price = P
ORDER BY OrderId)


SELECT * FROM Tbl
WHERE 1 = ( select top 1 case when product = tbl.product
and price = tbl.price and
orderid = tbl.orderid then 1
else 0 end
from tbl t
where product = tbl.product
order by product,price,orderid )




*** Sent via Developersdex http://www.developersdex.com ***
David Portas
3/12/2004 3:38:44 PM
Here's another alternative for the lowest numbered Order with the lowest
Price for each Product:

SELECT MIN(orderid) AS orderid,
product, price
FROM Tbl AS T
WHERE price =
(SELECT MIN(price)
FROM Tbl
WHERE product = T.product)
GROUP BY product, price

--
David Portas
SQL Server MVP
--

Joe Celko
3/12/2004 7:27:59 PM
Please post DDL, so that people do not have to guess what the keys,
constraints, Declarative Referential Integrity, datatypes, etc. in your
schema are. Is this what you meant to say?

CREATE TABLE Orders
(order_id INTEGER NOT NULL,
product CHAR(1)NOT NULL,
price INTEGER NOT NULL,
PRIMARY KEY(order_id, product));

[quoted text, click to view]
table may contain duplicate values and all columns should be displayed.
<<

SELECT O1.*
FROM Orders AS O1
WHERE price
= (SELECT MIN(O2.price)
FROM Orders AS O2
WHERE O1.product = O2.product);

[quoted text, click to view]
OrderId Product Price
2 A 1
2 B 1
2 C 1
3 D 1 <<

Your example does not match your specifications: You did not show that
order #2 and #3 both have product 'B' priced at 1 unit in the sample
data.

Did you want the lowest order number; notice that I had to define a key
to guarantee uniqueness so that I could even begin to make this
assumption about the specifications. This is why we ask for DDL.

SELECT O1.*
FROM Orders AS O1
WHERE O1.order_id
= (SELECT MIN(O2.order_id)
FROM Orders AS O2
WHERE O2.product = O1.product
AND O2.price
= (SELECT MIN(O3.price)
FROM Orders AS O3
WHERE O1.product = O3.product));

The pattern can be extended in an obvious way. In English this says: "I
want the smallest order # with the smallest price for each product."

--CELKO--


*** Sent via Developersdex http://www.developersdex.com ***
Anita
3/12/2004 11:18:00 PM
Hi Joe,

Sorry, if my question is not clear.

The table structure is like what I show in in my post.
NULL values are controled by my application program.
In this case, I do not need primary key.
The table may contain duplicate values. I mean that
one order data may cantain several producs and
the same product may appear more than once.
For example :

OrderId Product Price
9 A 10
9 B 10
9 A 4
9 D 100
9 D 55

My two commands has been tested and produced output
like what I displayed in my post :

OrderId Product Price
2 A 1
2 B 1
2 C 1
3 D 1

This is the ouput I need. If there are 2 products
that have the same minimum values then I need only
one product. Either one is ok.

Anita Hery


*** Sent via Developersdex http://www.developersdex.com ***
David Portas
3/12/2004 11:52:51 PM
Anita,

[quoted text, click to view]

I'm not certain what you mean by this but a principle of database design is
that data integrity is maintained by the database, not the application.
Keys, nullability, datatypes and constraints are important and should always
be declared appropriately in your schema.

[quoted text, click to view]

Every table needs a primary key. A key is essential to the integrity of your
data. Some queries may be impossible, or at least extremely difficult,
without keys. If you have a business requirement that a single product on an
order can have two different prices then presumably the key will be
(orderid,product,price) or maybe (orderid,product,pricecode). Read-up on
Normalisation if you need to and verify that your data is in Third Normal
Form. Get the design correct now and you'll find it much easier to solve
other requirements later.

BTW, did my query solution help with your original question?

--
David Portas
SQL Server MVP
--

Anita
3/13/2004 2:27:59 AM
David,

Thanks for your reply

Your query produces the output I need.
The executon plan shows that your query has only one
table scan. But, my query has longer process path with
two table scans. I think your query works faster
than mine.

But I am thinking how to replace the first line
(SELECT MIN(orderid)..) to be: SELECT * ..

Because I want to show all columns. My actual table
has more than 25 columns.
Sorry, I show only 3 columns in my sample which
cause you choose SELECT MIN(orderid) on the first
line.

Anita Hery





*** Sent via Developersdex http://www.developersdex.com ***
David Portas
3/13/2004 4:11:41 PM
This is where we really do need to have a primary key. I'll assume that the
undeclared key will be (orderid,product,price) - if not then this may not
give the result you want (you haven't specified your requirement if all
three of those columns are duplicated or if any of them are NULL).

SELECT T1.orderid, T1.product, T1.price, ...
FROM Tbl AS T1
JOIN
(SELECT MIN(orderid) AS orderid,
product, price
FROM Tbl AS T
WHERE price =
(SELECT MIN(price)
FROM Tbl
WHERE product = T.product)
GROUP BY product, price) AS T2
ON T1.orderid = T2.orderid
AND T1.product = T2.product
AND T1.price = T2.price

--
David Portas
SQL Server MVP
--

Anita
3/14/2004 6:27:57 AM
Hi David,

Your replies are very helpful.

I am interested in comparing your query and mine.
From my test, using my actual table with 1 million
rows, I got :
1. Your query #1 took 8 to 12 seconds
2. Your query #2 took 19 to 20 seconds
3. My qeury #2 took 59 minutes 40 seconds

Now I get the answer to my original question.
My query is very bad and I must avoid using the
query like it. Although it does not have a long
executon plan path, but I still do not understand
which parts of the query that cause it takes
a very long time. Could you help me ?

Anita Hery


-------My query #2---------
SELECT * FROM Tbl
WHERE 1 = ( select top 1 case when product = tbl.product
and price = tbl.price and
orderid = tbl.orderid then 1
else 0 end
from tbl t
where product = tbl.product
order by product,price,orderid )

---Your query #1-----
SELECT MIN(orderid) AS orderid,
product, price
FROM Tbl AS T
WHERE price =
(SELECT MIN(price)
FROM Tbl
WHERE product = T.product)
GROUP BY product, price

---Your query #2-----
SELECT T1.orderid, T1.product, T1.price, ...
FROM Tbl AS T1
JOIN
(SELECT MIN(orderid) AS orderid,
product, price
FROM Tbl AS T
WHERE price =
(SELECT MIN(price)
FROM Tbl
WHERE product = T.product)
GROUP BY product, price) AS T2
ON T1.orderid = T2.orderid
AND T1.product = T2.product
AND T1.price = T2.price





*** Sent via Developersdex http://www.developersdex.com ***
Hugo Kornelis
3/15/2004 12:43:20 PM
[quoted text, click to view]

I'm far from calling myself an expert on this kind of things, but in
this case I think I can take a shot.

First of all: the length of an execution plan is not related to the
duration of query execution. It's the contents. In general: table
scans and sorting on large tables are bad, index scans are good.

In practice, this is very complicated stuff. I hardly ever try to
understand the execution plan. If I have to write queries that are
both complicated and either time-critical or operating on lots of data
(or both!), I always try a few different approaches to the query to
see which yields the best results.


[quoted text, click to view]

The inner query references the outer query. This is called a
correlated subquery. With just the where-condition, I guess that SQL
Server's optimizer would have changed it into an equivalent simple
self-join of the table, but I'd guess that the order and (especially)
the case preclude such action.

As it is, SQL Server will have to travers the outer query row by row;
for each row, the inner query has to be resolved. I don't know if the
inner query will produce a small or large result set; that would
depend on the number iof distinct products in the table. But the case
demands some logic to be performed on each row and the order by calls
for a sort of all the rows in the inner query's result set. My guess
is that this sort is the true bottleneck.

You already wrote that your actual table is 1,000,000 rows. Let's
assume a total of 100 different products; for the average product, the
inner query would yield a result set of 10,000 rows. to be sorted. If
the query optimizer recognises that the inner query would be the same
for all "outer" rows relating to the same product, it might actually
decide to execute the inner query only once per product. That would
let you get away with "only" 10,000 sorts of 10,000 rows. But you
might just as well get a whole 1,000,000 sorts of 10,000 rows.

BTW, judging from the execution times you posted, I'd guess that you
have no index on product, price, orderid. If you want to surprise
yourself (and you are allowed to experiment on the system the table's
residing on), try adding a clustered index (non-unique, if there can
be multiple rows with the same product, price, orderid combination) on
the table and then re-executing your version of the query.

Best, Hugo
--

Anita
3/15/2004 4:57:55 PM
Hugo,

Many thanks for your good explanation.

Anita Hery



*** Sent via Developersdex http://www.developersdex.com ***
AddThis Social Bookmark Button