Groups | Blog | Home
all groups > sql server (alternate) > march 2004 >

sql server (alternate) : Finding data where another column is maximum


Serge Rielau
3/25/2004 5:01:08 PM
[quoted text, click to view]
But it's going to get slower fast as you add data due to the selfjoin.
To get linear scalability you can concat the time to the temperature
(turn both into a string), then max() on that and sub string it back.

In standard SQL (sorry my T-SQL is spotty at best) that should be
something like:
SELECT INTEGER(SUBSTR(maxtemp, 1, 6)), TIMESTAMP(SUBSTR(maxtemp, 7, 32))
FROM (SELECT MAX(CHAR(temp) || CHAR(time_taken)) AS maxtemp
FROM temps) AS T

Should be trivial to translate to T-SQL

Cheers
Serge

--
Serge Rielau
DB2 SQL Compiler Development
Lucvdv
3/25/2004 7:10:43 PM
This is a problem my collegue asked me for help with, but I don't see
any light either. Neither of us has had a training in SQL, that's
probably why ;)


A simplification of the problem we have: suppose you have a table with a
datetime column and a numeric column (for example the results of
temperature measurements).

There are several records per day, at varying hours.

How would you go about building a query that returns, for example,

- the maximum temperature for each day along with the time it was
measured?

- the last measurement for each day with its corresponding temperature?


Finding the maximum temperature or the last measurement for each day is
no problem, but how do you get the other column(s) from the
corresponding records into the result?
John Gilson
3/25/2004 7:57:03 PM
[quoted text, click to view]

CREATE TABLE Temps
(
temp_taken DATETIME NOT NULL PRIMARY KEY,
temp INT NOT NULL
)

INSERT INTO Temps (temp_taken, temp)
VALUES ('20040325 9:00', 40)
INSERT INTO Temps (temp_taken, temp)
VALUES ('20040325 14:00', 50)
INSERT INTO Temps (temp_taken, temp)
VALUES ('20040325 20:00', 42)

-- Max temp for each day
-- Read: For a given day and temp, there isn't a row with a greater temp
SELECT T1.temp_taken, T1.temp
FROM Temps AS T1
LEFT OUTER JOIN
Temps AS T2
ON DATEDIFF(DAY, T1.temp_taken, T2.temp_taken) = 0 AND
T2.temp > T1.temp
WHERE T2.temp_taken IS NULL

temp_taken temp
2004-03-25 14:00:00.000 50

-- Last measurement for each day
-- Read: For a given day, there isn't a row whose time is later
SELECT T1.temp_taken, T1.temp
FROM Temps AS T1
LEFT OUTER JOIN
Temps AS T2
ON DATEDIFF(DAY, T1.temp_taken, T2.temp_taken) = 0 AND
T2.temp_taken > T1.temp_taken
WHERE T2.temp_taken IS NULL

temp_taken temp
2004-03-25 20:00:00.000 42

--
JAG

Lucvdv
3/25/2004 8:28:15 PM
[quoted text, click to view]

Wow - that's fast :)

Thanks.
Erland Sommarskog
3/25/2004 11:42:28 PM
Serge Rielau (srielau@ca.eye-be-em.com) writes:
[quoted text, click to view]

The way I would write would be:

select t.temp_taken, t.temp
from Temps t
join (select date = convert(char(8), temp_taken, 112),
temp = MAX(temp)
from Temps
group by convert(char(8), temp_taken, 112)) as m
on t.temp = m.temp

Not that this might be faster than your suggestion, but it's kind
of the standard technique for this kind of problem.

--
Erland Sommarskog, SQL Server MVP, sommar@algonet.se

Books Online for SQL Server SP3 at
Lucvdv
3/26/2004 9:31:59 AM
On Thu, 25 Mar 2004 23:42:28 +0000 (UTC), Erland Sommarskog
[quoted text, click to view]

With "that's fast" I meant the time between question and reply, not
execution time :)

It was slow indeed: trying to sort the results on time made it time
out on a table with 50,000 records (without sorting it succeeded).


The solution I finally came up with is almost the same as yours, and
delivers sorted results from the same table in a fraction of a second:

SELECT TOP 100 PERCENT T1.*
FROM dbo.History T1 INNER JOIN
(SELECT MAX(TLO) AS tlo, IP
FROM dbo.History
GROUP BY (datepart(dy, tlo) + 366 * year(tlo)), IP) T2
ON T1.TLO = T2.tlo AND T1.IP = T2.IP
ORDER BY T1.IP, T1.TLO


Data are collected from 50 points, each several times per day - and
this will be growing to about 4000 points over time.
[TLO] is the date/time, [IP] is the (fixed) IP address of the hardware
that produced the data.
joe.celko NO[at]SPAM northface.edu
3/26/2004 10:08:29 AM
[quoted text, click to view]
day is no problem, but how do you get the other column(s) from the
corresponding records [sic] into the result? <<

CREATE TABLE TempReadings
(sample_time TIMESTAMP NOT NULL PRIMARY KEY,
temp INTEGER NOT NULL,
.. );

Here is the trick:

CREATE TABLE Calendar
(start_time DATETIME NOT NULL PRIMARY KEY,
end_time DATETIME NOT NULL);

First a warm-up query -- sorry about the pun -- to get basic summary
data by the day:

SELECT C1.start_date AS sample_date,
MAX(temp) AS min_temp, MIN(temp) AS max_temp,
AVG(temp) AS avg_temp
FROM TempReadings AS T1,
Calendar AS C1
WHERE T1.sample_time BETWEEN C1.start_time AND C1.end_time
GROUP BY C1.start_time;

Then

SELECT C1.start_date AS sample_date,
T1.*
FROM TempReadings AS T1,
Calendar AS C1
WHERE T1.sample_time BETWEEN C1.start_time AND C1.end_time
AND T1.temp
= (SELECT MAX(temp)
FROM TempReadings AS T2
WHERE T1.sample_time BETWEEN C1.start_time
AND C1.end_time);

The last temperature is easy with the same scheme and you could
combine them together into one query.

SELECT C1.start_date AS sample_date,
T1.*
FROM TempReadings AS T1,
Calendar AS C1
WHERE T1.sample_time
= (SELECT MAX(sample_time)
FROM TempReadings AS T2
WHERE T1.sample_time BETWEEN C1.start_time
John Gilson
3/26/2004 5:18:22 PM
[quoted text, click to view]

Actually Erland, I believe you meant as your join condition

ON t.temp = m.temp AND
CONVERT(CHAR(8), t.temp_taken, 112) = m.date

to get the max temp and its corresponding time(s) each day.

Your approach will probably be much faster than mine, i.e.,
the OUTER JOIN approach. Also, the way I specified the same date,
DATEDIFF(DAY, T1.temp_taken, T2.temp_taken) = 0, will be
a killer. The following should be a lot better, but speedwise, I'd
put my money on yours. Empirical tests on a roughly 50,000 row
table seems to bear that out.

SELECT T1.temp_taken, T1.temp
FROM Temps AS T1
LEFT OUTER JOIN
Temps AS T2
on DATEDIFF(DAY, '17530101', T1.temp_taken) =
DATEDIFF(DAY, '17530101', T2.temp_taken) AND
T2.temp > T1.temp
WHERE T2.temp_taken IS NULL

Regards,
John

--
JAG

[quoted text, click to view]

Erland Sommarskog
3/26/2004 11:35:58 PM
John Gilson (jag@acm.org) writes:
[quoted text, click to view]


Thanks for the correction, John!

I did try my query on the script you provided. However, there was only
data for one day, so I got a correct result. :-)



--
Erland Sommarskog, SQL Server MVP, sommar@algonet.se

Books Online for SQL Server SP3 at
AddThis Social Bookmark Button