[quoted text, click to view] Lucvdv wrote: > John Gilson wrote: > > >>CREATE TABLE Temps > > > Wow - that's fast :) > > Thanks. >
But it's going to get slower fast as you add data due to the selfjoin. To get linear scalability you can concat the time to the temperature (turn both into a string), then max() on that and sub string it back. In standard SQL (sorry my T-SQL is spotty at best) that should be something like: SELECT INTEGER(SUBSTR(maxtemp, 1, 6)), TIMESTAMP(SUBSTR(maxtemp, 7, 32)) FROM (SELECT MAX(CHAR(temp) || CHAR(time_taken)) AS maxtemp FROM temps) AS T Should be trivial to translate to T-SQL Cheers Serge -- Serge Rielau DB2 SQL Compiler Development
This is a problem my collegue asked me for help with, but I don't see any light either. Neither of us has had a training in SQL, that's probably why ;) A simplification of the problem we have: suppose you have a table with a datetime column and a numeric column (for example the results of temperature measurements). There are several records per day, at varying hours. How would you go about building a query that returns, for example, - the maximum temperature for each day along with the time it was measured? - the last measurement for each day with its corresponding temperature? Finding the maximum temperature or the last measurement for each day is no problem, but how do you get the other column(s) from the corresponding records into the result?
[quoted text, click to view] "Lucvdv" <name@null.net> wrote in message news:9ra66019q5rr9lajdmmdk83ljn6een70ha@4ax.com... > This is a problem my collegue asked me for help with, but I don't see > any light either. Neither of us has had a training in SQL, that's > probably why ;) > > > A simplification of the problem we have: suppose you have a table with a > datetime column and a numeric column (for example the results of > temperature measurements). > > There are several records per day, at varying hours. > > How would you go about building a query that returns, for example, > > - the maximum temperature for each day along with the time it was > measured? > > - the last measurement for each day with its corresponding temperature? > > > Finding the maximum temperature or the last measurement for each day is > no problem, but how do you get the other column(s) from the > corresponding records into the result?
CREATE TABLE Temps ( temp_taken DATETIME NOT NULL PRIMARY KEY, temp INT NOT NULL ) INSERT INTO Temps (temp_taken, temp) VALUES ('20040325 9:00', 40) INSERT INTO Temps (temp_taken, temp) VALUES ('20040325 14:00', 50) INSERT INTO Temps (temp_taken, temp) VALUES ('20040325 20:00', 42) -- Max temp for each day -- Read: For a given day and temp, there isn't a row with a greater temp SELECT T1.temp_taken, T1.temp FROM Temps AS T1 LEFT OUTER JOIN Temps AS T2 ON DATEDIFF(DAY, T1.temp_taken, T2.temp_taken) = 0 AND T2.temp > T1.temp WHERE T2.temp_taken IS NULL temp_taken temp 2004-03-25 14:00:00.000 50 -- Last measurement for each day -- Read: For a given day, there isn't a row whose time is later SELECT T1.temp_taken, T1.temp FROM Temps AS T1 LEFT OUTER JOIN Temps AS T2 ON DATEDIFF(DAY, T1.temp_taken, T2.temp_taken) = 0 AND T2.temp_taken > T1.temp_taken WHERE T2.temp_taken IS NULL temp_taken temp 2004-03-25 20:00:00.000 42 -- JAG
[quoted text, click to view] John Gilson wrote: > > CREATE TABLE Temps
Wow - that's fast :) Thanks.
Serge Rielau (srielau@ca.eye-be-em.com) writes: [quoted text, click to view] > But it's going to get slower fast as you add data due to the selfjoin. > To get linear scalability you can concat the time to the temperature > (turn both into a string), then max() on that and sub string it back. > > In standard SQL (sorry my T-SQL is spotty at best) that should be > something like: > SELECT INTEGER(SUBSTR(maxtemp, 1, 6)), TIMESTAMP(SUBSTR(maxtemp, 7, 32)) > FROM (SELECT MAX(CHAR(temp) || CHAR(time_taken)) AS maxtemp > FROM temps) AS T > > Should be trivial to translate to T-SQL
The way I would write would be: select t.temp_taken, t.temp from Temps t join (select date = convert(char(8), temp_taken, 112), temp = MAX(temp) from Temps group by convert(char(8), temp_taken, 112)) as m on t.temp = m.temp Not that this might be faster than your suggestion, but it's kind of the standard technique for this kind of problem. -- Erland Sommarskog, SQL Server MVP, sommar@algonet.se Books Online for SQL Server SP3 at
On Thu, 25 Mar 2004 23:42:28 +0000 (UTC), Erland Sommarskog [quoted text, click to view] <sommar@algonet.se> wrote: > Serge Rielau (srielau@ca.eye-be-em.com) writes: > > But it's going to get slower fast as you add data due to the selfjoin. > > To get linear scalability you can concat the time to the temperature > > (turn both into a string), then max() on that and sub string it back. > > > > In standard SQL (sorry my T-SQL is spotty at best) that should be > > something like: > > SELECT INTEGER(SUBSTR(maxtemp, 1, 6)), TIMESTAMP(SUBSTR(maxtemp, 7, 32)) > > FROM (SELECT MAX(CHAR(temp) || CHAR(time_taken)) AS maxtemp > > FROM temps) AS T > > > > Should be trivial to translate to T-SQL > > The way I would write would be: > > select t.temp_taken, t.temp > from Temps t > join (select date = convert(char(8), temp_taken, 112), > temp = MAX(temp) > from Temps > group by convert(char(8), temp_taken, 112)) as m > on t.temp = m.temp > > Not that this might be faster than your suggestion, but it's kind > of the standard technique for this kind of problem.
With "that's fast" I meant the time between question and reply, not execution time :) It was slow indeed: trying to sort the results on time made it time out on a table with 50,000 records (without sorting it succeeded). The solution I finally came up with is almost the same as yours, and delivers sorted results from the same table in a fraction of a second: SELECT TOP 100 PERCENT T1.* FROM dbo.History T1 INNER JOIN (SELECT MAX(TLO) AS tlo, IP FROM dbo.History GROUP BY (datepart(dy, tlo) + 366 * year(tlo)), IP) T2 ON T1.TLO = T2.tlo AND T1.IP = T2.IP ORDER BY T1.IP, T1.TLO Data are collected from 50 points, each several times per day - and this will be growing to about 4000 points over time. [TLO] is the date/time, [IP] is the (fixed) IP address of the hardware that produced the data.
[quoted text, click to view] >> Finding the maximum temperature or the last measurement for each
day is no problem, but how do you get the other column(s) from the corresponding records [sic] into the result? << CREATE TABLE TempReadings (sample_time TIMESTAMP NOT NULL PRIMARY KEY, temp INTEGER NOT NULL, .. ); Here is the trick: CREATE TABLE Calendar (start_time DATETIME NOT NULL PRIMARY KEY, end_time DATETIME NOT NULL); First a warm-up query -- sorry about the pun -- to get basic summary data by the day: SELECT C1.start_date AS sample_date, MAX(temp) AS min_temp, MIN(temp) AS max_temp, AVG(temp) AS avg_temp FROM TempReadings AS T1, Calendar AS C1 WHERE T1.sample_time BETWEEN C1.start_time AND C1.end_time GROUP BY C1.start_time; Then SELECT C1.start_date AS sample_date, T1.* FROM TempReadings AS T1, Calendar AS C1 WHERE T1.sample_time BETWEEN C1.start_time AND C1.end_time AND T1.temp = (SELECT MAX(temp) FROM TempReadings AS T2 WHERE T1.sample_time BETWEEN C1.start_time AND C1.end_time); The last temperature is easy with the same scheme and you could combine them together into one query. SELECT C1.start_date AS sample_date, T1.* FROM TempReadings AS T1, Calendar AS C1 WHERE T1.sample_time = (SELECT MAX(sample_time) FROM TempReadings AS T2 WHERE T1.sample_time BETWEEN C1.start_time
[quoted text, click to view] "Erland Sommarskog" <sommar@algonet.se> wrote in message news:Xns94B86EE64E2AYazorman@127.0.0.1... > Serge Rielau (srielau@ca.eye-be-em.com) writes: > > But it's going to get slower fast as you add data due to the selfjoin. > > To get linear scalability you can concat the time to the temperature > > (turn both into a string), then max() on that and sub string it back. > > > > In standard SQL (sorry my T-SQL is spotty at best) that should be > > something like: > > SELECT INTEGER(SUBSTR(maxtemp, 1, 6)), TIMESTAMP(SUBSTR(maxtemp, 7, 32)) > > FROM (SELECT MAX(CHAR(temp) || CHAR(time_taken)) AS maxtemp > > FROM temps) AS T > > > > Should be trivial to translate to T-SQL > > The way I would write would be: > > select t.temp_taken, t.temp > from Temps t > join (select date = convert(char(8), temp_taken, 112), > temp = MAX(temp) > from Temps > group by convert(char(8), temp_taken, 112)) as m > on t.temp = m.temp
Actually Erland, I believe you meant as your join condition ON t.temp = m.temp AND CONVERT(CHAR(8), t.temp_taken, 112) = m.date to get the max temp and its corresponding time(s) each day. Your approach will probably be much faster than mine, i.e., the OUTER JOIN approach. Also, the way I specified the same date, DATEDIFF(DAY, T1.temp_taken, T2.temp_taken) = 0, will be a killer. The following should be a lot better, but speedwise, I'd put my money on yours. Empirical tests on a roughly 50,000 row table seems to bear that out. SELECT T1.temp_taken, T1.temp FROM Temps AS T1 LEFT OUTER JOIN Temps AS T2 on DATEDIFF(DAY, '17530101', T1.temp_taken) = DATEDIFF(DAY, '17530101', T2.temp_taken) AND T2.temp > T1.temp WHERE T2.temp_taken IS NULL Regards, John -- JAG [quoted text, click to view] > Not that this might be faster than your suggestion, but it's kind > of the standard technique for this kind of problem. > > -- > Erland Sommarskog, SQL Server MVP, sommar@algonet.se > > Books Online for SQL Server SP3 at > http://www.microsoft.com/sql/techinfo/productdoc/2000/books.asp
John Gilson (jag@acm.org) writes: [quoted text, click to view] > Actually Erland, I believe you meant as your join condition > > ON t.temp = m.temp AND > CONVERT(CHAR(8), t.temp_taken, 112) = m.date > > to get the max temp and its corresponding time(s) each day.
Thanks for the correction, John! I did try my query on the script you provided. However, there was only data for one day, so I got a correct result. :-) -- Erland Sommarskog, SQL Server MVP, sommar@algonet.se Books Online for SQL Server SP3 at
Don't see what you're looking for? Try a search.
|