[quoted text, click to view] "HK" <replywithingroup@notreal.com> wrote in message news:8kGMf.2874$xS5.921@tornado.socal.rr.com... > With SQL 2005 Standard: I have records in a table that show how long it > takes to complete a task... > > ID_KEY DATETIME (nearest minute) DURATION > > When a task is started, a record is added with a "null" duration, and when > the task is finished, the duration is updated to represent the number of > seconds the task took to complete. So, here's sample values: > > 133541 2006-02-27 16:51:00 NULL > 133540 2006-02-27 16:51:00 60 > 133539 2006-02-27 16:50:00 NULL > 133538 2006-02-27 16:50:00 153 > 133537 2006-02-27 16:50:00 30 > > The above data shows that 2 tasks are not complete and 3 tasks finished in > those respective #seconds. >
How about doing a DATEADD on the seconds for completed tasks as a time2 in the query, then select BETWEEN time1 and time2. Rick Sawtell MCT, MCSD, MCDBA
[quoted text, click to view] > When a task is started, a record is added with a "null" duration, and when > the task is finished, the duration is updated to represent the number of > seconds the task took to complete. So, here's sample values: > > 133541 2006-02-27 16:51:00 NULL > 133540 2006-02-27 16:51:00 60 > 133539 2006-02-27 16:50:00 NULL > 133538 2006-02-27 16:50:00 153 > 133537 2006-02-27 16:50:00 30 > > The above data shows that 2 tasks are not complete and 3 tasks finished in > those respective #seconds.
I assume you realize why this task is so difficult. For those who don't, this is what happens when the schema is not properly normalized. In this case, I am assuming that there is a 1-1 relationship between the "start" row of a given task and the "end" row of a given task - both are related by a common value for ID_KEY. Comments below are predicated on these assumptions. [quoted text, click to view] > My question: I would like to know how to determine the number of > simulataneous tasks in process for any given period of time, or by the > hour > of the day. Right this moment, I can determine that by doing SELECT > COUNT(*) FROM TABLE WHERE DURATION IS NULL, but once you get into the > historical perspective (where the nulls are all real duration values), you > have to figure our which duration values overlap with other duration > values. > ID_KEY DATETIME (nearest minute) DURATION
The basic query that provides the data you seek is: select t1.ID_KEY, t1.DATETIME as STARTDT, DATEADD(mi, t2.DURATION, t1.DATETIME) as ENDDT, t2.DURATION from mytable as t1 left outer join mytable as t2 on t1.ID_KEY = t2.ID_KEY and t1.DATETIME is null and t2.DATETIME is not null order by ???? Using the above as a derived table, the logic you need is something along the lines of: declare @target datetime set @target = '20060227 13:45:00.000' select ..... from [this is where the derived table goes, assume an alias of durdata] where STARTDT <= @target and coalesce (ENDDT, @target) >= @target order by ???? This gives you information as of a particular moment in time. You should be able to expand that logic to get information for a particular time period. [quoted text, click to view] > Yeah, I know my full-minute rounding will screw up the analysis, but > coming > close would be nice right now. For example, the 153 second task was > simultaneous with the two tasks that started one minute later. But how > do > you run some kind of query against the table for periods of time where the > durations are all now complete (no nulls for yesterday, for example, > because > those tasks are complete). How do I determine those "max simultaneous > tasks > were in progress" during hour by hour windows of time? Or something like > that? I really need this to determine staffing, because if we had 100 > simulatenous tasks between 8 and 8:10 AM yesterday morning, I know we need > to hire more people.
If you want hour-by-hour information, just create (and populate) a "calendar" table. This can be done in different ways - one way is dynamically with a table-valued function. With that table, you join to the derived table and substitute the variable boundary values with the associated columns from the calendar table.
[quoted text, click to view] > Thanks for the "join to itself" example. I'm not understanding the > calendar table, sorry. Are you able to give a quick example of that? Are > you meaning to create a temp table?
Perhaps. There are many uses for a calendar table (or, more simply, a table of sequential numbers). You can search the newsgroup for reasons and examples of their usage. Whether there is value in making this information permanent is something you must determine based on its useful as well as other factors that may not be apparent to outsiders like us. Most people discover that there are special ways of handling information based on whether a date is a "work day" (as opposed to a holiday, snow day, or other non-working day) Assume a temp table contains this info. At its simplest, it is just a table of dates. Simplying your request some more, lets assume that we want hourly numbers for the current date. Given the previously posted queries, what specifically do we need in order to generate the appropriate information? Answer - we need the boundaries of our timeperiods. Back to the example, we need a table of two columns that define our boundary values for each period of interest. The rows would look like the following: '20060227 00:00:00.000' , '20060227 01:00:00.000' '20060227 01:00:00.000' , '20060227 02:00:00.000' '20060227 02:00:00.000' , '20060227 03:00:00.000' Did you notice the overlap of time from the end of one period to the start of the next. By usage / convention, we'll prevent the overlap from altering our numbers. Why overlap - for clarification / presentation purposes only. To get your data, join this table using the previous derived table example - replace the variable usage with the appropriate columns and add the group by information. Focus particularly on the coalesce usage. Note that a task applies to a period if the startdate of the task precedes the end date of the period (not inclusive) and the end date of the task falls on/after the start date of the period (inclusive - this is the "usage convention" mentioned earlier). Note that a task that has not completed applies to all periods that end after the task start date. There are tricks to generating the timeperiod information. For now, it is probably easiest to focus on the logic and ignore whatever tricks might be used for generation. It probably makes most sense, from a developing perspective, to create a temp table and populate it with insert statements in a script. It's only 24 insert statements and you can copy/paste the information quickly. Play with the concept a bit. I'll work up an example if you still have questions.
With SQL 2005 Standard: I have records in a table that show how long it takes to complete a task... ID_KEY DATETIME (nearest minute) DURATION When a task is started, a record is added with a "null" duration, and when the task is finished, the duration is updated to represent the number of seconds the task took to complete. So, here's sample values: 133541 2006-02-27 16:51:00 NULL 133540 2006-02-27 16:51:00 60 133539 2006-02-27 16:50:00 NULL 133538 2006-02-27 16:50:00 153 133537 2006-02-27 16:50:00 30 The above data shows that 2 tasks are not complete and 3 tasks finished in those respective #seconds. My question: I would like to know how to determine the number of simulataneous tasks in process for any given period of time, or by the hour of the day. Right this moment, I can determine that by doing SELECT COUNT(*) FROM TABLE WHERE DURATION IS NULL, but once you get into the historical perspective (where the nulls are all real duration values), you have to figure our which duration values overlap with other duration values. Yeah, I know my full-minute rounding will screw up the analysis, but coming close would be nice right now. For example, the 153 second task was simultaneous with the two tasks that started one minute later. But how do you run some kind of query against the table for periods of time where the durations are all now complete (no nulls for yesterday, for example, because those tasks are complete). How do I determine those "max simultaneous tasks were in progress" during hour by hour windows of time? Or something like that? I really need this to determine staffing, because if we had 100 simulatenous tasks between 8 and 8:10 AM yesterday morning, I know we need to hire more people.
[quoted text, click to view] "Rick Sawtell" <Quickening@msn.com> wrote in message news:%23BssXF8OGHA.3728@tk2msftngp13.phx.gbl... > > "HK" <replywithingroup@notreal.com> wrote in message > news:8kGMf.2874$xS5.921@tornado.socal.rr.com... > > With SQL 2005 Standard: I have records in a table that show how long it > > takes to complete a task... > > > > ID_KEY DATETIME (nearest minute) DURATION > > > > When a task is started, a record is added with a "null" duration, and when > > the task is finished, the duration is updated to represent the number of > > seconds the task took to complete. So, here's sample values: > > > > 133541 2006-02-27 16:51:00 NULL > > 133540 2006-02-27 16:51:00 60 > > 133539 2006-02-27 16:50:00 NULL > > 133538 2006-02-27 16:50:00 153 > > 133537 2006-02-27 16:50:00 30 > > > > The above data shows that 2 tasks are not complete and 3 tasks finished in > > those respective #seconds. > > > > How about doing a DATEADD on the seconds for completed tasks as a time2 in > the query, then select BETWEEN time1 and time2. > > Rick Sawtell > MCT, MCSD, MCDBA > > > > > >
Although I understand the concept and how to use date functions, I still don't know how to put this into production in a real query (the selecting count between time1 and time2 is still not clear when I try), and also how to group by time intervals. Sample syntax would be much appreciated. Thanks.
[quoted text, click to view] "Scott Morris" <bogus@bogus.com> wrote in message news:%23SUSI38OGHA.2124@TK2MSFTNGP14.phx.gbl... > > When a task is started, a record is added with a "null" duration, and when > > the task is finished, the duration is updated to represent the number of > > seconds the task took to complete. So, here's sample values: > > > > 133541 2006-02-27 16:51:00 NULL > > 133540 2006-02-27 16:51:00 60 > > 133539 2006-02-27 16:50:00 NULL > > 133538 2006-02-27 16:50:00 153 > > 133537 2006-02-27 16:50:00 30 > > > > The above data shows that 2 tasks are not complete and 3 tasks finished in > > those respective #seconds. > > I assume you realize why this task is so difficult. For those who don't, > this is what happens when the schema is not properly normalized. In this > case, I am assuming that there is a 1-1 relationship between the "start" row > of a given task and the "end" row of a given task - both are related by a > common value for ID_KEY. Comments below are predicated on these > assumptions. > > > My question: I would like to know how to determine the number of > > simulataneous tasks in process for any given period of time, or by the > > hour > > of the day. Right this moment, I can determine that by doing SELECT > > COUNT(*) FROM TABLE WHERE DURATION IS NULL, but once you get into the > > historical perspective (where the nulls are all real duration values), you > > have to figure our which duration values overlap with other duration > > values. > > > ID_KEY DATETIME (nearest minute) DURATION > > The basic query that provides the data you seek is: > > select t1.ID_KEY, t1.DATETIME as STARTDT, > DATEADD(mi, t2.DURATION, t1.DATETIME) as ENDDT, > t2.DURATION > from mytable as t1 > left outer join mytable as t2 > on t1.ID_KEY = t2.ID_KEY > and t1.DATETIME is null > and t2.DATETIME is not null > order by ???? > > Using the above as a derived table, the logic you need is something along > the lines of: > > declare @target datetime > > set @target = '20060227 13:45:00.000' > select ..... > from [this is where the derived table goes, assume an alias of durdata] > where STARTDT <= @target > and coalesce (ENDDT, @target) >= @target > order by ???? > > This gives you information as of a particular moment in time. You should be > able to expand that logic to get information for a particular time period. > > > Yeah, I know my full-minute rounding will screw up the analysis, but > > coming > > close would be nice right now. For example, the 153 second task was > > simultaneous with the two tasks that started one minute later. But how > > do > > you run some kind of query against the table for periods of time where the > > durations are all now complete (no nulls for yesterday, for example, > > because > > those tasks are complete). How do I determine those "max simultaneous > > tasks > > were in progress" during hour by hour windows of time? Or something like > > that? I really need this to determine staffing, because if we had 100 > > simulatenous tasks between 8 and 8:10 AM yesterday morning, I know we need > > to hire more people. > > If you want hour-by-hour information, just create (and populate) a > "calendar" table. This can be done in different ways - one way is > dynamically with a table-valued function. With that table, you join to the > derived table and substitute the variable boundary values with the > associated columns from the calendar table. > >
Thanks for the "join to itself" example. I'm not understanding the calendar table, sorry. Are you able to give a quick example of that? Are you meaning to create a temp table?
[quoted text, click to view] "Scott Morris" <bogus@bogus.com> wrote in message news:u1knRF%23OGHA.3728@tk2msftngp13.phx.gbl... > > Thanks for the "join to itself" example. I'm not understanding the > > calendar table, sorry. Are you able to give a quick example of that? Are > > you meaning to create a temp table? > > Perhaps. There are many uses for a calendar table (or, more simply, a table > of sequential numbers). You can search the newsgroup for reasons and > examples of their usage. Whether there is value in making this information > permanent is something you must determine based on its useful as well as > other factors that may not be apparent to outsiders like us. Most people > discover that there are special ways of handling information based on > whether a date is a "work day" (as opposed to a holiday, snow day, or other > non-working day) > > Assume a temp table contains this info. At its simplest, it is just a table > of dates. Simplying your request some more, lets assume that we want hourly > numbers for the current date. Given the previously posted queries, what > specifically do we need in order to generate the appropriate information? > > Answer - we need the boundaries of our timeperiods. Back to the example, we > need a table of two columns that define our boundary values for each period > of interest. The rows would look like the following: > > '20060227 00:00:00.000' , '20060227 01:00:00.000' > '20060227 01:00:00.000' , '20060227 02:00:00.000' > '20060227 02:00:00.000' , '20060227 03:00:00.000' > > Did you notice the overlap of time from the end of one period to the start > of the next. By usage / convention, we'll prevent the overlap from altering > our numbers. Why overlap - for clarification / presentation purposes only. > > To get your data, join this table using the previous derived table example - > replace the variable usage with the appropriate columns and add the group by > information. Focus particularly on the coalesce usage. Note that a task > applies to a period if the startdate of the task precedes the end date of > the period (not inclusive) and the end date of the task falls on/after the > start date of the period (inclusive - this is the "usage convention" > mentioned earlier). Note that a task that has not completed applies to all > periods that end after the task start date. > > There are tricks to generating the timeperiod information. For now, it is > probably easiest to focus on the logic and ignore whatever tricks might be > used for generation. It probably makes most sense, from a developing > perspective, to create a temp table and populate it with insert statements > in a script. It's only 24 insert statements and you can copy/paste the > information quickly. > > Play with the concept a bit. I'll work up an example if you still have > questions. > >
So you were talking about creating a truly separate table, with each row being a period of time, like hourly. I understand now, thanks. But one thing still escapes me. So you have the thousands of raw records (the example records I gave in the beginning of this post) in each day. If I said tell me the busiest time of day yesterday, and how many simultaneous tasks were being executed at that moment, how would you determine the answer? Would you create 24*60 rows in your calendar table, one for each minute? And if so, how would you do this in an automated way, to both create the calendar table rows AND insert the count value into each row? I certainly don't want to hand-type 24*60 entries for each day. I can imagine writing code that looks through a recordset, row by row, with counters as it goes. But I would also think that there should be a way to do this from query analyzer.
[quoted text, click to view] > So you were talking about creating a truly separate table, with each row > being a period of time, like hourly. I understand now, thanks.
yes - based on your examples and questions only. Obviously, the solution depends on what "answer" you seek. [quoted text, click to view] > But one thing still escapes me. So you have the thousands of raw records > (the example records I gave in the beginning of this post) in each day. > If > I said tell me the busiest time of day yesterday, and how many > simultaneous > tasks were being executed at that moment, how would you determine the > answer? Would you create 24*60 rows in your calendar table, one for each > minute?
Good requirements often lead you to a good solution. Define "busiest time of the day". This raises the issue of how accurate an answer you seek (and how accurate your data is). If the data is only accurate to the nearest minute, then the answer is equally limited. Knowing that, we need to calculate the outstanding task counts on a minute basis throughout the period of interest (yesterday) and get the largest count. This involves 1 day x 24 hours x 60 minutes invidividual calculation periods; this then drives the content of the temp table discussed earlier. BTW - a table of 1440 (24 x 60) rows is a very, VERY, small table. [quoted text, click to view] > And if so, how would you do this in an automated way, to both > create the calendar table rows AND insert the count value into each row? > I > certainly don't want to hand-type 24*60 entries for each day.
You said "automated", followed by "hand-type". Obviously, anything that you do on a regular basis can be automated and there is no reason why you would need to "hand-type" anything more than once. Is it possible to write a script to populate a temp table with the hourly intervals for a specified date? Of course. There is a looping construct in tsql. There are also tricks, as mentioned earlier. Steve Kass has posted a UDF that generates a table of sequential numbers in the .programming NG, if you want to see one such technique. [quoted text, click to view] > I can imagine writing code that looks through a recordset, row by row, > with > counters as it goes.
Yes - you do this because you are most familiar with conventional programming. Good relational solutions involve set manipulations. Very rarely is a "row-by-row" solution appropriate; often such a solution is far from "best". Search the newsgroups for the term "cursor" to see the oft repeated discussions about such things. [quoted text, click to view] > But I would also think that there should be a way to do this from query > analyzer.
tsql code is tsql code. How you execute it is your decision. QA is but one tool that can do this.
[quoted text, click to view] > select t1.ID_KEY, t1.DATETIME as STARTDT, > DATEADD(mi, t2.DURATION, t1.DATETIME) as ENDDT, > t2.DURATION > from mytable as t1 > left outer join mytable as t2 > on t1.ID_KEY = t2.ID_KEY > and t1.DATETIME is null > and t2.DATETIME is not null > order by ????
This is wrong. Replace the datetime column in the join conditions with the duration column. I should point out that there is an assumption regarding the datetime column (nice name, btw) - specifically that all rows for a task (identified by ID_KEY) have the same value for the DATETIME column.
A working example set nocount on go if object_id('tempdb..#rawdata') is not null drop table #rawdata go if object_id('tempdb..#period') is not null drop table #period go create table #rawdata ( id_key int not null , tstamp datetime not null , duration int null ) insert #rawdata (id_key, tstamp, duration) select 31, '20060227 16:51:00', NULL union all select 31, '20060227 16:51:00', 181 union all select 28, '20060227 08:03:00', null union all select 28, '20060227 08:03:00', 600 union all select 12, '20060227 00:00:00', null union all select 900, '20060227 00:00:00', null union all select 901, '20060227 12:00:00', null union all select 88, '20060227 23:00:00', null union all select 88, '20060227 23:00:00', 60 select id_key, tstamp, duration from #rawdata order by id_key, tstamp, duration select t1.id_key, t1.tstamp as STARTDT, dateadd (mi, t2.duration, t1.tstamp) as ENDDT, t2.DURATION from #rawdata as t1 left outer join #rawdata as t2 on t1.id_key = t2.id_key and t1.duration is null and t2.duration is not null where t1.duration is null create table #period ( pstart datetime not null , pend datetime not null ) declare @temp int, @start datetime set @start = '20060227' while @start < '20060228' begin insert #period (pstart, pend) values (@start, dateadd (hour, 1, @start)) set @start = dateadd (hour, 1, @start) end select * from #period order by pstart select * from ( select t1.id_key, t1.tstamp as startdt, dateadd (mi, t2.duration, t1.tstamp) as enddt, t2.duration from #rawdata as t1 left outer join #rawdata as t2 on t1.id_key = t2.id_key and t1.duration is null and t2.duration is not null where t1.duration is null ) as normal select #period.*, normal.* from ( select t1.id_key, t1.tstamp as startdt, dateadd (mi, t2.duration, t1.tstamp) as enddt, t2.duration from #rawdata as t1 left outer join #rawdata as t2 on t1.id_key = t2.id_key and t1.duration is null and t2.duration is not null where t1.duration is null ) as normal inner join #period on normal.startdt < #period.pend and coalesce (normal.enddt, #period.pend) > #period.pstart order by normal.id_key, #period.pstart select #period.pstart, #period.pend, count(*) from ( select t1.id_key, t1.tstamp as startdt, dateadd (mi, t2.duration, t1.tstamp) as enddt, t2.duration from #rawdata as t1 left outer join #rawdata as t2 on t1.id_key = t2.id_key and t1.duration is null and t2.duration is not null where t1.duration is null ) as normal inner join #period on normal.startdt < #period.pend and coalesce (normal.enddt, #period.pend) > #period.pstart group by #period.pstart, #period.pend order by #period.pstart
Thank you for spending the time on the examples. [quoted text, click to view] "Scott Morris" <bogus@bogus.com> wrote in message news:OBIYqVHPGHA.140@TK2MSFTNGP12.phx.gbl... > A working example > > set nocount on > go > > if object_id('tempdb..#rawdata') is not null > drop table #rawdata > go > if object_id('tempdb..#period') is not null > drop table #period > go > > > > create table #rawdata ( > id_key int not null > , tstamp datetime not null > , duration int null > ) > > > insert #rawdata (id_key, tstamp, duration) > select 31, '20060227 16:51:00', NULL > union all > select 31, '20060227 16:51:00', 181 > union all > select 28, '20060227 08:03:00', null > union all > select 28, '20060227 08:03:00', 600 > union all > select 12, '20060227 00:00:00', null > union all > select 900, '20060227 00:00:00', null > union all > select 901, '20060227 12:00:00', null > union all > select 88, '20060227 23:00:00', null > union all > select 88, '20060227 23:00:00', 60 > > select id_key, tstamp, duration from #rawdata order by id_key, tstamp, > duration > > select t1.id_key, t1.tstamp as STARTDT, > dateadd (mi, t2.duration, t1.tstamp) as ENDDT, > t2.DURATION > from #rawdata as t1 > left outer join #rawdata as t2 > on t1.id_key = t2.id_key > and t1.duration is null > and t2.duration is not null > where t1.duration is null > > create table #period ( > pstart datetime not null > , pend datetime not null > ) > > declare @temp int, @start datetime > set @start = '20060227' > while @start < '20060228' > begin > insert #period (pstart, pend) values (@start, dateadd (hour, 1, @start)) > set @start = dateadd (hour, 1, @start) > end > > select * from #period order by pstart > > select * > from ( > select t1.id_key, t1.tstamp as startdt, > dateadd (mi, t2.duration, t1.tstamp) as enddt, > t2.duration > from #rawdata as t1 > left outer join #rawdata as t2 > on t1.id_key = t2.id_key > and t1.duration is null > and t2.duration is not null > where t1.duration is null > ) as normal > > select #period.*, normal.* > from ( > select t1.id_key, t1.tstamp as startdt, > dateadd (mi, t2.duration, t1.tstamp) as enddt, > t2.duration > from #rawdata as t1 > left outer join #rawdata as t2 > on t1.id_key = t2.id_key > and t1.duration is null > and t2.duration is not null > where t1.duration is null > ) as normal > inner join #period > on normal.startdt < #period.pend > and coalesce (normal.enddt, #period.pend) > #period.pstart > order by normal.id_key, #period.pstart > > > select #period.pstart, #period.pend, count(*) > from ( > select t1.id_key, t1.tstamp as startdt, > dateadd (mi, t2.duration, t1.tstamp) as enddt, > t2.duration > from #rawdata as t1 > left outer join #rawdata as t2 > on t1.id_key = t2.id_key > and t1.duration is null > and t2.duration is not null > where t1.duration is null > ) as normal > inner join #period > on normal.startdt < #period.pend > and coalesce (normal.enddt, #period.pend) > #period.pstart > group by #period.pstart, #period.pend > order by #period.pstart > >
Don't see what you're looking for? Try a search.
|