all groups > sql server new users > november 2005 >
I'm trying to devise syntax to assign a value to a variable from a field in the table, where the variable is a different value each time through the loop. Obviously this would be an array solution in a normal coding environment. Is there anything I can do in T-Sql to emulate this? This process has a cursor wrapped around it that scans through another table. This table is a lookup table from the other table. --DDL-- CREATE TABLE strings ( recid int, fld1 varchar(50), fld2 varchar(50), fld3 varchar(50), fld4 varchar(50) ) INSERT INTO (recid, fld1, fld2, fld3, fld4) SELECT 1, 'String1a', 'String1b', 'String1c', 'String1d' UNION ALL SELECT 2, 'String2a', 'String2b', 'String2c', 'String2d' UNION ALL SELECT 3, 'String3a', 'String3b', 'String3c', 'String3d' UNION ALL SELECT 4, 'String4a', 'String4b', 'String4c', 'String4d' --DML-- DECLARE @i int DECLARE @idval int DECLARE @invar varchar(50) DECLARE @A1 varchar(50) DECLARE @A2 varchar(50) DECLARE @A3 varchar(50) DECLARE @A4 varchar(50) SET @i = 1 SET @idval = (SELECT foreignkey FROM another_table WHERE criteria = what_is_wanted) WHILE @i < 5 BEGIN SET @invar = ( CASE @i WHEN 1 (SELECT fld1 FROM strings WHERE recid = @idval WHEN 2 (SELECT fld2 FROM strings WHERE recid = @idval WHEN 3 (SELECT fld3 FROM strings WHERE recid = @idval WHEN 4 (SELECT fld4 FROM strings WHERE recid = @idval END ) -- massage and manipulate the value loaded into @invar to prepare it to be written to @A1, @A2, @A3, or @A4 -- the intent is that @A1 = fld1; @A2 = fld2; @A3 = fld3; @A4 = fld4 after the data manipulation is performed. -- how can I choose the correct variable to store @invar based on the iteration of @i ???? SET @i = @i + 1 END
[quoted text, click to view] On Mon, 28 Nov 2005 11:59:05 -0600, Lawrence Garvin wrote: >I'm trying to devise syntax to assign a value to a variable from a field in >the table, where the variable is a different value each time through the >loop. Obviously this would be an array solution in a normal coding >environment. Is there anything I can do in T-Sql to emulate this? > >This process has a cursor wrapped around it that scans through another >table. This table is a lookup table from the other table.
(snip) Hi Lawrence, Why not get rid of the loop and instead use something like this: UPDATE strings SET fld1 = some_expression(field_1), fld2 = some_expression(field_2), fld3 = some_expression(field_3), fld4 = some_expression(field_4), fld5 = some_expression(field_5) WHERE recid = @idval ?? Or, even better, why not normalise your design? If these suggestions don't work, then you'll have to provide more information: sample data (as INSERT statements) and expected results would be great. The part of the code you replaced with a comment would help as well. Read more about providing accurate specs on www.aspfaq.com/5006. Best, Hugo --
[quoted text, click to view] "Hugo Kornelis" <hugo@pe_NO_rFact.in_SPAM_fo> wrote in message news:okdso1dcafgfcopf7pkm2nchoktfk3dg83@4ax.com... > On Mon, 28 Nov 2005 11:59:05 -0600, Lawrence Garvin wrote: > >>I'm trying to devise syntax to assign a value to a variable from a field >>in >>the table, where the variable is a different value each time through the >>loop. Obviously this would be an array solution in a normal coding >>environment. Is there anything I can do in T-Sql to emulate this? >> >>This process has a cursor wrapped around it that scans through another >>table. This table is a lookup table from the other table. > (snip) > > Hi Lawrence,
Hugo, thank you for your responses.... [quoted text, click to view] > Why not get rid of the loop and instead use something like this:
The 'loop' was an attempt at consolidating what is currently a very linear structure. I'm certainly open to options. [quoted text, click to view] > UPDATE strings > SET fld1 = some_expression(field_1), > fld2 = some_expression(field_2), > fld3 = some_expression(field_3), > fld4 = some_expression(field_4), > fld5 = some_expression(field_5) > WHERE recid = @idval
But, unfortunately, the purpose of the logic is not to update the database, but rather to extract the field from the database, manipulate it, extract its corresponding value from a values table, and then use the value in a computation. [quoted text, click to view] > Or, even better, why not normalise your design?
Actually, it is normalized, but my simplistic example likely suggests it is not. The actual table is a formula table, where each column contains a term of a formula with a variable number of terms -- up to 8 numerator terms, up to 8 denominator terms, and the operator between the terms can either be addition or subtraction, all of which is stored in the table, one row per formula. My example was simplistic because I really wanted to focus on processing through the up-to-sixteen columns in each row and getting them assigned to the local vars. [quoted text, click to view] > If these suggestions don't work, then you'll have to provide more > information: sample data (as INSERT statements) and expected results > would be great. The part of the code you replaced with a comment would > help as well.
Here's the detailed view of what I'm working with. I apologize that you feel my specs were not accurate enough. In fact, after having read over a month's worth of messages in this newsgroup, I spent a considerable amount of time formulating the specs in my question so that I could focus specifically on the question at hand -- which is how to populate a list of local vars with values from a selection of columns in a single row -- without cluttering the post, or the group, with lots of "useless" details. I've still compressed the schema presented here to only four terms total, but we allow for eight numerator and eight denominator in the table structure. Four should be sufficient to establish the question, don't you think? The problem is that it's not practical to post the several thousand lines of setup and processing code that wrap around this basic question, so I'm doing my best to include what makes the question functional in the context in which it is being used. I have /inherited/ this database structure and it's a production database, so I'm quite limited in schema modifications, but if you have a suggestion for a schema mod that would make a /significant/ difference, I'm certainly listening. Here we go: CREATE TABLE formulas ( surveyid as int, --contains unique identifier for survey ncount as int, --contains count of actual number of numerator terms in formula dcount as int, --contains count of actual number of denominator terms in formula n1 as varchar(50), n2 as varchar(50), d1 as varchar(50), d2 as varchar(50), ) -- the data contained in n1, n2, d1, d2, is a foreign key into another table that contains the actual /values/ used in the formula. -- the value is the actual name of the specific question that provides the data for the formula -- for purposes of this example, we're assuming the mathematical operator is always addition -- note the foreign key relationship is implicit -- it is not declared in the table structure -- and no referential integrity is imposed upon the table relationships -- (as noted, I /inherited/ this schema -- and my role is to maintain and enhance the production environment) CREATE TABLE answers ( surveyid as int, qname as varchar(50), answer as int ) INSERT INTO formulas SELECT 2,1,'Q1_A', 'Q1_B','Q1_TOTAL',NULL UNION ALL SELECT 2,2,'Q2_C','Q2_D','Q2_E',Q2_TOTAL' INSERT INTO answers SELECT 'Q1_A',50 UNION ALL SELECT 'Q1_B',100 UNION ALL SELECT 'Q1_TOTAL',150 UNION ALL SELECT 'Q2_C',25 UNION ALL SELECT 'Q2_D',25 UNION ALL SELECT 'Q2_E',75 UNION ALL SELECT 'Q2_TOTAL',125 -- The purpose of the procedure is to extract the question name, determine if it is NULL, a _TOTAL question -- (which needs to invoke a subcomputation to validate the value stored), or a raw data value. Then, the /value/ -- is retrieved from the data values table, and assigned to a local variable based on the term position. -- So the -end result- that is desired is: DECLARE @workvar varchar(50) DECLARE @retval int DECLARE @n1 varchar(50) DECLARE @n2 varchar(50) DECLARE @d1 varchar(50) DECLARE @d2 varchar(50) DECLARE @surveyid int -- this value is actually set by a cursor that scans through a temp table containing the list of surveys to be processed -- I see no point in replicating the cursor setup and processing loop to establish my question -- so let's assume, for purposes of this question, that we're only processing one survey and it's Survey #1 SET @surveyid = 1 SET @workvar = (SELECT n1 FROM formulas f WHERE f.surveyid = @surveyid) EXEC usp_storedproc @workvar @retval SET @n1 = @retval SET @workvar = (SELECT n2 FROM formulas f WHERE f.surveyid = @surveyid) EXEC usp_storedproc @workvar @retval SET @n2 = @retval SET @workvar = (SELECT d1 FROM formulas f WHERE f.surveyid = @surveyid) EXEC usp_storedproc @workvar @retval SET @d1 = @retval SET @workvar = (SELECT d2 FROM formulas f WHERE f.surveyid = @surveyid) EXEC usp_storedproc @workvar @retval SET @d2 = @retval -- and then the formula is mathematically computed DECLARE @answer int SET @answer = (@n1 + @n2)/(@d1 + @d2) -- The objective is to find a way in which this process does not need to be done linearly. -- It doesn't seem too bad with only four terms, but with sixteen terms, -- it gets excessively long when processed as a linear set of code. -- My objective is to compress that into a LOOP for each of the sixteen terms, since the processing for -- each term is identical. -- The /problem/ is dealing with assigning each of the sixteen terms to a
Thank you Hugo. I understand about priorities, and any help you can offer when you can get to it will be appreciated. In the meantime, that project has a few thousand lines of code in other sections, so I'm not in an immediate hurry. I'll probably be "testing" my "enhancements" through the end of next week, at a minimum. [quoted text, click to view] "Hugo Kornelis" <hugo@pe_NO_rFact.in_SPAM_fo> wrote in message news:d6m9p1pk93itcklel8tba6r2n8lj3f4dr2@4ax.com... > On Thu, 1 Dec 2005 09:59:29 -0600, Lawrence Garvin wrote: > > (snip) > > Hi Lawrence, > > I have to apologize for not replying sooner. And I'll also have to task > your patience even more. Real life is terribly busy at the moment. I > still have your reply marked as "unread"; I'll study it in more detail > and reply when I have more time (hopefully in a few days). > > Best, Hugo > -- > > (Remove _NO_ and _SPAM_ to get my e-mail address)
[quoted text, click to view] On Thu, 1 Dec 2005 09:59:29 -0600, Lawrence Garvin wrote:
(snip) Hi Lawrence, I have to apologize for not replying sooner. And I'll also have to task your patience even more. Real life is terribly busy at the moment. I still have your reply marked as "unread"; I'll study it in more detail and reply when I have more time (hopefully in a few days). Best, Hugo --
[quoted text, click to view] On Thu, 1 Dec 2005 09:59:29 -0600, Lawrence Garvin wrote: >"Hugo Kornelis" <hugo@pe_NO_rFact.in_SPAM_fo> wrote in message >news:okdso1dcafgfcopf7pkm2nchoktfk3dg83@4ax.com... > >> On Mon, 28 Nov 2005 11:59:05 -0600, Lawrence Garvin wrote: >> >>>I'm trying to devise syntax to assign a value to a variable from a field >>>in >>>the table, where the variable is a different value each time through the >>>loop. Obviously this would be an array solution in a normal coding >>>environment. Is there anything I can do in T-Sql to emulate this? >>> >>>This process has a cursor wrapped around it that scans through another >>>table. This table is a lookup table from the other table. >> (snip) >> >> Hi Lawrence, > > >Hugo, thank you for your responses.... > >> Why not get rid of the loop and instead use something like this: > >The 'loop' was an attempt at consolidating what is currently a very linear >structure. I'm certainly open to options.
Hi Lawrence, After reading your message, I think you're barking up the wrong tree (if that's the correct proverb in this context). You're trying to change code that is linear but easy to read and follow with code that's harder to understand, but still performs linear. The code might look prettier to you (that's a matter of taste, after all), but you've done nothing to minimize DB access. That should be your prime target. I'll give you a more efficient replacement code for the code you posted, and some comments after that. But first a small correction: [quoted text, click to view] >SET @workvar = (SELECT n2 FROM formulas f WHERE f.surveyid = @surveyid) >EXEC usp_storedproc @workvar @retval >SET @n2 = @retval
If you expect usp_storedproc to change the value of @retval, than you'll have to include the OUTPUT option in both the procedure's definition and in the call: [quoted text, click to view] >EXEC usp_storedproc @workvar @retval OUTPUT
Anyway - here's how I would rewrite the example you posted: DECLARE @n1W varchar(50) DECLARE @n2W varchar(50) DECLARE @d1W varchar(50) DECLARE @d2W varchar(50) DECLARE @n1 varchar(50) DECLARE @n2 varchar(50) DECLARE @d1 varchar(50) DECLARE @d2 varchar(50) DECLARE @surveyid int SET @surveyid = 1 SELECT @n1W = n1, @n2W = n2, @d1W = d1, @d2W = d2 FROM formulas WHERE surveyid = @surveyid EXEC usp_storedproc @n1W @n1 OUTPUT EXEC usp_storedproc @n2W @n2 OUTPUT EXEC usp_storedproc @d1W @d1 OUTPUT EXEC usp_storedproc @d2W @d2 OUTPUT -- and then the formula is mathematically computed DECLARE @answer int SET @answer = (@n1 + @n2)/(@d1 + @d2) -- ****** Are you aware that this concatenates the n1 and n2 strings, -- ****** concatenates the d1 and d2 strings, then attempts to convert -- ****** both to numeric before dividing?? -- ****** Are you SURE that's what you want? Are the datatype correct??? Some remaining comments: 1. Still long for 16 variables - but not as long as either of your versions. And it'll perform (slightly) better. 2. Still 4 (in the full version 16) calls to usp_storedproc. Since you didn't post the code of this proc, I can't assess if it's possible to use a completely set-based solution. Depending on what exactly this proc does, this will likely become a bottle-neck for your performance. [quoted text, click to view] > -- this value is actually set by a cursor that scans through a temp table >containing the list of surveys to be processed > -- I see no point in replicating the cursor setup and processing loop to >establish my question > -- so let's assume, for purposes of this question, that we're only >processing one survey and it's Survey #1
3. You're partly right. For the problem itself, the exact cursor code is not relevant. But if you really want to improve your code, you should try to get rid of the cursor and rewrite your logic to process all rows in the temp table at once. Or even take it one step further: get rid of the temp table as well; replace the temp table with the logic you now use to populate the temp table. Replacing the cursor with a set-based operation requires moving the stored procedure's logic into the query. Since I don't know the stored procedure, I can't help here. [quoted text, click to view] >Here's the detailed view of what I'm working with. I apologize that you feel >my specs were not accurate enough. In fact, after having read over a month's >worth of messages in this newsgroup, I spent a considerable amount of time >formulating the specs in my question so that I could focus specifically on >the question at hand -- which is how to populate a list of local vars with >values from a selection of columns in a single row -- without cluttering the >post, or the group, with lots of "useless" details. I've still compressed >the schema presented here to only four terms total, but we allow for eight >numerator and eight denominator in the table structure. Four should be >sufficient to establish the question, don't you think? > >The problem is that it's not practical to post the several thousand lines of >setup and processing code that wrap around this basic question, so I'm doing >my best to include what makes the question functional in the context in >which it is being used.
4. I appreciate your efforts to post your question as short and to-the-point as possible. This can indeed be a judgement call. If you have a question that involves three columns and you post a CREATE TABLE statement with 140 columns and all 80 CHECK constraints and 12 CREATE INDEX statements, you're clearly posting too much :-) But as a rule of thumb, I'd say: when in doubt, include more details. This holds especially true for code. If I see 400 lines of code in a post, I'll just copy and paste it into Query Analyzer, then focus on the part of the data that (I think) is interesting. That doesn't cost much more time than doing the same with 40 lines of code. In this case, you had already decided on a solution ("use cursor to iterate over the temp table, process formulas one by one, then calculate results") and asked help for a specific detail of your solution. As you have seen, posting more information allowed me to post a suggestion that is not a straight answer to your original question, but that I hope helps you better. I've also hinted at what might be even better solutions, but I can;t help with those, for lack of even more details. I'm not holding this against you. You couldn't have known (if you had known, you wouldn't have asked for help). I'm just trying to explain why I was asking for more details - becuase I wanted to answer the question behind the question, instead of just answering the question you asked. Best, Hugo --
[quoted text, click to view] "Hugo Kornelis" <hugo@pe_NO_rFact.in_SPAM_fo> wrote in message = news:0i2cp1tf8q8e0oucih0d44iablev7b6phi@4ax.com... > You're trying to change > code that is linear but easy to read and follow with code that's = harder > to understand, but still performs linear. The code might look prettier > to you (that's a matter of taste, after all), but you've done nothing = to > minimize DB access. That should be your prime target.
Well, actually, the code was not easy to read. Furthermore, by building = the loop, I was able to reduce several thousand lines of T-SQL code = (which still needed to grow, as the actual code only allowed for a small = percentage of the permutations of eight numerators and eight = denominators), to 200 lines which handles all permutations of the 16 = possible terms of the equation. [quoted text, click to view] > I'll give you a more efficient replacement code for the code you = posted, > and some comments after that. But first a small correction: >=20 >>SET @workvar =3D (SELECT n2 FROM formulas f WHERE f.surveyid =3D = @surveyid) >>EXEC usp_storedproc @workvar @retval >>SET @n2 =3D @retval >=20 > If you expect usp_storedproc to change the value of @retval, than = you'll > have to include the OUTPUT option in both the procedure's definition = and > in the call: >=20 >>EXEC usp_storedproc @workvar @retval OUTPUT
Yes, this was a typo. The actual code syntax is correct. I overlooked = the addition of the OUTPUT keyword in my example. [quoted text, click to view] > SELECT @n1W =3D n1, @n2W =3D n2, @d1W =3D d1, @d2W =3D d2 > FROM formulas > WHERE surveyid =3D @surveyid
If only it were this simple... but this does give me an additional thing = to look at. The thing is.... n1, n2, d1, and d2 are /names/ of 'questions' that = contain answers. So, after obtaining the values of n1, n2, d1, and d2, = those values then need to be used in another table lookup to get the = /answers/ to be used in the formula. So what actually happens is SELECT @answer-n1 =3D ( SELECT an1 FROM answers a WHERE a.question =3D ( SELECT n1 FROM questions ) ) But under certain circumstances -- when a.question represents a computed = total of individual questions, it must also be validated against the = stored procedure. So the above code is only executed /if/ the question = is not a totals question, otherwise we call the stored procedure to = retrieve the value after validating the associated data that makes up = the totals value. All of which is pretty irrelevant to the question. [quoted text, click to view] > EXEC usp_storedproc @n1W @n1 OUTPUT > EXEC usp_storedproc @n2W @n2 OUTPUT > EXEC usp_storedproc @d1W @d1 OUTPUT > EXEC usp_storedproc @d2W @d2 OUTPUT >=20 > -- and then the formula is mathematically computed > DECLARE @answer int > SET @answer =3D (@n1 + @n2)/(@d1 + @d2)
[quoted text, click to view] > -- ****** Are you aware that this concatenates the n1 and n2 strings, > -- ****** concatenates the d1 and d2 strings, then attempts to convert > -- ****** both to numeric before dividing?? > -- ****** Are you SURE that's what you want? Are the datatype =
correct??? Actually, in the real formula the destination local var is declared as = decimal, so implicit conversion kicks in. [quoted text, click to view] > Some remaining comments: >=20 > 1. Still long for 16 variables - but not as long as either of your > versions. And it'll perform (slightly) better. >=20 > 2. Still 4 (in the full version 16) calls to usp_storedproc. Since you > didn't post the code of this proc, I can't assess if it's possible to > use a completely set-based solution. Depending on what exactly this = proc > does, this will likely become a bottle-neck for your performance.
I completely rewrote the stored proc, but the stored proc is not really = relevant to my question, as the question is how to assign the returned = value to a local variable. [quoted text, click to view] >> -- this value is actually set by a cursor that scans through a temp = table=20 >>containing the list of surveys to be processed >> -- I see no point in replicating the cursor setup and processing loop = to=20 >>establish my question >> -- so let's assume, for purposes of this question, that we're only=20 >>processing one survey and it's Survey #1 >=20 > 3. You're partly right. For the problem itself, the exact cursor code = is > not relevant. But if you really want to improve your code, you should > try to get rid of the cursor and rewrite your logic to process all = rows > in the temp table at once.=20
Unfortunately not possible that I can see.. but yes, I'd definitely like = to get rid of the cursors. The problem is that we have a matrix of = respondents, surveys, and questions that needs to be dynamically built = and processed. If you know of a way to build T-SQL logic that says: FOR EACH respondent-survey row in table u AND EACH question in table q WHERE u.survey =3D q.survey =20 DO this process Then I'm all ears.=20 [quoted text, click to view] > Or even take it one step further: get rid of > the temp table as well; replace the temp table with the logic you now > use to populate the temp table. > Replacing the cursor with a set-based operation requires moving the > stored procedure's logic into the query. Since I don't know the stored > procedure, I can't help here.
So.... since the last post, I've actually completed refining all of the = code, and significantly simplifying it, and of course, this last 'quirk' = is the only thing I can't work out, but it's much less intrusive now = then it was in the original code. And now, because it's significantly = shorter, and simpler, I can post a blinded source for the whole process = here, which might help even more. Frankly, now, the issue seems trivial in comparision of what used to be. = :-) DECLARE @LastUpdate datetime SET @LastUpdate =3D (SELECT UpdateTime FROM LASTTIMERUN) CREATE TABLE #temp_users( userid nvarchar(100), surveyid int ) INSERT INTO #temp_users SELECT DISTINCT u.userid, s.surveyid FROM users u WHERE u.entrydate > @LastUpdate -- we build temp-answers because the answers are stored in raw, = collected, text form, -- and must be validated and scrubbed to numeric form for use in the = computations -- invalid data sets result in aborting the computation for that dataset CREATE TABLE temp_answers ( surveyid int, userid varchar(40), questionnumber varchar(50), answer money, ) INSERT INTO temp_answers SELECT a.surveyid, a.Userid, a.questionnumber, usp_scrubdata(a.answer) FROM answers a INNER JOIN #temp_users t ON a.surveyid =3D t.surveyid AND a.userid =3D t.userid DECLARE @surveyid int DECLARE @userid varchar(50) DECLARE @formulaid int DECLARE @val money DECLARE @ndterm varchar(50) DECLARE @AN1 money DECLARE @AN2 money DECLARE @AD1 money DECLARE @AD2 money DECLARE @tval money DECLARE @i int DECLARE @nunits int DECLARE @dunits int DECLARE user_cursor CURSOR READ_ONLY
[quoted text, click to view] On Mon, 12 Dec 2005 10:29:08 -0600, Lawrence Garvin wrote: > >"Hugo Kornelis" <hugo@pe_NO_rFact.in_SPAM_fo> wrote in message news:0i2cp1tf8q8e0oucih0d44iablev7b6phi@4ax.com... > >> You're trying to change >> code that is linear but easy to read and follow with code that's harder >> to understand, but still performs linear. The code might look prettier >> to you (that's a matter of taste, after all), but you've done nothing to >> minimize DB access. That should be your prime target. > >Well, actually, the code was not easy to read. Furthermore, by building the loop, I was able to reduce several thousand lines of T-SQL code (which still needed to grow, as the actual code only allowed for a small percentage of the permutations of eight numerators and eight denominators), to 200 lines which handles all permutations of the 16 possible terms of the equation.
Hi Lawrence, After seeing the partial code you posted, I agree that in this case, it makes sense to choose for a WHILE that iterates the same code 16 times. (snip) [quoted text, click to view] >>> -- this value is actually set by a cursor that scans through a temp table >>>containing the list of surveys to be processed >>> -- I see no point in replicating the cursor setup and processing loop to >>>establish my question >>> -- so let's assume, for purposes of this question, that we're only >>>processing one survey and it's Survey #1 >> >> 3. You're partly right. For the problem itself, the exact cursor code is >> not relevant. But if you really want to improve your code, you should >> try to get rid of the cursor and rewrite your logic to process all rows >> in the temp table at once. > >Unfortunately not possible that I can see.. but yes, I'd definitely like to get rid of the cursors. The problem is that we have a matrix of respondents, surveys, and questions that needs to be dynamically built and processed. If you know of a way to build T-SQL logic that says: > > FOR EACH respondent-survey row in table u > AND EACH question in table q WHERE u.survey = q.survey > > > DO this process > >Then I'm all ears.
Let me first check my interpretation of the code you posted. If I understand you correctly, the current process is: * For each user * For each survey taken by that user * For each formula that is related to the survey * Do 4 (or 16) times: * Some very complicated things to get a value * Calculate end total based on those 4 (or 16) values) The key to removing the cursor would be to turn the process logic upside down: * Do 4 (or 16 times): * Some very complicated things to get value for each combination of user, survey and formula and store them in a temp table * Calculate end totals based on temp table with those 4 (or 16) values for each combination of user, survey and formula Now that's the theory. The practice will be harder. Making this change to the partial code you posted looks to be a daunting task. Considering that you left out some (probably complicated) subqueries and join clauses and that you didn't post the text of the stored procedures and functions used in the code, you'll have to ask yourself if the amount of energy and time you'll have to invest is worth the gain of removing the cursor. The answer to that question should mainly depend on how important the performance of this process is in your application. If it's a 2-hour job that is run during regula 12-hour downtime, don't bother. But if it's a 15-minute job that regularly keeps hundreds of users waiting for the system to respond, you might gain a lot by rewriting this. (snip) [quoted text, click to view] >So.... since the last post, I've actually completed refining all of the code, and significantly simplifying it, and of course, this last 'quirk' is the only thing I can't work out, but it's much less intrusive now then it was in the original code. And now, because it's significantly shorter, and simpler, I can post a blinded source for the whole process here, which might help even more. > >Frankly, now, the issue seems trivial in comparision of what used to be. :-)
I thought I had already answered your original question, but looking back in the thread I see I forgot that. I was too busy trying to find a solution for your problem that I forgot to answer the question. ;-/ The answer is that it's not possible to make the left part of an assignment variable; you're stuck with IF @i = 1 SET @AN1 = @tval IF @i = 2 SET @AN2 = @tval IF @i = 3 SET @AD1 = @tval IF @i = 4 SET @AD2 = @tval Finally some comments on your posted code. [quoted text, click to view] >CREATE TABLE #temp_users( > userid nvarchar(100), > surveyid int >) >INSERT INTO #temp_users >SELECT DISTINCT u.userid, s.surveyid FROM users u >WHERE u.entrydate > @LastUpdate
Where does s.surveyid come from? And is DISTINCT really needed? Not only can DISTINCT be a slow operation, it also often indicates a design flaw. [quoted text, click to view] >CREATE TABLE temp_answers ( > surveyid int, > userid varchar(40), > questionnumber varchar(50), > answer money, > ) >INSERT INTO temp_answers >SELECT a.surveyid, a.Userid, a.questionnumber, usp_scrubdata(a.answer) >FROM answers a >INNER JOIN #temp_users t > ON a.surveyid = t.surveyid AND a.userid = t.userid
What's up with the datatypes? userid changing from nvarchar(100) in the other table to varchar(100) in this one? A question NUMBER stored as varchar(50)? And money as datatype for a survey answer? These all look subject to me. [quoted text, click to view] >DECLARE user_cursor CURSOR READ_ONLY >FOR >SELECT DISTINCT u.userid FROM #temp_users u > >DECLARE survey_cursor CURSOR READ_ONLY >FOR >SELECT DISTINCT u.surveyid FROM #temp_users u (...) > DECLARE formula_cursor CURSOR READ_ONLY > FOR > SELECT DISTINCT formulaid FROM formulas > WHERE surveyid = @surveyid
Based on the code you posted, you can replace the three nested cursors with one single cursor: DECLARE the_cursor CURSOR FAST_FORWARD READ_ONLY FOR SELECT u.userid, u.surveyid, f.formulaid FROM #temp_users AS u INNER JOIN formulas AS f ON f.surveyid = u.surveyid ORDER BY u.userid, u.surveyid, f.formulaid Only include the ORDER BY if you need to do extra logic at the start or end of a survey or at the start or end of a user, since the query might run faster without it. (You didn't post any such logic in your code, but maybe you omitted it). [quoted text, click to view] > SET @nunits = (SELECT nunits FROM formulas WHERE formulaid = @formulaid) > SET @dunits = (SELECT nunits FROM formulas WHERE formulaid = @formulaid)
Add f.nunits and f.dunits to the cursor above and add @nunits and @dunits to the FETCH statement, and you can get rid of these two queries and gain some performance.
[quoted text, click to view] "Hugo Kornelis" <hugo@pe_NO_rFact.in_SPAM_fo> wrote in message news:6fkup11gcblilg3rfolkj44jhbpknrf1sb@4ax.com...
Thanks for the excellent feedback, Hugo. See responses inline... [quoted text, click to view] > Let me first check my interpretation of the code you posted. If I > understand you correctly, the current process is: > > * For each user > * For each survey taken by that user > * For each formula that is related to the survey > * Do 4 (or 16) times: > * Some very complicated things to get a value > * Calculate end total based on those 4 (or 16) values)
That's it. [quoted text, click to view] > The key to removing the cursor would be to turn the process logic upside > down: > > * Do 4 (or 16 times): > * Some very complicated things to get value for each combination of > user, survey and formula and store them in a temp table > * Calculate end totals based on temp table with those 4 (or 16) values > for each combination of user, survey and formula > > Now that's the theory. The practice will be harder.
Yeah.... that's what I thought. I'm not into redesigning the proc that much. :-) [quoted text, click to view] > Making this change to the partial code you posted looks to be a daunting > task. Considering that you left out some (probably complicated) subqueries > and join > clauses and that you didn't post the text of the stored procedures and > functions used in the code,
Actually the last set of code I posted was pretty complete as to that section of code. Obviously I didn't post the setup code, and I didn't post the computational algorithm (which is nothing more than the sum of numerators / sum of denominators), the code I did post is pretty complete as to the algorithm and syntax. [quoted text, click to view] > you'll have to ask yourself if the amount of > energy and time you'll have to invest is worth the gain of removing the > cursor. The answer to that question should mainly depend on how > important the performance of this process is in your application. If > it's a 2-hour job that is run during regula 12-hour downtime, don't > bother. But if it's a 15-minute job that regularly keeps hundreds of > users waiting for the system to respond, you might gain a lot by > rewriting this.
Naw.. it's a 5 minute job that runs once a day. My primary assignment was to learn the algorithm and process, so while I was doing that, it seemed appropriate to 'enhance' the existing code. I'll be the one maintaining it from here on, so being functional to me is the primary motivator. A secondary motivator was trying to reduce the processing time. I found one bottleneck in the code itself, in the way we built the work tables. The second bottleneck is actually in code not included in this question. I'm still working on that 'fix'. [quoted text, click to view] > The answer is that it's not possible to make the left part of an > assignment variable; you're stuck with
Thanks. I'd pretty much come to that conclusion over the past couple of weeks, but having a second pair of eyes to confirm my 'lack of discovery of alternatives' is a good thing. [quoted text, click to view] > Finally some comments on your posted code. > >>CREATE TABLE #temp_users( >> userid nvarchar(100), >> surveyid int >>) >>INSERT INTO #temp_users >>SELECT DISTINCT u.userid, s.surveyid FROM users u >>WHERE u.entrydate > @LastUpdate > > Where does s.surveyid come from?
Actually it was an error on my part in posting the code. I should have changed the 's' to a 'u'. I simplified the query for the posting, but the actual query selects that user-survey list based on a bit more complex criteria than the simple list of userid and surveyid. Again, the complexity was not relevant to demonstrating the scenario or the specific question, so I simplified the query for demonstration purposes. [quoted text, click to view] > And is DISTINCT really needed? Not only can DISTINCT be a slow > operation, it also often indicates a design flaw.
In the above simplified query, it is definitely not needed, and I'll look into the full query, as it may not actually be needed there either. [quoted text, click to view] >>CREATE TABLE temp_answers ( >> surveyid int, >> userid varchar(40), >> questionnumber varchar(50), >> answer money, >> ) >>INSERT INTO temp_answers >>SELECT a.surveyid, a.Userid, a.questionnumber, usp_scrubdata(a.answer) >>FROM answers a >>INNER JOIN #temp_users t >> ON a.surveyid = t.surveyid AND a.userid = t.userid > > What's up with the datatypes? userid changing from nvarchar(100) in the > other table to varchar(100) in this one?
Yeah.. like I said.. I inherited the code, and the code has been "patched" over time during the past two years. There's lots of 'cleanup' and 'optimization' still waiting to be done. [quoted text, click to view] > A question NUMBER stored as varchar(50)?
The question identifier is a variable length text label, so in this case, it's the correct datatype. [quoted text, click to view] > And money as datatype for a survey answer? These all look subject to me.
Not if the answers are currency amounts. :-) [quoted text, click to view] > Based on the code you posted, you can replace the three nested cursors > with one single cursor: > > DECLARE the_cursor CURSOR FAST_FORWARD READ_ONLY > FOR > SELECT u.userid, u.surveyid, f.formulaid > FROM #temp_users AS u > INNER JOIN formulas AS f > ON f.surveyid = u.surveyid > ORDER BY u.userid, u.surveyid, f.formulaid
This I like!!! I don't have a lot of experience working with cursors, so I hadn't even thought of collapsing the nested cursors into a single cursor. I'll take a hard look at this today. Definitely reducing the nested cursor loop should help a lot. A fair amount of time is being lost just building the cursor, as I've seen tracing through the code in debug mode. [quoted text, click to view] > Only include the ORDER BY if you need to do extra logic at the start or > end of a survey or at the start or end of a user, since the query might > run faster without it. (You didn't post any such logic in your code, but > maybe you omitted it).
Thanks for the tip. I don't believe there's anything in the code that is order dependent, and the results are all written to another perm table, so if they are, they can be ordered when the data is extracted from that results table. [quoted text, click to view] >> SET @nunits = (SELECT nunits FROM formulas WHERE formulaid = >> @formulaid) >> SET @dunits = (SELECT nunits FROM formulas WHERE formulaid = >> @formulaid) > > Add f.nunits and f.dunits to the cursor above and add @nunits and > @dunits to the FETCH statement, and you can get rid of these two queries > and gain some performance.
Thanks, again. [quoted text, click to view] >> SET @ndterm = ( >> CASE @i >> WHEN 1 THEN (SELECT N1 FROM formulas WHERE formulaid = >> @formulaid) >> WHEN 2 THEN (SELECT N2 FROM formulas WHERE formulaid = >> @formulaid) >> WHEN 3 THEN (SELECT N3 FROM formulas WHERE formulaid = >> @formulaid)
[quoted text, click to view] "Hugo Kornelis" <hugo@pe_NO_rFact.in_SPAM_fo> wrote in message news:6fkup11gcblilg3rfolkj44jhbpknrf1sb@4ax.com... >>DECLARE user_cursor CURSOR READ_ONLY >>FOR >>SELECT DISTINCT u.userid FROM #temp_users u >> >>DECLARE survey_cursor CURSOR READ_ONLY >>FOR >>SELECT DISTINCT u.surveyid FROM #temp_users u > (...) >> DECLARE formula_cursor CURSOR READ_ONLY >> FOR >> SELECT DISTINCT formulaid FROM formulas >> WHERE surveyid = @surveyid > > Based on the code you posted, you can replace the three nested cursors > with one single cursor: > > DECLARE the_cursor CURSOR FAST_FORWARD READ_ONLY > FOR > SELECT u.userid, u.surveyid, f.formulaid > FROM #temp_users AS u > INNER JOIN formulas AS f > ON f.surveyid = u.surveyid
I have a question on the collapsing of these cursors. I looked at the user_cursor/survey_cursor scenario and immediately saw a ten fold reduction in the processing, because each user usually only has one (or maybe two or three in rare circumstances) survey to be processed at at time, so combining the user and survey cursors reduces our processing loop from X * Y loops to ~X + N loops, where N represents those users with more than one survey result. However, on combining the formula cursor, my question relates to the cost of adding a couple of dozen fields to the cursor, as well as the Y * Z extra rows in the cursor, since each survey contains a somewhat unique set of formulas (there is crossover, some formulas appear on multiple surveys, but that occurrence is not reasonably predictable in this processing activity). So, building the formula cursor does result in a relatively unique cursor set for each survey processed. Is the cost of adding the columns and rows to the main cursor more or less than the cost of building and tearing down the formula cursor for each loop of the user-survey cursor set. My gut tells me the cost of building and tearing down the cursor is more (as would be the same with building and tearing down objects in .NET code), but, as noted, I'm a cursor neophyte -- having generally avoided them to date.
[quoted text, click to view] On Wed, 14 Dec 2005 09:29:48 -0600, Lawrence Garvin wrote: > >"Hugo Kornelis" <hugo@pe_NO_rFact.in_SPAM_fo> wrote in message >news:6fkup11gcblilg3rfolkj44jhbpknrf1sb@4ax.com... > >Thanks for the excellent feedback, Hugo. See responses inline...
Hi Lawrence, You're welcome. I've snipped most of your post, save the few things I'd like to add some comments to. [quoted text, click to view] > The >second bottleneck is actually in code not included in this question. I'm >still working on that 'fix'.
Let me know if you need help there too. (Best to post in a new thread, in that case, so that other regulars will look into it as well) [quoted text, click to view] >> Based on the code you posted, you can replace the three nested cursors >> with one single cursor: >> >> DECLARE the_cursor CURSOR FAST_FORWARD READ_ONLY >> FOR >> SELECT u.userid, u.surveyid, f.formulaid >> FROM #temp_users AS u >> INNER JOIN formulas AS f >> ON f.surveyid = u.surveyid >> ORDER BY u.userid, u.surveyid, f.formulaid > >This I like!!! I don't have a lot of experience working with cursors, so I >hadn't even thought of collapsing the nested cursors into a single cursor. >I'll take a hard look at this today. Definitely reducing the nested cursor >loop should help a lot. A fair amount of time is being lost just building >the cursor, as I've seen tracing through the code in debug mode.
I haven't much experience with cursors either, as I tend to do my very best to avoid them whenever possible. As a rule of thumb, try to get as much work as possible done in as little as possible queries - they'll get more complicated, but they'll usually perform faster. (And the TOTAL code often gets shorter and less complicated). [quoted text, click to view] >> Only include the ORDER BY if you need to do extra logic at the start or >> end of a survey or at the start or end of a user, since the query might >> run faster without it. (You didn't post any such logic in your code, but >> maybe you omitted it). > >Thanks for the tip. I don't believe there's anything in the code that is >order dependent, and the results are all written to another perm table, so >if they are, they can be ordered when the data is extracted from that >results table.
So omit the ORDER BY from the cursor and get some additional performance gain. [quoted text, click to view] >> A further performance gain would be to add N1 through N4 to the cursor >> definition as well. > >Or N1 through N8 and D1 through D8 in the case of the full package. Is there >any issue I need to be aware of in terms of the number of fields maintained >in the cursor? (Guess it's time to go do some cursor studying!)
There might be some limit, but expect it to be much higher than you'll ever run into. Best, Hugo --
[quoted text, click to view] On Wed, 14 Dec 2005 10:27:03 -0600, Lawrence Garvin wrote: > >"Hugo Kornelis" <hugo@pe_NO_rFact.in_SPAM_fo> wrote in message >news:6fkup11gcblilg3rfolkj44jhbpknrf1sb@4ax.com... > >>>DECLARE user_cursor CURSOR READ_ONLY >>>FOR >>>SELECT DISTINCT u.userid FROM #temp_users u >>> >>>DECLARE survey_cursor CURSOR READ_ONLY >>>FOR >>>SELECT DISTINCT u.surveyid FROM #temp_users u >> (...) >>> DECLARE formula_cursor CURSOR READ_ONLY >>> FOR >>> SELECT DISTINCT formulaid FROM formulas >>> WHERE surveyid = @surveyid >> >> Based on the code you posted, you can replace the three nested cursors >> with one single cursor: >> >> DECLARE the_cursor CURSOR FAST_FORWARD READ_ONLY >> FOR >> SELECT u.userid, u.surveyid, f.formulaid >> FROM #temp_users AS u >> INNER JOIN formulas AS f >> ON f.surveyid = u.surveyid > >I have a question on the collapsing of these cursors. I looked at the >user_cursor/survey_cursor scenario and immediately saw a ten fold reduction >in the processing, because each user usually only has one (or maybe two or >three in rare circumstances) survey to be processed at at time, so combining >the user and survey cursors reduces our processing loop from X * Y loops to >~X + N loops, where N represents those users with more than one survey >result. > >However, on combining the formula cursor, my question relates to the cost of >adding a couple of dozen fields to the cursor, as well as the Y * Z extra >rows in the cursor, since each survey contains a somewhat unique set of >formulas (there is crossover, some formulas appear on multiple surveys, but >that occurrence is not reasonably predictable in this processing activity). > >So, building the formula cursor does result in a relatively unique cursor >set for each survey processed. Is the cost of adding the columns and rows to >the main cursor more or less than the cost of building and tearing down the >formula cursor for each loop of the user-survey cursor set. > >My gut tells me the cost of building and tearing down the cursor is more (as >would be the same with building and tearing down objects in .NET code), but, >as noted, I'm a cursor neophyte -- having generally avoided them to date.
Hi Lawrence, And as noted, so am I. :-) I'm not sure if I understand your question correctly. I *think* that you worry about a relatively low number of rows in the Formulas table that are each linked to many rows in the #temp_users table. Correct? Let's say, for example, that #temp_users has 10,000 rows, formulas has 10 rows and an average survey uses 3 of them. If you keep the formula_cursor as a seperate cursor, you'll execute the main loop 10,000 times. Each of these 10,000 executions involve setting up a cursor, fetching an average of 3 rows from it and deallocating the cursor. Since the formulas table is small and the surveyid column is not very selective, you'll have 10,000 table scans, plus the additional overhead of building and tearing down 10,000 cursors. If you use one single cursor, the optimizer can pick different execution plans. Worst case would be 10,000 table scans on the formula table. Best case would be one single table scan. Since even the worst case is no different from the nested cursor approach (but does remove much of the overhead), you can only win. This is theory, of course. To find out for sure, someone with access to the actual tables and the actual data would have to test both versions. In SQL Server optimization issues, testing is always the only way to really find out which version is faster. Best, Hugo --
[quoted text, click to view] "Hugo Kornelis" <hugo@pe_NO_rFact.in_SPAM_fo> wrote in message news:kd39q1pqm5fo6qbrn7hcp6rpm12r65t47j@4ax.com...
[quoted text, click to view] > I'm not sure if I understand your question correctly. I *think* that you > worry about a relatively low number of rows in the Formulas table that > are each linked to many rows in the #temp_users table. Correct?
Actually, it's the other way around. The new user-survey cursor has a few dozen rows on any given run of the proc. The formulas cursor could have a couple of dozen rows per survey. The number of formulas is survey dependent. [quoted text, click to view] > Let's say, for example, that #temp_users has 10,000 rows, formulas has > 10 rows and an average survey uses 3 of them.
To help with the practical considerations... #temp_users might have 50 rows, formulas does have 1500+ rows, and, as noted above, the average survey uses a couple of dozen formulas per survey. [quoted text, click to view] > If you keep the formula_cursor as a seperate cursor, you'll execute the > main loop 10,000 times. Each of these 10,000 executions involve setting > up a cursor, fetching an average of 3 rows from it and deallocating the > cursor. Since the formulas table is small and the surveyid column is not > very selective, you'll have 10,000 table scans, plus the additional > overhead of building and tearing down 10,000 cursors.
This makes sense to me, and if this had been the case, it would have been a no brainer. :-) But the current scenario involves taking what is a 50 row cursor x 50 executions of a 25 row cursor, and converting that into a 1250 row cursor. There aren't any table scans involved in building the cursors because all of these source tables have unique-id clustered indexes that are being used to build the cursor. [quoted text, click to view] > If you use one single cursor, the optimizer can pick different execution > plans. Worst case would be 10,000 table scans on the formula table. Best > case would be one single table scan. Since even the worst case is no > different from the nested cursor approach (but does remove much of the > overhead), you can only win. > > This is theory, of course. To find out for sure, someone with access to > the actual tables and the actual data would have to test both versions. > In SQL Server optimization issues, testing is always the only way to > really find out which version is faster.
Which is what I'll probably look into after the holidays. We're going live with the code I have...today. The big bottleneck I mentioned previously is not 'fixable'. It's a full table scan of a half-million row table to scrub data that can be changed at any time, via an ASP.NET front-end that's not very rich in data validation. I'm looking into ways to get that table scan data scrub out of my proc code entirely. <g> Thanks for the insight and help! [quoted text, click to view] > Best, Hugo > -- > > (Remove _NO_ and _SPAM_ to get my e-mail address)
[quoted text, click to view] On Mon, 19 Dec 2005 09:16:46 -0600, Lawrence Garvin wrote: > >"Hugo Kornelis" <hugo@pe_NO_rFact.in_SPAM_fo> wrote in message >news:kd39q1pqm5fo6qbrn7hcp6rpm12r65t47j@4ax.com... > > >> I'm not sure if I understand your question correctly. I *think* that you >> worry about a relatively low number of rows in the Formulas table that >> are each linked to many rows in the #temp_users table. Correct? > >Actually, it's the other way around. The new user-survey cursor has a few >dozen rows on any given run of the proc. The formulas cursor could have a >couple of dozen rows per survey. The number of formulas is survey >dependent. > >> Let's say, for example, that #temp_users has 10,000 rows, formulas has >> 10 rows and an average survey uses 3 of them. > >To help with the practical considerations... #temp_users might have 50 rows, >formulas does have 1500+ rows, and, as noted above, the average survey uses >a couple of dozen formulas per survey. > >> If you keep the formula_cursor as a seperate cursor, you'll execute the >> main loop 10,000 times. Each of these 10,000 executions involve setting >> up a cursor, fetching an average of 3 rows from it and deallocating the >> cursor. Since the formulas table is small and the surveyid column is not >> very selective, you'll have 10,000 table scans, plus the additional >> overhead of building and tearing down 10,000 cursors. > >This makes sense to me, and if this had been the case, it would have been a >no brainer. :-) > >But the current scenario involves taking what is a 50 row cursor x 50 >executions of a 25 row cursor, and converting that into a 1250 row cursor.
Hi Lawrence, 50 executions of a 25 row cursor makes a total of 50 x 25 = 1250 rows. One single 1250 row cursor is 1250 rows as well, but with less overhead. And the possibility of a more efficient plan. [quoted text, click to view] >There aren't any table scans involved in building the cursors because all of >these source tables have unique-id clustered indexes that are being used to >build the cursor.
There's no WHERE clause on the cursors that iterate the #temp_users table, so that one definitely needs to be scanned (though it's called a clustered index scan instead of a table scan if there's a clustered index - but what's in a name?) For the innermost cursor that reads the formulas table, the WHERE clause limits the rows to a specific value of surveyid. If surveyid is the FIRST column in the clustered index, you'll get a clustered index seek; otherwise it'll be a clustered index scan. In the nested cursors version, you'll have an absolute minimum of one data page read for each execution of the innermost cursor. That results in a total of at least 50 page reads. In the single curosr version, the optimizer can consider different plans. For instance, depending on the total number of rows in the formulas table, the total number of data pages in this table could well be less than 50; in that case, one single table scan would be faster than 50 consecutive clustered index seeks. (Using the figures you gave above: 1500+ rows for the formulas table, a row size of [checking table structure in previous post - oh, I think that was not the actual table structure, was it? - well, let's just make a wild guess] 200 bytes, that's a grand total of 1500 x 200 = 300,000 bytes; at 8K per data page, this all fits on less than 40 data pages.) Have you already tried both alternatives and compared results? [quoted text, click to view] > >> If you use one single cursor, the optimizer can pick different execution >> plans. Worst case would be 10,000 table scans on the formula table. Best >> case would be one single table scan. Since even the worst case is no >> different from the nested cursor approach (but does remove much of the >> overhead), you can only win. >> >> This is theory, of course. To find out for sure, someone with access to >> the actual tables and the actual data would have to test both versions. >> In SQL Server optimization issues, testing is always the only way to >> really find out which version is faster. > >Which is what I'll probably look into after the holidays. We're going live >with the code I have...today. The big bottleneck I mentioned previously is >not 'fixable'. It's a full table scan of a half-million row table to scrub >data that can be changed at any time, via an ASP.NET front-end that's not >very rich in data validation. I'm looking into ways to get that table scan >data scrub out of my proc code entirely. <g>
Are you constantly scrubbing the same half-million rows? Or are there a half-million new and/or changed rows between two executions of the job that does the scrubbing? In the latter case, there's not much you can do. Scrubbing raw data requires you to access each row; a table scan is actually the fastest option in that case. If you have so many new or changed rows that need to be scrubbed, than that's what it takes. But if you're checking all existing rows because some of them might have been changed, then there are various alternatives: 1. Put the data from the flakey front-end in a temporary holding table; the scrub job will process this table, scrub the data and move the cleaned data from the temp table to it's permanent location. 2. Add a trigger that scrubs the data the very moment it gets in. Best, Hugo --
[quoted text, click to view] "Hugo Kornelis" <hugo@pe_NO_rFact.in_SPAM_fo> wrote in message news:k57mq156emr1pi3ab7hiblhbuvhrus5r6c@4ax.com... >>The big bottleneck I mentioned previously is >>not 'fixable'. It's a full table scan of a half-million row table to scrub >>data that can be changed at any time, via an ASP.NET front-end that's not >>very rich in data validation. I'm looking into ways to get that table scan >>data scrub out of my proc code entirely. <g> > > Are you constantly scrubbing the same half-million rows?
Sadly... yes.... [quoted text, click to view] > Or are there a > half-million new and/or changed rows between two executions of the job > that does the scrubbing?
The table contains two types of rows. The current data, and a historical data row. As the data is changed in a row, the original row has a bit value set, and a new row is inserted. I'm lobbying on an architecture change that will store the historical data in a separate table, but I've also researched and found that the historical rows currently only comprise about 40% of the data, so even with that accomplished, the table is still 300,000 rows, and will continue to grow as more surveys are completed and more surveys are added to the survey pool. The issue is that any one of those 300,000 'live' rows could be changed at any time, though there's a relative likelihood correlated inversely with the age of the data. [quoted text, click to view] > In the latter case, there's not much you can do. Scrubbing raw data > requires you to access each row; a table scan is actually the fastest > option in that case. If you have so many new or changed rows that need > to be scrubbed, than that's what it takes.
I'm looking at a possible architectural implementation that will create a flag value correlated to the survey response that will enumerate simply whether the answer is zero, null, or otherwise. Almost all of our table scans on this table, and particularly the one in this proc, do a scan for zero or null answers. With a bit field marked as zero, null, or one, based on the answer, we can index on the bit field, and then do the select where the bit field <> 1, which will give us all zero and null responses. [quoted text, click to view] > But if you're checking all existing rows because some of them might have > been changed, then there are various alternatives:
In fact, because a /few/ may have been changed since the last run of this proc -- which actually runs nightly. [quoted text, click to view] > 1. Put the data from the flakey front-end in a temporary holding table; > the scrub job will process this table, scrub the data and move the > cleaned data from the temp table to it's permanent location.
A very reasonable solution based on what I know about the current implementation. [quoted text, click to view] > 2. Add a trigger that scrubs the data the very moment it gets in.
Hmmmmm.... now there's an idea also. I'll pursue both... starting with #2, and then rolling back to #1 if I can't get my idea sold to my super. Thanks Hugo! [quoted text, click to view] > Best, Hugo > -- > > (Remove _NO_ and _SPAM_ to get my e-mail address)
Don't see what you're looking for? Try a search.
|
|
|