Groups | Blog | Home
all groups > sql server new users > november 2005 >

sql server new users : Code to assign variable in loop


Lawrence Garvin
11/28/2005 11:59:05 AM
I'm trying to devise syntax to assign a value to a variable from a field in
the table, where the variable is a different value each time through the
loop. Obviously this would be an array solution in a normal coding
environment. Is there anything I can do in T-Sql to emulate this?

This process has a cursor wrapped around it that scans through another
table. This table is a lookup table from the other table.

--DDL--
CREATE TABLE strings (
recid int,
fld1 varchar(50),
fld2 varchar(50),
fld3 varchar(50),
fld4 varchar(50)
)
INSERT INTO (recid, fld1, fld2, fld3, fld4)
SELECT 1, 'String1a', 'String1b', 'String1c', 'String1d' UNION ALL
SELECT 2, 'String2a', 'String2b', 'String2c', 'String2d' UNION ALL
SELECT 3, 'String3a', 'String3b', 'String3c', 'String3d' UNION ALL
SELECT 4, 'String4a', 'String4b', 'String4c', 'String4d'

--DML--
DECLARE @i int
DECLARE @idval int
DECLARE @invar varchar(50)
DECLARE @A1 varchar(50)
DECLARE @A2 varchar(50)
DECLARE @A3 varchar(50)
DECLARE @A4 varchar(50)

SET @i = 1
SET @idval = (SELECT foreignkey FROM another_table WHERE criteria =
what_is_wanted)

WHILE @i < 5
BEGIN
SET @invar = (
CASE @i
WHEN 1 (SELECT fld1 FROM strings WHERE recid = @idval
WHEN 2 (SELECT fld2 FROM strings WHERE recid = @idval
WHEN 3 (SELECT fld3 FROM strings WHERE recid = @idval
WHEN 4 (SELECT fld4 FROM strings WHERE recid = @idval
END
)

-- massage and manipulate the value loaded into @invar to prepare it to
be written to @A1, @A2, @A3, or @A4


-- the intent is that @A1 = fld1; @A2 = fld2; @A3 = fld3; @A4 = fld4
after the data manipulation is performed.
-- how can I choose the correct variable to store @invar based on the
iteration of @i ????

SET @i = @i + 1
END

Hugo Kornelis
12/1/2005 12:36:01 AM
[quoted text, click to view]
(snip)

Hi Lawrence,

Why not get rid of the loop and instead use something like this:

UPDATE strings
SET fld1 = some_expression(field_1),
fld2 = some_expression(field_2),
fld3 = some_expression(field_3),
fld4 = some_expression(field_4),
fld5 = some_expression(field_5)
WHERE recid = @idval

??

Or, even better, why not normalise your design?

If these suggestions don't work, then you'll have to provide more
information: sample data (as INSERT statements) and expected results
would be great. The part of the code you replaced with a comment would
help as well.

Read more about providing accurate specs on www.aspfaq.com/5006.

Best, Hugo
--

Lawrence Garvin
12/1/2005 9:59:29 AM
[quoted text, click to view]


Hugo, thank you for your responses....

[quoted text, click to view]

The 'loop' was an attempt at consolidating what is currently a very linear
structure. I'm certainly open to options.

[quoted text, click to view]

But, unfortunately, the purpose of the logic is not to update the database,
but rather to extract the field from the database, manipulate it, extract
its corresponding value from a values table, and then use the value in a
computation.

[quoted text, click to view]

Actually, it is normalized, but my simplistic example likely suggests it is
not. The actual table is a formula table, where each column contains a term
of a formula with a variable number of terms -- up to 8 numerator terms, up
to 8 denominator terms, and the operator between the terms can either be
addition or subtraction, all of which is stored in the table, one row per
formula.

My example was simplistic because I really wanted to focus on processing
through the up-to-sixteen columns in each row and getting them assigned to
the local vars.

[quoted text, click to view]

Here's the detailed view of what I'm working with. I apologize that you feel
my specs were not accurate enough. In fact, after having read over a month's
worth of messages in this newsgroup, I spent a considerable amount of time
formulating the specs in my question so that I could focus specifically on
the question at hand -- which is how to populate a list of local vars with
values from a selection of columns in a single row -- without cluttering the
post, or the group, with lots of "useless" details. I've still compressed
the schema presented here to only four terms total, but we allow for eight
numerator and eight denominator in the table structure. Four should be
sufficient to establish the question, don't you think?

The problem is that it's not practical to post the several thousand lines of
setup and processing code that wrap around this basic question, so I'm doing
my best to include what makes the question functional in the context in
which it is being used.

I have /inherited/ this database structure and it's a production database,
so I'm quite limited in schema modifications, but if you have a suggestion
for a schema mod that would make a /significant/ difference, I'm certainly
listening.

Here we go:

CREATE TABLE formulas (
surveyid as int, --contains unique identifier for survey
ncount as int, --contains count of actual number of numerator terms in
formula
dcount as int, --contains count of actual number of denominator terms
in formula
n1 as varchar(50),
n2 as varchar(50),
d1 as varchar(50),
d2 as varchar(50),
)
-- the data contained in n1, n2, d1, d2, is a foreign key into another table
that contains the actual /values/ used in the formula.
-- the value is the actual name of the specific question that provides the
data for the formula
-- for purposes of this example, we're assuming the mathematical operator is
always addition
-- note the foreign key relationship is implicit -- it is not declared in
the table structure
-- and no referential integrity is imposed upon the table relationships
-- (as noted, I /inherited/ this schema -- and my role is to maintain and
enhance the production environment)

CREATE TABLE answers (
surveyid as int,
qname as varchar(50),
answer as int
)

INSERT INTO formulas
SELECT 2,1,'Q1_A', 'Q1_B','Q1_TOTAL',NULL UNION ALL
SELECT 2,2,'Q2_C','Q2_D','Q2_E',Q2_TOTAL'

INSERT INTO answers
SELECT 'Q1_A',50 UNION ALL
SELECT 'Q1_B',100 UNION ALL
SELECT 'Q1_TOTAL',150 UNION ALL
SELECT 'Q2_C',25 UNION ALL
SELECT 'Q2_D',25 UNION ALL
SELECT 'Q2_E',75 UNION ALL
SELECT 'Q2_TOTAL',125

-- The purpose of the procedure is to extract the question name, determine
if it is NULL, a _TOTAL question
-- (which needs to invoke a subcomputation to validate the value stored), or
a raw data value. Then, the /value/
-- is retrieved from the data values table, and assigned to a local variable
based on the term position.

-- So the -end result- that is desired is:
DECLARE @workvar varchar(50)
DECLARE @retval int
DECLARE @n1 varchar(50)
DECLARE @n2 varchar(50)
DECLARE @d1 varchar(50)
DECLARE @d2 varchar(50)

DECLARE @surveyid int
-- this value is actually set by a cursor that scans through a temp table
containing the list of surveys to be processed
-- I see no point in replicating the cursor setup and processing loop to
establish my question
-- so let's assume, for purposes of this question, that we're only
processing one survey and it's Survey #1
SET @surveyid = 1


SET @workvar = (SELECT n1 FROM formulas f WHERE f.surveyid = @surveyid)
EXEC usp_storedproc @workvar @retval
SET @n1 = @retval

SET @workvar = (SELECT n2 FROM formulas f WHERE f.surveyid = @surveyid)
EXEC usp_storedproc @workvar @retval
SET @n2 = @retval

SET @workvar = (SELECT d1 FROM formulas f WHERE f.surveyid = @surveyid)
EXEC usp_storedproc @workvar @retval
SET @d1 = @retval

SET @workvar = (SELECT d2 FROM formulas f WHERE f.surveyid = @surveyid)
EXEC usp_storedproc @workvar @retval
SET @d2 = @retval

-- and then the formula is mathematically computed
DECLARE @answer int
SET @answer = (@n1 + @n2)/(@d1 + @d2)

-- The objective is to find a way in which this process does not need to be
done linearly.
-- It doesn't seem too bad with only four terms, but with sixteen terms,
-- it gets excessively long when processed as a linear set of code.

-- My objective is to compress that into a LOOP for each of the sixteen
terms, since the processing for
-- each term is identical.

-- The /problem/ is dealing with assigning each of the sixteen terms to a
Lawrence Garvin
12/5/2005 6:40:45 PM
Thank you Hugo.

I understand about priorities, and any help you can offer when you can get
to it will be appreciated. In the meantime, that project has a few thousand
lines of code in other sections, so I'm not in an immediate hurry. I'll
probably be "testing" my "enhancements" through the end of next week, at a
minimum.


[quoted text, click to view]

Hugo Kornelis
12/6/2005 1:19:45 AM
[quoted text, click to view]

(snip)

Hi Lawrence,

I have to apologize for not replying sooner. And I'll also have to task
your patience even more. Real life is terribly busy at the moment. I
still have your reply marked as "unread"; I'll study it in more detail
and reply when I have more time (hopefully in a few days).

Best, Hugo
--

Hugo Kornelis
12/6/2005 11:30:37 PM
[quoted text, click to view]

Hi Lawrence,

After reading your message, I think you're barking up the wrong tree (if
that's the correct proverb in this context). You're trying to change
code that is linear but easy to read and follow with code that's harder
to understand, but still performs linear. The code might look prettier
to you (that's a matter of taste, after all), but you've done nothing to
minimize DB access. That should be your prime target.

I'll give you a more efficient replacement code for the code you posted,
and some comments after that. But first a small correction:

[quoted text, click to view]

If you expect usp_storedproc to change the value of @retval, than you'll
have to include the OUTPUT option in both the procedure's definition and
in the call:

[quoted text, click to view]

Anyway - here's how I would rewrite the example you posted:

DECLARE @n1W varchar(50)
DECLARE @n2W varchar(50)
DECLARE @d1W varchar(50)
DECLARE @d2W varchar(50)
DECLARE @n1 varchar(50)
DECLARE @n2 varchar(50)
DECLARE @d1 varchar(50)
DECLARE @d2 varchar(50)

DECLARE @surveyid int
SET @surveyid = 1

SELECT @n1W = n1, @n2W = n2, @d1W = d1, @d2W = d2
FROM formulas
WHERE surveyid = @surveyid

EXEC usp_storedproc @n1W @n1 OUTPUT
EXEC usp_storedproc @n2W @n2 OUTPUT
EXEC usp_storedproc @d1W @d1 OUTPUT
EXEC usp_storedproc @d2W @d2 OUTPUT

-- and then the formula is mathematically computed
DECLARE @answer int
SET @answer = (@n1 + @n2)/(@d1 + @d2)
-- ****** Are you aware that this concatenates the n1 and n2 strings,
-- ****** concatenates the d1 and d2 strings, then attempts to convert
-- ****** both to numeric before dividing??
-- ****** Are you SURE that's what you want? Are the datatype correct???


Some remaining comments:

1. Still long for 16 variables - but not as long as either of your
versions. And it'll perform (slightly) better.

2. Still 4 (in the full version 16) calls to usp_storedproc. Since you
didn't post the code of this proc, I can't assess if it's possible to
use a completely set-based solution. Depending on what exactly this proc
does, this will likely become a bottle-neck for your performance.

[quoted text, click to view]

3. You're partly right. For the problem itself, the exact cursor code is
not relevant. But if you really want to improve your code, you should
try to get rid of the cursor and rewrite your logic to process all rows
in the temp table at once. Or even take it one step further: get rid of
the temp table as well; replace the temp table with the logic you now
use to populate the temp table.
Replacing the cursor with a set-based operation requires moving the
stored procedure's logic into the query. Since I don't know the stored
procedure, I can't help here.

[quoted text, click to view]

4. I appreciate your efforts to post your question as short and
to-the-point as possible. This can indeed be a judgement call. If you
have a question that involves three columns and you post a CREATE TABLE
statement with 140 columns and all 80 CHECK constraints and 12 CREATE
INDEX statements, you're clearly posting too much :-) But as a rule of
thumb, I'd say: when in doubt, include more details. This holds
especially true for code. If I see 400 lines of code in a post, I'll
just copy and paste it into Query Analyzer, then focus on the part of
the data that (I think) is interesting. That doesn't cost much more time
than doing the same with 40 lines of code.

In this case, you had already decided on a solution ("use cursor to
iterate over the temp table, process formulas one by one, then calculate
results") and asked help for a specific detail of your solution. As you
have seen, posting more information allowed me to post a suggestion that
is not a straight answer to your original question, but that I hope
helps you better.

I've also hinted at what might be even better solutions, but I can;t
help with those, for lack of even more details.

I'm not holding this against you. You couldn't have known (if you had
known, you wouldn't have asked for help). I'm just trying to explain why
I was asking for more details - becuase I wanted to answer the question
behind the question, instead of just answering the question you asked.

Best, Hugo
--

Lawrence Garvin
12/12/2005 10:29:08 AM

[quoted text, click to view]

Well, actually, the code was not easy to read. Furthermore, by building =
the loop, I was able to reduce several thousand lines of T-SQL code =
(which still needed to grow, as the actual code only allowed for a small =
percentage of the permutations of eight numerators and eight =
denominators), to 200 lines which handles all permutations of the 16 =
possible terms of the equation.

[quoted text, click to view]

Yes, this was a typo. The actual code syntax is correct. I overlooked =
the addition of the OUTPUT keyword in my example.

[quoted text, click to view]

If only it were this simple... but this does give me an additional thing =
to look at.


The thing is.... n1, n2, d1, and d2 are /names/ of 'questions' that =
contain answers. So, after obtaining the values of n1, n2, d1, and d2, =
those values then need to be used in another table lookup to get the =
/answers/ to be used in the formula.

So what actually happens is

SELECT @answer-n1 =3D (
SELECT an1 FROM answers a
WHERE a.question =3D (
SELECT n1 FROM questions
)
)

But under certain circumstances -- when a.question represents a computed =
total of individual questions, it must also be validated against the =
stored procedure. So the above code is only executed /if/ the question =
is not a totals question, otherwise we call the stored procedure to =
retrieve the value after validating the associated data that makes up =
the totals value.

All of which is pretty irrelevant to the question.

[quoted text, click to view]



[quoted text, click to view]
correct???

Actually, in the real formula the destination local var is declared as =
decimal, so implicit conversion kicks in.


[quoted text, click to view]

I completely rewrote the stored proc, but the stored proc is not really =
relevant to my question, as the question is how to assign the returned =
value to a local variable.

[quoted text, click to view]

Unfortunately not possible that I can see.. but yes, I'd definitely like =
to get rid of the cursors. The problem is that we have a matrix of =
respondents, surveys, and questions that needs to be dynamically built =
and processed. If you know of a way to build T-SQL logic that says:

FOR EACH respondent-survey row in table u
AND EACH question in table q WHERE u.survey =3D q.survey
=20

DO this process

Then I'm all ears.=20

[quoted text, click to view]

So.... since the last post, I've actually completed refining all of the =
code, and significantly simplifying it, and of course, this last 'quirk' =
is the only thing I can't work out, but it's much less intrusive now =
then it was in the original code. And now, because it's significantly =
shorter, and simpler, I can post a blinded source for the whole process =
here, which might help even more.

Frankly, now, the issue seems trivial in comparision of what used to be. =
:-)



DECLARE @LastUpdate datetime
SET @LastUpdate =3D (SELECT UpdateTime FROM LASTTIMERUN)

CREATE TABLE #temp_users(
userid nvarchar(100),
surveyid int
)
INSERT INTO #temp_users
SELECT DISTINCT u.userid, s.surveyid FROM users u
WHERE u.entrydate > @LastUpdate

-- we build temp-answers because the answers are stored in raw, =
collected, text form,
-- and must be validated and scrubbed to numeric form for use in the =
computations
-- invalid data sets result in aborting the computation for that dataset
CREATE TABLE temp_answers (
surveyid int,
userid varchar(40),
questionnumber varchar(50),
answer money,
)
INSERT INTO temp_answers
SELECT a.surveyid, a.Userid, a.questionnumber, usp_scrubdata(a.answer)
FROM answers a
INNER JOIN #temp_users t
ON a.surveyid =3D t.surveyid AND a.userid =3D t.userid

DECLARE @surveyid int
DECLARE @userid varchar(50)
DECLARE @formulaid int
DECLARE @val money
DECLARE @ndterm varchar(50)
DECLARE @AN1 money
DECLARE @AN2 money
DECLARE @AD1 money
DECLARE @AD2 money
DECLARE @tval money
DECLARE @i int
DECLARE @nunits int
DECLARE @dunits int

DECLARE user_cursor CURSOR READ_ONLY
Hugo Kornelis
12/14/2005 12:34:13 AM
[quoted text, click to view]

Hi Lawrence,

After seeing the partial code you posted, I agree that in this case, it
makes sense to choose for a WHILE that iterates the same code 16 times.

(snip)
[quoted text, click to view]

Let me first check my interpretation of the code you posted. If I
understand you correctly, the current process is:

* For each user
* For each survey taken by that user
* For each formula that is related to the survey
* Do 4 (or 16) times:
* Some very complicated things to get a value
* Calculate end total based on those 4 (or 16) values)

The key to removing the cursor would be to turn the process logic upside
down:

* Do 4 (or 16 times):
* Some very complicated things to get value for each combination of
user, survey and formula and store them in a temp table
* Calculate end totals based on temp table with those 4 (or 16) values
for each combination of user, survey and formula

Now that's the theory. The practice will be harder. Making this change
to the partial code you posted looks to be a daunting task. Considering
that you left out some (probably complicated) subqueries and join
clauses and that you didn't post the text of the stored procedures and
functions used in the code, you'll have to ask yourself if the amount of
energy and time you'll have to invest is worth the gain of removing the
cursor. The answer to that question should mainly depend on how
important the performance of this process is in your application. If
it's a 2-hour job that is run during regula 12-hour downtime, don't
bother. But if it's a 15-minute job that regularly keeps hundreds of
users waiting for the system to respond, you might gain a lot by
rewriting this.

(snip)
[quoted text, click to view]

I thought I had already answered your original question, but looking
back in the thread I see I forgot that. I was too busy trying to find a
solution for your problem that I forgot to answer the question. ;-/

The answer is that it's not possible to make the left part of an
assignment variable; you're stuck with
IF @i = 1
SET @AN1 = @tval
IF @i = 2
SET @AN2 = @tval
IF @i = 3
SET @AD1 = @tval
IF @i = 4
SET @AD2 = @tval

Finally some comments on your posted code.

[quoted text, click to view]

Where does s.surveyid come from?

And is DISTINCT really needed? Not only can DISTINCT be a slow
operation, it also often indicates a design flaw.

[quoted text, click to view]

What's up with the datatypes? userid changing from nvarchar(100) in the
other table to varchar(100) in this one? A question NUMBER stored as
varchar(50)? And money as datatype for a survey answer? These all look
subject to me.

[quoted text, click to view]

Based on the code you posted, you can replace the three nested cursors
with one single cursor:

DECLARE the_cursor CURSOR FAST_FORWARD READ_ONLY
FOR
SELECT u.userid, u.surveyid, f.formulaid
FROM #temp_users AS u
INNER JOIN formulas AS f
ON f.surveyid = u.surveyid
ORDER BY u.userid, u.surveyid, f.formulaid

Only include the ORDER BY if you need to do extra logic at the start or
end of a survey or at the start or end of a user, since the query might
run faster without it. (You didn't post any such logic in your code, but
maybe you omitted it).

[quoted text, click to view]

Add f.nunits and f.dunits to the cursor above and add @nunits and
@dunits to the FETCH statement, and you can get rid of these two queries
and gain some performance.

Lawrence Garvin
12/14/2005 9:29:48 AM

[quoted text, click to view]

Thanks for the excellent feedback, Hugo. See responses inline...

[quoted text, click to view]

That's it.

[quoted text, click to view]

Yeah.... that's what I thought. I'm not into redesigning the proc that much.
:-)

[quoted text, click to view]

Actually the last set of code I posted was pretty complete as to that
section of code. Obviously I didn't post the setup code, and I didn't post
the computational algorithm (which is nothing more than the sum of
numerators / sum of denominators), the code I did post is pretty complete as
to the algorithm and syntax.

[quoted text, click to view]

Naw.. it's a 5 minute job that runs once a day. My primary assignment was to
learn the algorithm and process, so while I was doing that, it seemed
appropriate to 'enhance' the existing code. I'll be the one maintaining it
from here on, so being functional to me is the primary motivator. A
secondary motivator was trying to reduce the processing time. I found one
bottleneck in the code itself, in the way we built the work tables. The
second bottleneck is actually in code not included in this question. I'm
still working on that 'fix'.

[quoted text, click to view]

Thanks. I'd pretty much come to that conclusion over the past couple of
weeks, but having a second pair of eyes to confirm my 'lack of discovery of
alternatives' is a good thing.

[quoted text, click to view]

Actually it was an error on my part in posting the code. I should have
changed the 's' to a 'u'. I simplified the query for the posting, but the
actual query selects that user-survey list based on a bit more complex
criteria than the simple list of userid and surveyid. Again, the complexity
was not relevant to demonstrating the scenario or the specific question, so
I simplified the query for demonstration purposes.

[quoted text, click to view]

In the above simplified query, it is definitely not needed, and I'll look
into the full query, as it may not actually be needed there either.

[quoted text, click to view]

Yeah.. like I said.. I inherited the code, and the code has been "patched"
over time during the past two years. There's lots of 'cleanup' and
'optimization' still waiting to be done.

[quoted text, click to view]

The question identifier is a variable length text label, so in this case,
it's the correct datatype.

[quoted text, click to view]

Not if the answers are currency amounts. :-)

[quoted text, click to view]

This I like!!! I don't have a lot of experience working with cursors, so I
hadn't even thought of collapsing the nested cursors into a single cursor.
I'll take a hard look at this today. Definitely reducing the nested cursor
loop should help a lot. A fair amount of time is being lost just building
the cursor, as I've seen tracing through the code in debug mode.

[quoted text, click to view]

Thanks for the tip. I don't believe there's anything in the code that is
order dependent, and the results are all written to another perm table, so
if they are, they can be ordered when the data is extracted from that
results table.

[quoted text, click to view]

Thanks, again.

[quoted text, click to view]
Lawrence Garvin
12/14/2005 10:27:03 AM

[quoted text, click to view]

I have a question on the collapsing of these cursors. I looked at the
user_cursor/survey_cursor scenario and immediately saw a ten fold reduction
in the processing, because each user usually only has one (or maybe two or
three in rare circumstances) survey to be processed at at time, so combining
the user and survey cursors reduces our processing loop from X * Y loops to
~X + N loops, where N represents those users with more than one survey
result.

However, on combining the formula cursor, my question relates to the cost of
adding a couple of dozen fields to the cursor, as well as the Y * Z extra
rows in the cursor, since each survey contains a somewhat unique set of
formulas (there is crossover, some formulas appear on multiple surveys, but
that occurrence is not reasonably predictable in this processing activity).

So, building the formula cursor does result in a relatively unique cursor
set for each survey processed. Is the cost of adding the columns and rows to
the main cursor more or less than the cost of building and tearing down the
formula cursor for each loop of the user-survey cursor set.

My gut tells me the cost of building and tearing down the cursor is more (as
would be the same with building and tearing down objects in .NET code), but,
as noted, I'm a cursor neophyte -- having generally avoided them to date.


Hugo Kornelis
12/17/2005 11:13:25 PM
[quoted text, click to view]

Hi Lawrence,

You're welcome. I've snipped most of your post, save the few things I'd
like to add some comments to.

[quoted text, click to view]

Let me know if you need help there too.

(Best to post in a new thread, in that case, so that other regulars will
look into it as well)


[quoted text, click to view]

I haven't much experience with cursors either, as I tend to do my very
best to avoid them whenever possible.

As a rule of thumb, try to get as much work as possible done in as
little as possible queries - they'll get more complicated, but they'll
usually perform faster. (And the TOTAL code often gets shorter and less
complicated).

[quoted text, click to view]

So omit the ORDER BY from the cursor and get some additional performance
gain.


[quoted text, click to view]

There might be some limit, but expect it to be much higher than you'll
ever run into.

Best, Hugo
--

Hugo Kornelis
12/17/2005 11:23:54 PM
[quoted text, click to view]

Hi Lawrence,

And as noted, so am I. :-)

I'm not sure if I understand your question correctly. I *think* that you
worry about a relatively low number of rows in the Formulas table that
are each linked to many rows in the #temp_users table. Correct?

Let's say, for example, that #temp_users has 10,000 rows, formulas has
10 rows and an average survey uses 3 of them.

If you keep the formula_cursor as a seperate cursor, you'll execute the
main loop 10,000 times. Each of these 10,000 executions involve setting
up a cursor, fetching an average of 3 rows from it and deallocating the
cursor. Since the formulas table is small and the surveyid column is not
very selective, you'll have 10,000 table scans, plus the additional
overhead of building and tearing down 10,000 cursors.

If you use one single cursor, the optimizer can pick different execution
plans. Worst case would be 10,000 table scans on the formula table. Best
case would be one single table scan. Since even the worst case is no
different from the nested cursor approach (but does remove much of the
overhead), you can only win.

This is theory, of course. To find out for sure, someone with access to
the actual tables and the actual data would have to test both versions.
In SQL Server optimization issues, testing is always the only way to
really find out which version is faster.

Best, Hugo
--

Lawrence Garvin
12/19/2005 9:16:46 AM

[quoted text, click to view]


[quoted text, click to view]

Actually, it's the other way around. The new user-survey cursor has a few
dozen rows on any given run of the proc. The formulas cursor could have a
couple of dozen rows per survey. The number of formulas is survey
dependent.

[quoted text, click to view]

To help with the practical considerations... #temp_users might have 50 rows,
formulas does have 1500+ rows, and, as noted above, the average survey uses
a couple of dozen formulas per survey.

[quoted text, click to view]

This makes sense to me, and if this had been the case, it would have been a
no brainer. :-)

But the current scenario involves taking what is a 50 row cursor x 50
executions of a 25 row cursor, and converting that into a 1250 row cursor.
There aren't any table scans involved in building the cursors because all of
these source tables have unique-id clustered indexes that are being used to
build the cursor.

[quoted text, click to view]

Which is what I'll probably look into after the holidays. We're going live
with the code I have...today. The big bottleneck I mentioned previously is
not 'fixable'. It's a full table scan of a half-million row table to scrub
data that can be changed at any time, via an ASP.NET front-end that's not
very rich in data validation. I'm looking into ways to get that table scan
data scrub out of my proc code entirely. <g>

Thanks for the insight and help!

[quoted text, click to view]

Hugo Kornelis
12/22/2005 10:57:56 PM
[quoted text, click to view]

Hi Lawrence,

50 executions of a 25 row cursor makes a total of 50 x 25 = 1250 rows.
One single 1250 row cursor is 1250 rows as well, but with less overhead.
And the possibility of a more efficient plan.

[quoted text, click to view]

There's no WHERE clause on the cursors that iterate the #temp_users
table, so that one definitely needs to be scanned (though it's called a
clustered index scan instead of a table scan if there's a clustered
index - but what's in a name?)

For the innermost cursor that reads the formulas table, the WHERE clause
limits the rows to a specific value of surveyid. If surveyid is the
FIRST column in the clustered index, you'll get a clustered index seek;
otherwise it'll be a clustered index scan.

In the nested cursors version, you'll have an absolute minimum of one
data page read for each execution of the innermost cursor. That results
in a total of at least 50 page reads.
In the single curosr version, the optimizer can consider different
plans. For instance, depending on the total number of rows in the
formulas table, the total number of data pages in this table could well
be less than 50; in that case, one single table scan would be faster
than 50 consecutive clustered index seeks.
(Using the figures you gave above: 1500+ rows for the formulas table, a
row size of [checking table structure in previous post - oh, I think
that was not the actual table structure, was it? - well, let's just make
a wild guess] 200 bytes, that's a grand total of 1500 x 200 = 300,000
bytes; at 8K per data page, this all fits on less than 40 data pages.)

Have you already tried both alternatives and compared results?

[quoted text, click to view]

Are you constantly scrubbing the same half-million rows? Or are there a
half-million new and/or changed rows between two executions of the job
that does the scrubbing?

In the latter case, there's not much you can do. Scrubbing raw data
requires you to access each row; a table scan is actually the fastest
option in that case. If you have so many new or changed rows that need
to be scrubbed, than that's what it takes.

But if you're checking all existing rows because some of them might have
been changed, then there are various alternatives:

1. Put the data from the flakey front-end in a temporary holding table;
the scrub job will process this table, scrub the data and move the
cleaned data from the temp table to it's permanent location.

2. Add a trigger that scrubs the data the very moment it gets in.


Best, Hugo
--

Lawrence Garvin
12/26/2005 10:40:01 AM

[quoted text, click to view]

Sadly... yes....

[quoted text, click to view]

The table contains two types of rows. The current data, and a historical
data row. As the data is changed in a row, the original row has a bit value
set, and a new row is inserted. I'm lobbying on an architecture change that
will store the historical data in a separate table, but I've also researched
and found that the historical rows currently only comprise about 40% of the
data, so even with that accomplished, the table is still 300,000 rows, and
will continue to grow as more surveys are completed and more surveys are
added to the survey pool.

The issue is that any one of those 300,000 'live' rows could be changed at
any time, though there's a relative likelihood correlated inversely with the
age of the data.

[quoted text, click to view]

I'm looking at a possible architectural implementation that will create a
flag value correlated to the survey response that will enumerate simply
whether the answer is zero, null, or otherwise. Almost all of our table
scans on this table, and particularly the one in this proc, do a scan for
zero or null answers. With a bit field marked as zero, null, or one, based
on the answer, we can index on the bit field, and then do the select where
the bit field <> 1, which will give us all zero and null responses.

[quoted text, click to view]

In fact, because a /few/ may have been changed since the last run of this
proc -- which actually runs nightly.

[quoted text, click to view]

A very reasonable solution based on what I know about the current
implementation.

[quoted text, click to view]

Hmmmmm.... now there's an idea also.

I'll pursue both... starting with #2, and then rolling back to #1 if I can't
get my idea sold to my super.

Thanks Hugo!

[quoted text, click to view]

AddThis Social Bookmark Button