[quoted text, click to view] "Alex Kuznetsov" <alkuzo@gmail.com> wrote in message
news:a38e35aa-80f7-4dca-b72f-8e74f24571e7@e6g2000prf.googlegroups.com...
> On Mar 5, 9:32 am, "Cowboy \(Gregory A. Beamer\)"
> <NoSpamMgbwo...@comcast.netNoSpamM> wrote:
>> I still think XML may be your best option, but preload the XML into a
>> single
>> table and then run processes on the data there. This works best if you
>> can
>> create a table to load. You may need multiple load tables if you have
>> different clients. This will still hammer the server for a bit. It should
>> not be an issue for thousands of records, however. After you fill one
>> table,
>> you need to get its primary key values into the load table, especially if
>> using row guids or IDENTITY.
>
> So far this is contrary to our benchmarks - in all our experience XML
> is inefficient in terms of network bandwidth, storage, and CPU cycles
> to parse.
When consumed as XML, I would concur. For apps with heavy load, running
directly out of the in-memory XML data, even in SQL Server, has a perf
penalty associated with it. The larger the XML set, the greater the penalty.
As efficient as Microsoft has made their XML representations, you still end
up with recursion underneath the hood. If you load it into a temp table,
however, the inefficiency is reduced as the main hit is moving the data from
XML structures to the temp table. The question then is whether or not
ripping the data from the temp table into your normalized structure creates
too heavy a hit on your database. But, this can take a toll on tempdb if you
have exteremely large data sets, so you have to determine where you can
afford to take the hit.
[quoted text, click to view] >> What about batching? It is a possibility, but realize you are just
>> flattening the performance curve. Rather than a single huge hit, you are
>> taking many smaller hits. This could well be the solution you are looking
>
> I have a different opinion here too - based on years of benchmarking I
> think there is a performance price you pay for every round trip to the
> database. Dealing with batches may be way more efficient - in some
> cases 100 times more performant. I do not see the flattening you are
> speaking about, I usually observe a dramatic reduction of overall
> execution costs.
Flattening may not have been the best word to use here, but hindsight is
always 20-20. :-)
Overall, I was speaking of prepping the data outside of the database, which
can be a huge performance boost. Just as a bit of anectdotal evidence
(anectdotal, as it is a single case, not because the evidence is
inconclusive), I helped design and build an "offline" system that worked
completely outside of the database until batch load. Working with non-XML
data, of course, but files up to 25GB in size. The prior system was load and
run sprocs, which could take up to 8 days to completely normalize in the
database. The new load system originally got the load time down to about 24
hours, or approximately 12.5% of the original time. Actually time
"hammering" the database was reduced to minutes of batch upload.
Before I left the company, we had refactored to where it was about 8 hours.
I believe they are down to about 1/3rd of that time now. Most of the
inefficiencies lay in either a) concatenation of strings (StringBuilder
helped tremendously here) and b) working with strings instead of bytes
(unavoidable in some situations, but very doable during the first churns
through the data). Further efficiency was garnered by sorting the files with
Windows ports of some UNIX sort utilities (to anyone ever going this route,
it pays to get trials and compare sort times, as sort does not necessarily
equal sort. Depending on how you use the sort, you will find that some are
more efficient than others in some respects).
--
Gregory A. Beamer
MVP, MCP: +I, SE, SD, DBA
*************************************************
| Think outside the box!
|
*************************************************