all groups > sql server full text search > august 2006 >
You're in the

sql server full text search

group:

Blobs and SQL Server


Blobs and SQL Server bormesh NO[at]SPAM gmail.com
8/21/2006 2:15:12 PM
sql server full text search: Hey Hilary,

Mike from microsoft.public.sqlserver.programming told me to talk to
you about my question. Here's my original post:

I'm looking for some information about storing BLOBs in SQL. A bit of
background: my company is developing a document management system.
Previously, we had stored the files on the file structure and used a
pointer in the database. We're not switching because the next iteration
of our system is going to allow for full text searches, and this
becomes a simple task when the entire document is stored in the
database.

My question is how performance will be affected when I want to import
large documents/files into the database? Will I see a huge slow down if
I import a 500 MB file? I've researched this a bit, and I don't want to
use a bit array to import the document into the database. I'd rather
just import it (but will memory be affected horribly?)

If anyone could help me out a bit, or just point me in the right
direction, that would be great.



Thanks for your help,

Rob Heinen
Re: Blobs and SQL Server Hilary Cotter
8/21/2006 9:18:23 PM
Yes, there is a performance hit while importing blobs, mainly disk and to a
lesser degree cpu. 500 Mgs is a large chunk of data, there is no easy way to
push it in. Any way you could extract the textual content and store it in
the db?

In general you store the blob in the database for manageability reasons. For
performance reasons you store the path in the db and keep the blob in the
file system. Research has indicated that you should store the blobs in a
directory structure maintained by a hash so that you have a large number of
subdirectories. retrieving a blob this way offers best seek performance in
NTFS. Also web servers are optimized for sending files, databases aren't.
You can also send the file to the client asynchronously using an iframe, if
you are retrieving it from the db it would be done serially.

--
Hilary Cotter
Director of Text Mining and Database Strategy
RelevantNOISE.Com - Dedicated to mining blogs for business intelligence.

This posting is my own and doesn't necessarily represent RelevantNoise's
positions, strategies or opinions.

Looking for a SQL Server replication book?
http://www.nwsu.com/0974973602.html

Looking for a FAQ on Indexing Services/SQL FTS
http://www.indexserverfaq.com



[quoted text, click to view]

Re: Blobs and SQL Server bormesh NO[at]SPAM gmail.com
8/22/2006 1:19:03 PM
Thanks for the information Hilary,

When I'm inserting the blob into the database, I was under the
impression that it had to be split up into a series of smaller files.
If a large file is inserted, then it will require many small chunks. As
we insert, I assume we're going to be using RAM to store the chunks and
then we'll hit back and forth between the file system and the database,
all the while using up our memory. Am I correct in this assumption? I
guess I'm just wondering at the feasiblity of actually inserting a
larger blob (maybe not as huge as 500 mb) on a slower computer
(assuming the computers that we're developing for have an average of
512MB of RAM). Or does the amount of RAM on the client system not
matter, and only the specs of our Database machine matter?

I hope this makes sense... I can give you more information if you need
it.

Thanks,
Rob Heinen


[quoted text, click to view]
Re: Blobs and SQL Server Hilary Cotter
8/22/2006 7:23:56 PM
Older versions of SQL Server required the blob be loaded in chunks, older
versions of ADO required you use the getchunk method, but newer versions
allow you use to stream object to push it in. The client will consume
virtual ram on the host it is running on, not necessarily the SQL Server
box.

There is a cpu and disk IO hit while inserting the blob. I have not
observered a RAM hit.
--
Hilary Cotter
Director of Text Mining and Database Strategy
RelevantNOISE.Com - Dedicated to mining blogs for business intelligence.

This posting is my own and doesn't necessarily represent RelevantNoise's
positions, strategies or opinions.

Looking for a SQL Server replication book?
http://www.nwsu.com/0974973602.html

Looking for a FAQ on Indexing Services/SQL FTS
http://www.indexserverfaq.com



[quoted text, click to view]

Re: Blobs and SQL Server Kevin.w.Berridge NO[at]SPAM gmail.com
8/24/2006 10:24:14 AM
Your first suggestion was: "Any way you could extract the textual
content and store it in
the db?"

We could do this, but we thought it might be better to store the binary
in the database and not have a copy of the binary on the file system
with a copy of the full text in the database and a pointer in the
database to the binary file as well as a mechanism to extract the full
text from the file. What's your take on this?

Also, even inserting the text instead of the binary, if the file is
sufficiently large I can imagine a situation in which inserting the
blob the standard way could cause some very un-needed hard drive
thrashing. (The standard way being reading all the text into a
variable and then supplying that variable as a parameter to an ad hoc
SQL query or Stored Procedure.) In ADODB there was the Stream object,
but there doesn't seem to be an equivalent in ADO.NET. Do you have a
suggestion for a good way to go about this?

Finally, we also wanted to return a sample of the text that matched the
search with our results. I know that this is not possible in SQL
Server, so I'm wondering there might be an entirely different approach
for both storing and searching these documents that we should consider.

Thanks for all your help and feedback!
Kevin
Re: Blobs and SQL Server Kevin.w.Berridge NO[at]SPAM gmail.com
8/29/2006 11:31:27 AM
[quoted text, click to view]
"Research has indicated that you should store the blobs in a
directory structure maintained by a hash so that you have a large
number of
subdirectories."

Could you expound on this a little more? We have decided to continue
storing our documents in the file system. I've been looking for some
sources detailing the kind of directory structure you mention but so
far no luck.

Thanks,
Kevin
Re: Blobs and SQL Server Hilary Cotter
8/29/2006 4:50:50 PM
Have a look at
http://groups.google.com/group/microsoft.public.win2000.file_system/browse_frm/thread/e7b4d86095272427/257d6a0a740f9d6c?lnk=st&q=%22maximum+number+of+files%22+%22in+a+folder%22+windows&rnum=1&hl=en#257d6a0a740f9d6c

and

http://www.mcse.ms/archive42-2004-12-1284650.html

Basically on previous projects I have worked on we got optimal performance
by having 100 subdirectories off the root, each with 100 subdirectories in
it, each with 100 files. Also you get better performance with file name like
00000filename.jpg than filename00000.jpg.


--
Hilary Cotter
Director of Text Mining and Database Strategy
RelevantNOISE.Com - Dedicated to mining blogs for business intelligence.

This posting is my own and doesn't necessarily represent RelevantNoise's
positions, strategies or opinions.

Looking for a SQL Server replication book?
http://www.nwsu.com/0974973602.html

Looking for a FAQ on Indexing Services/SQL FTS
http://www.indexserverfaq.com



[quoted text, click to view]

AddThis Social Bookmark Button