Customized filter for content searching


Customized filter for content searching kay
5/19/2004 2:19:51 PM
Hi !
I have some office documents which I am storing as image
type (as blobs) in a table. I have some additional header
data in the blob other than the content of the office
documents. Is there a way to integrate just the content
of the office document with the SQL Server search?
I know that one way to do it is by implementing IFilter.
Can someone explain how that would work or send me
appropriate links for that?

Another question is about HTML files that have images
in them. How do those get stored in the database and yet
qualify for SQL Server search? How can one store the HTML
file and the folder with images as blobs and yet enable
the search on the document?

Any help would be appreciated. I am using SQL Server 2000
with all the service packs applied.

Re: Customized filter for content searching Hilary Cotter
5/19/2004 8:14:55 PM
SQL FTS can only index document contents, not properties. Headers and
footers of Word docs are indexed as part of the document body.

So if by header data you mean document summary or custom Office properties,
these are not indexed by SQL FTS. If you mean the header or the footer, this
will work.

Images in HTML files referenced by tags, i.e. img src, will not be indexed,
as only document contents are indexed, not the contents of meta or src
tags.

The indexing that is done of image documents is rudimentary. Some of the
custom image IFilters do expose interfaces to index these properties, but
not in SQL FTS. In other search services like SharePoint Portal Server,
Indexing Service, and Exchange content indexing it is possible to index
some properties and OCR'd content of TIFFs. But these are normally not
indexed as attachments or embedded objects of documents, and are never
indexed when they are parts of HTML docs. Again, SQL FTS does not index them
as they are, for the most part, properties.

Your best approach would be to extract the textual data from these documents
and store the metadata/properties in a column in the table you are
full-text indexing.
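
As a rough illustration of that approach for the HTML case, here is a minimal Python sketch that separates title and meta properties from the visible text, so each could be written to its own column. The class and function names are hypothetical, it relies only on the standard library html.parser module, and it does not touch SQL Server itself; treat it as a starting point under those assumptions rather than a finished extractor.

```python
from html.parser import HTMLParser


class PropertyAndTextExtractor(HTMLParser):
    """Hypothetical helper: collects <title>/<meta> properties and visible text."""

    def __init__(self):
        super().__init__()
        self.properties = {}   # e.g. {"title": "...", "author": "..."}
        self.text_parts = []   # visible text fragments, in document order
        self._in_title = False
        self._skip_depth = 0   # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "meta" and attrs.get("name") and attrs.get("content"):
            self.properties[attrs["name"].lower()] = attrs["content"]
        elif tag == "img" and attrs.get("alt"):
            # Alt text is the only searchable text an image contributes here.
            self.text_parts.append(attrs["alt"])

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if not text or self._skip_depth:
            return
        if self._in_title:
            self.properties["title"] = text
        else:
            self.text_parts.append(text)


def extract(html):
    """Return (properties dict, visible text) for one HTML document."""
    parser = PropertyAndTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.properties, " ".join(parser.text_parts)
```

The properties dict could then go into an ordinary text column next to the blob, and that column could be added to the full-text index alongside the document column.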



Re: Customized filter for content searching John Kane
5/20/2004 7:37:14 AM
Kay,
Could you provide more info regarding your table structures, i.e.,
CREATE TABLE statements? This may be possible, if I understand your
requirement correctly. There is a way to integrate the content of the office
documents with SQL Server Full-Text Search (FTS) in SQL Server 2000.
Check out the SQL Server 2000 Books Online (BOL) title "Filtering Supported
File Types".

As for your images (jpg files, etc.) you will need to store them separately
in a column defined with the IMAGE datatype. See the following KB articles
on importing & extracting binary files (images) into and out of SQL Server:

258038 (Q258038) HOWTO: Access and Modify SQL Server BLOB Data by Using the
ADO Stream Object
http://support.microsoft.com/?kbid=258038
309158 (Q309158) HOW TO: Read and Write BLOB Data by Using ADO.NET with C#
http://support.microsoft.com/default.aspx?scid=kb;EN-US;309158

308042 (Q308042) HOW TO: Read and Write BLOB Data by Using ADO.NET with
VB.NET
http://support.microsoft.com/default.aspx?scid=kb;EN-US;308042

326502 (Q326502) HOW TO: Read and Write BLOB Data by Using ADO.NET Through
ASP.NET
http://support.microsoft.com/?id=326502

Depending upon what you want to search on, you can implement a JPEG IFilter
or use the anchor text in the HTML file as the search string for the image.
If you have further questions, please post your table structures as well as
SQL FTS queries.

Regards,
John
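
The anchor-text idea above could be sketched roughly as follows in Python: collect the text of each link that points at an image file, so that text can be stored in a searchable column beside the image row. The names ImageAnchorTextCollector, collect_image_anchor_text and IMAGE_EXTENSIONS are hypothetical, and the list of extensions is only an assumption about which files count as images.

```python
from html.parser import HTMLParser

# Assumed set of image file extensions; adjust to the files actually stored.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".gif", ".png", ".bmp", ".tif", ".tiff")


class ImageAnchorTextCollector(HTMLParser):
    """Maps each linked image file to the text of the <a> elements pointing at it."""

    def __init__(self):
        super().__init__()
        self.anchor_text = {}  # image href -> list of anchor text strings
        self._href = None      # href of the open <a>, if it targets an image
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href") or ""
            if href.lower().endswith(IMAGE_EXTENSIONS):
                self._href = href
                self._buffer = []

    def handle_data(self, data):
        if self._href is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse runs of whitespace so the stored text is tidy.
            text = " ".join("".join(self._buffer).split())
            if text:
                self.anchor_text.setdefault(self._href, []).append(text)
            self._href = None


def collect_image_anchor_text(html):
    """Return {image href: [anchor text, ...]} for one HTML document."""
    parser = ImageAnchorTextCollector()
    parser.feed(html)
    parser.close()
    return parser.anchor_text
```

Each collected string could be saved in a text column on the row that holds the corresponding image, which makes the image findable by ordinary full-text queries without an image filter.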





Re: Customized filter for content searching kay
5/20/2004 8:55:35 AM

Thanks a lot Hilary, for your prompt reply.
I could get the Office document blobs working with FTS,
that is not where I faced the problems. I have a couple
of questions regarding issues pertaining to these.

1) I want to add some of my own customized data that our
programming system is using other than the office
document blob. Say, I first add my own serialized data in
the blob and then add the office document data in the
blob and upload it in the image field. If I do that, is
it possible to still be able to use the FTS on the part
of the blob which is the actual office document data? I
mean, is there some way that I could write some code that
can give the FTS only the relevant data to be used for
indexing.

2) If I have a Word document which has embedded pictures
and then I save it as filtered HTML, I get some images
in a folder and the images are linked in the HTML file.
How can I upload this filtered HTML document as a blob?
Does the folder containing the images have to be
stored separately from the HTML blob? Or is there
some way in which both the HTML and the folder with the
images are added in the same blob and yet work with FTS?

I hope my questions are clear. Any help from you would be
appreciated. I can always split up the blob and store the
parts as separate fields, but if there is a way to keep them
all in the same blob, it would be great.
Thanks a lot !

Regards,
kay
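
For the fallback kay mentions (splitting the blob into separate fields), a minimal Python sketch might look like the following. The layout is purely an assumption made for illustration: a 4-byte big-endian length prefix, then the application's serialized data, then the untouched office document bytes. The function names pack_blob and split_blob are hypothetical. The point is only that the document part can be recovered byte for byte and stored on its own in the full-text indexed image column.

```python
import struct

# Assumed layout: [4-byte big-endian header length][custom data][document bytes]
_LEN = struct.Struct(">I")


def pack_blob(custom_data: bytes, document: bytes) -> bytes:
    """Prefix the document with length-tagged application data."""
    return _LEN.pack(len(custom_data)) + custom_data + document


def split_blob(blob: bytes):
    """Return (custom_data, document) from a blob built by pack_blob."""
    if len(blob) < _LEN.size:
        raise ValueError("blob is too short to contain a length prefix")
    (header_len,) = _LEN.unpack_from(blob, 0)
    start = _LEN.size
    end = start + header_len
    if end > len(blob):
        raise ValueError("length prefix exceeds blob size")
    return blob[start:end], blob[end:]
```

With something like this, the application could write the custom data to one column and the document bytes to the indexed column, and rejoin them with pack_blob when it needs the combined form.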

Re: Customized filter for content searching kay
5/20/2004 9:08:08 AM
Thanks a lot John, for your prompt reply.
I have posted the same question content to Hilary too. I
would appreciate it if you could also send me your
thoughts and expertise on this.

I could get the Office document blobs working with FTS,
that is not where I faced the problems. The search
results are satisfactory. I have a couple of questions
regarding issues pertaining to these.

1) I want to add some of my own customized data that our
programming system is using other than the office
document blob. Say, I first add my own serialized data in
the blob and then add the office document data in the
blob and upload it in the image field. If I do that, is
it possible to still be able to use the FTS on the part
of the blob which is the actual office document data? I
mean, is there some way that I could write some code that
can give the FTS only the relevant data to be used for
indexing.

2) If I have a Word document which has embedded pictures
and then I save it as filtered HTML, I get some images
in a folder and the images are linked in the HTML file.
How can I upload this filtered HTML document as a blob?
Does the folder containing the images have to be
stored separately from the HTML blob? Or is there
some way in which both the HTML and the folder with the
images are added in the same blob and yet work with FTS?

I hope my questions are clear. Any help from you would be
appreciated. I can always split up the blob and store the
parts as separate fields, but if there is a way to keep them
all in the same blob, it would be great.
Thanks a lot !

Regards,
kay

