all groups > sql server full text search > june 2004 >
You're in the

sql server full text search

group:

Word Doc Indexing Problem


Word Doc Indexing Problem M>ike
6/24/2004 10:03:49 AM
sql server full text search: Hello,

At last I've found this group as I've been struggling with this by myself
for some time now. I'm fairly new to this so I'm hopeing that this should be
an easy thing for you guys to help with, so here goes...

I have a SQL 2000 Server running the FTS on a Windows 2000 Server. I have
created a Database of name Documents and a table within it named
tbl_Documents which contains six fields. These fields are and ID (int and
key), Document (image), DocType (char), FileName (char), LoadDate (datetime)
and Spare (char). I have created a Full-Text Index for this table indexing
the Document and Spare columns and using the Englidh (United Kingdom) word
breaker with the DocType field as the Document Type column. At the moment
there are no schedules set up to repopulate it. I then start a full
population to, well populate it!

The table has four records (it's still only under test!) with four separate
documents each containing only one line of text. The Spare column contains
the words Test, Testing, <Null> and Tested. The DocType fields all contain
the string 'doc'.

Looking at the properties for the catalog that has been created I have an
Item Count of 5 and a Unique key count of 12. This implies to me that it has
populated the catalog correctly as there are probably only about 12
non-noise words in there. Although if I remove the Spare column from the
index and rebuild it I get a Unique key caont of 8 (why not 9 as one of the
spare fiels is <Null>?).

So far all well and good. I also have an ASP.NET form hosted on a separate
IIS5 box that hopefully will permit the user to enter a search word into a
field and return the results in a datagrid which is set to create columns
automatically at runtime. The query used to get the data is:

SELECT * FROM tbl_Documents WHERE CONTAINS (Document,'" & tbSearch.Text &
"')

If I run this query with the word 'Test' as tbSearch I would expect to get
two records returned as two documents contain the word test. However I get
only the column headings and no data. If I modify the query to search the
Spare column rather than the Document column (whilst the Spare column is
included in the index) I do get the three variants of the word Test as
expected.

From what I have explained above (although quite long winded), is there
anything obvious that i'm doing wrong? If you think you can help and need
more info then please let me know.

Thanks,

<M>ike

Re: Word Doc Indexing Problem M>ike
6/24/2004 10:28:00 AM
I also note that in the Application Event viewer I get a SQLFTHNDLR
information message stating: One or more documents stored in image columns
with extension 'doc' did not get full-text indexed because loading the
filter failed with error '0x1'.

If it failed then why do I get my expected 8 Unique key count?

<M>ike

Re: Word Doc Indexing Problem M>ike
6/24/2004 10:42:54 AM
There you go, you start looking and (hopefully) the answer finds you.

It looks like its a memory allocation problem according to Microsoft KBID
308771, but why would it still show 8 Unique key count?

<M>ike

Re: Word Doc Indexing Problem M>ike
6/24/2004 11:34:18 AM
OK,

I have resolved my problem!! It seems that the DocType field used to
determine which filter to use needs to have the extension of the document
type an nothing else in it. As I was experimenting with variants of 'doc' i
attempted to put '.doc' in and got a sql error saying that the entered text
was too long for the field. This hinted to me that the spaces after the text
'doc' were also being included. So I set the length of the char DocType
field to 3 and all the field entries to 'doc', repopulated and it worked.

It seems that memoray allocation was a bit of a red herring.

Moral of this story, only use the minimum length for fields that you can get
away with.

Interestingly it now shows 17 as the Unique key count. Any idea on how it
calculates this figure?

Thanks for your interest,

<M>ike

Re: Word Doc Indexing Problem John Kane
6/24/2004 8:43:11 PM
Mike,
Yep, this is somewhat document in SQL Server 2000 BOL title "Filtering
Supported File Types" - "...the file extension must be stored in a separate
column on the table. This type column can be of any character-based data
type and contains the document file extension, such as .doc for a Microsoft
Word document. If the type column is NULL, the document is assumed to be a
text file (.txt)."

Although, in this newsgroup, it's been often posted that the file extension
column must be defined as char(3), char(3) or sysname, however, recent
testing with a SQL Server 2000 with SP3 applied provides hints that an
un-documented fix was done for this as I was able to get this to work with a
varchar(100) column datatype.

Regards,
John



[quoted text, click to view]

Re: Word Doc Indexing Problem M>ike
6/25/2004 9:02:51 AM
I think my problem was that I previously had a char(16) as the field
datatype which included all those extra spaces!


[quoted text, click to view]

AddThis Social Bookmark Button