PDF Search Solution?


Re: PDF Search Solution? John Kane
11/24/2003 4:39:52 PM
Charlie,
That's what I'm using and it works great on Win2K and Win2003.

John



PDF Search Solution? Charlie
11/24/2003 6:01:09 PM
Greetings folks

What are people using to index and search PDF files? Is PDFFilt version 5.0
still the latest and greatest?

TIA

Charlie

Re: PDF Search Solution? Charlie
11/25/2003 9:18:34 AM
Hi John

Can you be more specific?

I am trying to use the indexing and search functionality from Windows
Sharepoint Services, but clearly WSS is using the fulltext features of SQL
2000 to get there.

I have a SQL table with a fulltext index. If I add an Office document (.doc,
.xls, etc.) I can search it successfully using SQL like

Select * from Docs where CONTAINS(Content, 'teststring')
WHERE:
Docs is the table included in the fulltext index
Content is an Image column
'teststring' is a string which is in the .doc file

If I add a row to the Docs table and put a .pdf file into the Content image
field, I can search forever and never find the pdf record.

One of the registry keys that I have been focussed on is
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Search\1.0\Gathering
Manager\DLLsToRegister. The key on my server does not show the pdffilt.dll
in its list. It also happens that it is a binary key. Can you confirm the
same?

Any other hints you can offer?



Re: PDF Search Solution? John Grauel
11/25/2003 11:23:21 AM

Hi Charlie,

For Sharepoint Services (the new one on W2K3) the
implementation for PDF filtering is really
straightforward. You simply download the Adobe
PDFFilt.dll and install it. Adobe packages it as an
executable that self-registers the DLL. Install it on the
SQL Server, restart the server and go.

I've only used it a short while (and in a test
environment) but it seems to work very well.

Re: PDF Search Solution? (qing_liu NO[at]SPAM ncsu.edu)
11/25/2003 1:48:33 PM
Hi, Charlie, I am trying to use the full-text indexing functionality from SQL server 2000.

I have a SQL table with a fulltext index. I added an MS Office Word document (.doc), but I cannot search it successfully (zero rows returned) using SQL like

Select * from TestTable where CONTAINS(ole, 'main')
WHERE:
TestTable is the table included in the fulltext index
ole is an Image column
'main' is a string which is in the .doc file

Can you tell me how you do it?

Thank you very much!

Qing

Re: PDF Search Solution? John Kane
11/25/2003 2:28:33 PM
Charlie,
Sure... Now that I know what you're looking for, I can explain it in more
detail.
First of all, you do need to download and install Adobe's PDF IFilter and
ensure that it has installed correctly. One quick way to verify this is to
use the Indexing Service: set up a catalog to index a folder with PDF
files and use IS to search on unique keywords in the PDF files. If you get
the correct hits, then you know that the PDF IFilter has installed
correctly.
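
If you would rather run that Indexing Service check from Query Analyzer, a linked server over the MSIDXS provider is one way to do it. This is only a rough sketch: the linked server name PdfCheck and the catalog name PdfTestCatalog are made-up placeholders, and it assumes the catalog already exists in Indexing Service and has finished indexing the PDF folder.

```sql
-- Assumption: 'PdfTestCatalog' is an Indexing Service catalog you created
-- over a folder of PDF files. 'PdfCheck' is a hypothetical linked server name.
EXEC sp_addlinkedserver
    @server     = 'PdfCheck',
    @srvproduct = 'Index Server',
    @provider   = 'MSIDXS',
    @datasrc    = 'PdfTestCatalog'
GO

-- Search the catalog for a keyword known to occur in one of the PDFs.
-- Rows coming back for .pdf files suggest the PDF IFilter is doing its job.
SELECT *
FROM OPENQUERY(PdfCheck,
    'SELECT FileName, Path
     FROM SCOPE()
     WHERE CONTAINS(''teststring'')')
GO
```

If .doc files in the same folder produce hits but .pdf files never do, that points at the IFilter install rather than at SQL Server.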

As for SQL Server 2000 and Full-Text Search of PDF files, there's a bit more
setup required. First of all, you have to store the PDF files in a column
defined with an IMAGE datatype and also have a column that defines the "file
extension" to be bound to the IMAGE column. Specifically, this "file
extension" column must use a datatype of char(3) or varchar(4) or sysname in
order for the "Microsoft Search" service to correctly recognize the file
type and launch the correct IFilter. Additionally, how you import or insert
the document (PDF file) into your SQL table is important as well. Using
TextCopy.exe or ADO Stream are both methods that work successfully. Once you
have this completed, you should then start a Full Population on your
FT-enabled table and review the Application event log for "Microsoft
Search" source events for any errors and/or messages of a successful
population. Finally, you can issue your SQL FTS CONTAINS query, where your
column "Content" is the IMAGE column, and test your query using a unique
keyword from the PDF file:

Select * from Docs where CONTAINS(Content, 'teststring')
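
The setup steps above could look roughly like the following on SQL Server 2000. Treat it as a sketch under assumptions: the DocId and DocType columns, the PK_Docs index and the DocsCatalog catalog are illustrative names, not anything from Charlie's actual schema (a WSS database has its own layout).

```sql
-- Hypothetical table; all names here are placeholders for illustration.
CREATE TABLE Docs (
    DocId   int        NOT NULL CONSTRAINT PK_Docs PRIMARY KEY, -- unique key index for full-text
    DocType varchar(4) NOT NULL,  -- file extension of the stored document, e.g. 'pdf'
    Content image      NULL       -- the document bytes (loaded via TextCopy.exe or ADO Stream)
)
GO

-- Caution: on a database that already has catalogs, 'enable' drops and
-- re-creates them. Skip this line if full-text is already enabled.
EXEC sp_fulltext_database 'enable'

EXEC sp_fulltext_catalog 'DocsCatalog', 'create'
EXEC sp_fulltext_table   'Docs', 'create', 'DocsCatalog', 'PK_Docs'

-- Bind the IMAGE column to the extension column so the Microsoft Search
-- service can pick the matching IFilter for each row.
EXEC sp_fulltext_column
    @tabname      = 'Docs',
    @colname      = 'Content',
    @action       = 'add',
    @type_colname = 'DocType'

EXEC sp_fulltext_table 'Docs', 'activate'
EXEC sp_fulltext_table 'Docs', 'start_full'   -- kick off the Full Population
GO

-- Poll until the population goes idle, then check how many items were indexed.
SELECT FULLTEXTCATALOGPROPERTY('DocsCatalog', 'PopulateStatus') AS PopulateStatus,
       FULLTEXTCATALOGPROPERTY('DocsCatalog', 'ItemCount')      AS ItemCount
GO
```

On an existing table, sp_help_fulltext_columns 'Docs' lists each full-text column together with the type column bound to it, which is a quick way to see whether the IMAGE column has an extension column at all.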


FYI, the registry key you should focus on and can edit is:
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\ContentIndex
DLLsToRegister -- your PDFFilt.dll should be listed here.

Regards,
John




Re: PDF Search Solution? Charlie
11/25/2003 7:11:36 PM
John

Thanks for your helpful responses. I feel like I can make some progress
here.

It would appear that the install of the iFilter on my server has not been
successful. I cannot successfully search for a term that I know is in the
PDF files which are part of an Indexing Service catalog.

I think that this must be the issue....

Is there anything else I can do to make sure that the install is successful?

Thanks Again

Charlie
