all groups > sql server full text search > august 2005 >
You're in the

sql server full text search

group:

incremental population vs. change tracking


incremental population vs. change tracking vler
8/12/2005 7:37:19 AM
sql server full text search:
We are developing a Content Management System using both SQL Server
2000 and file system as repository.

The files (content) are stored on disk and addition properties are
stored in an SQL Server database.

We'd like to provide Search functionality on both: the content of the
files and its properties.

To do so, we use built-in full-text functionality of the SQL Server, as
well as its ability to query Indexing Service catalogs (using
OPENQUERY).

In our scale out scenario, the files are located on a dedicated server,
together with the indexing service and its catalog.

- Does anybody have experience on performance penalty of using
OPENQUERY to a remote machine?

The other (more important) consideration is the built-in full-text
functionality (MSSearch.exe). This process takes a lot of CPU time even
when the incremental population is invoked. Various newsgroup topics
address this issue, but still, I have a few more questions:

- Can the population of the SQL Server catalog be delegated to a
separate machine?
- What is the advantage of "start_change_tracking" option over
"start_incremental"? (We do have a timestamp field per table) Does it
mean that "start_incremental" does not use some sort of change
tracking?
- Why do I experience heavy and slow incremental population if I only
add a single row of data with only a few words to index?

Thanks,

Eric
Re: incremental population vs. change tracking Hilary Cotter
8/12/2005 7:41:34 PM
Answers inline.

--
Hilary Cotter
Looking for a SQL Server replication book?
http://www.nwsu.com/0974973602.html

Looking for a FAQ on Indexing Services/SQL FTS
http://www.indexserverfaq.com
[quoted text, click to view]

You could hit around a 3 orders of magnitude performance degradation using a
linked server to a remote indexing services catalog as opposed to storing
your data in your database and using SQL FTS. Your results may vary.
[quoted text, click to view]

Yes, you can build the catalogs on a remote server and copy them to a local
machine. The problem is that you can't get real time indexing and you have
to take your catalogs offline while you restore.

You could have a seperate build server and direct your queries from your
client there. I think you will find that using change tracking with update
index in background offers the best performance.
[quoted text, click to view]

Incremental means that every row is extracted and only changed/new rows are
reindexed. Start change tracking kicks off a full population if none has
been done, or an incremental population if a full population has been done
and you have the time stamp column - which you do. After the incremental or
full population is completed only changes to columns you are full text
indexing will be extracted and indexed. You get much better performance and
near real time indexing.
[quoted text, click to view]

Re: incremental population vs. change tracking vler
8/15/2005 3:35:01 AM
Thank you Hilary for your quick reaction.

I'm a regular reader of this newsgroup and I frankly expected no less
from you...

I do have some comments about querying IS catalogs.

[quoted text, click to view]

As I understand, you anticipate that my performance can suffer because
I don't save content of the files as image fields in the database?
That's good to know and can be understood. But, my dilemma was between
the 2 queries below:

SELECT *
FROM OPENQUERY(rISService,
'SELECT Path, Rank FROM SCOPE() WHERE CONTAINS(Contents, ''smth'')'
)

and

SELECT *
FROM OPENQUERY(rISService,
'SELECT Path, Rank FROM remoteMachine.catX..SCOPE() WHERE
CONTAINS(Contents, ''smth'')'
)

As you see, in the later query I instruct my SQL Server to query local
Indexing Service with the "remote" scope; the local IS recognizes
"remoteMachine.catX..SCOPE()" and distribute the query to
remoteMachine.

Since IS re-indexes the file system on idle, and my SQL Server's
machine is hardly ever "idle enough", I wanted to move the files and
the indexing of the files to a dedicated machine.

Ignoring the data transfer thru the LAN, I did not expect the
performance hit is significant. Or my scenario is not really complete?

[quoted text, click to view]

Thank you for clarifying this to me; I expected too much from the word
"incremental". "start_change_tracking" with update index in background
should indeed give me the desired performance.

Another trade-off is "one catalog" vs. "multiple catalogs", but this is
another subject and maybe another thread in this news group...

Thanks again.

Eric
AddThis Social Bookmark Button