sql server full text search:
I've been learning a lot about full text searching over the last few
days while working on a large scale project (well, large for me atleast
- currently almost 4 million records with near-future expansion to
about 10 million).
I still have a few unanswered questions though. First, let me give my
environment info: Single CPU Server (1Ghz), 1 GB RAM (expanding to 2GB
tomorrow), Windows 2000 server, SQL 2000 Standard Edition. Database is
on mirrored SCSI drive, Full Text Data is on new 80GB IDE drive
(couldn't fit another SCSI). The site is a merchant affiliate, which
pulls in data from Linkshare merchants and then allows for searching
globally, by category, and by merchant.
During population (full or incremental) searching on the index with
CONTAINS queries takes forever. I would expect the performance to be
affected during population, but is there anything I can do to avoid
this headache?
Here's my current situation with this: I had about 2.2 million records
in my table yesterday, and my index finally finished populating - the
index is on 3 varchar fields, one is 255 length, one is 500 and one is
2000. My queries were coming back super fast at this point. I then
turned on "Change Tracking". Not sure if this was a "bad" time to do
this. Possibly should have been done up front, but nothing I've seen
online indicates this can't be done at any point. Thinking that the
index would then update smoothly as I imported data, I proceeded to
import another 1.5 million records. (FYI, new data imports would
optimally be nightly and size would range from 2,000 to 2,000,000
records)
It is now 18 hours later and my catalog says "incremental population in
progress" and is only at about 2.23 million records. It is moving
incredibly slow. I know that it moved faster yesterday when load was
reduced later at night, so I am hoping that it finishes before the
morning.
So basically here's my questions:
1. Will my future updates index quickly with change tracking?
2. How does change tracking work? Does it re-index ANY row called
with ANY UPDATE/INSERT/DELETE statement? My import updates all
existing records. It doesn't check for changes, just assumes it needs
the latest data and updates the record. I can adjust this if
necessary. (any suggestions would be great too)
3. Is there any way to do an alternate search, or does anyone else do
something to avoid searching the indexed data while it's being
populated so it doesn't take 3 minutes to search on my website?
4. I've heard that change tracking is better than scheduled
incremental population. Would I be better off in my scenario with the
scheduled population? I would need a timestamp column right? And
would have to populate that field for all existing rows, right?
5. Any additional insight would be great. I've gotten some great
info from this forum and others, but sometimes you just need to "hear
it" yourself to get it, ya know?
Thanks in advance,
Joe Potenza
bigjoepo@gmail.com