DevelopmentNow Blog
 Wednesday, August 30, 2006

I recently needed to strip out non-alphanumeric characters in SQL Server. I initially thought I might be able to use a managed stored procedure and C# regular expressions to do so, but I thought the performance would be bad (e.g. you'd have to cursor through a table, extract a field value, use RegEx on it, go to the next row, etc.). So I came up with the below function using T-SQL's quasi-regular expressions in PATINDEX:

/*******************************************************************
dbo.fnStripNonAlphaNumeric

Removes all non-alphanumeric characters (including spaces) from
@input, e.g.

select dbo.fnStripNonAlphaNumeric('Help, I "think" I''m falling!')

returns

HelpIthinkImfalling

*******************************************************************/

CREATE FUNCTION dbo.fnStripNonAlphaNumeric
(
    @input varchar(500)
)
RETURNS varchar(500)
AS
BEGIN
    
    DECLARE @i int
    DECLARE @result varchar(500)
    SET @result = @input
    SET @i = patindex('%[^a-zA-Z0-9]%', @result)
    WHILE @i > 0
    BEGIN
        SET @result = STUFF(@result, @i, 1, '')
        SET @i = patindex('%[^a-zA-Z0-9]%', @result)
    END

    RETURN @result

END

Then in use it's something like

SELECT dbo.fnStripNonAlphaNumeric(FieldWithAlphaNumerics) as AlphaCleanValue
FROM MyTable
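
One caveat worth checking: ranges such as a-z in PATINDEX are evaluated using the collation's sort order, so under some collations accented letters may count as "alphanumeric" and survive the function. The query below is only a quick way to see what your own database does; the sample word and the choice of Latin1_General_BIN are assumptions for illustration, not part of the function above.

```sql
-- Compare the database default collation with a binary collation.
-- A result of 0 means "no non-alphanumeric character found".
SELECT
    PATINDEX('%[^a-zA-Z0-9]%', N'café') AS DefaultCollation,
    PATINDEX('%[^a-zA-Z0-9]%', N'café' COLLATE Latin1_General_BIN) AS BinaryCollation
```

If the two columns differ and you want strict ASCII behavior, adding the same COLLATE clause to @result inside both patindex calls of the function is one option.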

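FWIW, to strip non-alphanumeric characters in C# you can use this one-liner (assuming you have an initial string called "input"). It uses an explicit character class rather than \W, because \w also matches underscores and non-ASCII letters:

System.Text.RegularExpressions.Regex.Replace(input, @"[^a-zA-Z0-9]", "")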

:)

 

 

 

August 30, 2006    Bookmark to Digg or other social bookmarking
#    Disclaimer  |  Comments [0]



 Monday, August 21, 2006

I went into this in my post about SQL Server remote connections, but basically, for each SQL Server instance running on your database server, you can enable remote access, control which IPs and ports SQL Server listens on, and control which IPs are allowed to access which port.

Use Windows Security when possible

Connecting via a Windows account is more secure than connecting with a SQL Server login and password. In the past those values were sent in plaintext (eek), although now SQL Server 2005 offers lockouts, password complexity enforcement, and password expiration for SQL Server logins, so they're not as insecure as they used to be. Also, if you're using the SQL Native Client to connect, your login packet is encrypted, so the password isn't in plaintext anymore.

If you need to use SQL Server logins, don't use the sa account -- log in with a different account with limited permissions. And use strong passwords. If it's available over the internet, you should use an SSL certificate to encrypt connections, too (see Encrypting Connections to SQL Server in Books Online).
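
As a rough sketch of the "different account with limited permissions" idea, something like the following creates a SQL Server login with the SQL Server 2005 password policy options turned on and gives it read-only access to one database. The login name AppLogin, the database name MyDatabase, and the password are placeholders I made up; pick whichever roles or grants your application actually needs.

```sql
-- Hypothetical names: AppLogin and MyDatabase. Replace the password placeholder.
CREATE LOGIN AppLogin
    WITH PASSWORD = 'replace-with-a-Strong-passw0rd!',
    CHECK_POLICY = ON,       -- enforce the Windows password policy (complexity, lockout)
    CHECK_EXPIRATION = ON    -- enforce password expiration
GO
USE MyDatabase
GO
-- Map the login to a database user and grant read-only access
CREATE USER AppLogin FOR LOGIN AppLogin
GO
EXEC sp_addrolemember 'db_datareader', 'AppLogin'
GO
```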

Allowing Remote Access

SQL Server doesn't allow remote access by default. So if you want other machines to access SQL Server, open up SQL Server Surface Area Configuration (in Programs->Microsoft SQL Server 2005->Configuration Tools) and click Surface Area Configuration for Services and Connections. On the left side, navigate to the Remote Connections node under your SQL Server instance's Database Engine node, and ensure Local and Remote connections is selected with Using TCP/IP only. Then click OK.

CropperCapture[23].gif

 

Configuring IPs

To configure which IPs SQL Server listens on, open SQL Server Configuration Manager, expand the Network Configuration node, and click the Protocols section. You'll see TCP/IP on the right.

CropperCapture[14].gif

 

Double click TCP/IP. You'll see a dialog like below with some settings on the Protocol tab:

CropperCapture[15].gif

By default, Listen All is set to Yes, which means SQL Server is listening on every IP the server has. If the server has a public IP, or if you don't want it listening on certain IPs, set Listen All to No. Now switch to the IP Addresses tab.

CropperCapture[17].gif

You'll see entries for all the machine's bound IPs -- in the above example 192.168.0.10 is the internal IP, 77.89.121.42 is a public IP, and 127.0.0.1 is the local loopback IP (note those aren't my real IPs, they're just for show).  Active means the IP address is a working IP network-wise -- changing the value via the dropdown doesn't do anything. If Listen All is set to No, you can set specific ports for various IPs, or tell SQL Server to not listen on them by setting Enabled to No. Notice how I've used the port 2000 and disabled access on the public IP.

FYI, Dynamic Ports is a setting where SQL Server finds an available port at runtime. I don't recommend Dynamic Ports because a) you have to run the SQL Server Browser service (a security risk) in order to inform clients which port is being used, and b) it's hard to protect SQL Server with a firewall because you never know which port is going to be used. So you're stuck opening up a bunch of ports, which is bad.

Anyhow, if Listen All was set to Yes on the Protocol tab, then you can't apply settings for individual IPs -- you instead apply settings for all IPs in the IPAll section at the bottom.

CropperCapture[18].gif

 

Configuring the Firewall

Now you can create a firewall rule allowing certain IPs access to your machine over certain ports. If you're using Windows Firewall, you can do so by first opening Windows Firewall from the Control Panel and clicking the Exceptions tab. You'll see a list of current exceptions.

CropperCapture[19].gif

Click Add Port. Give your rule a name (e.g. "SQL Server Rule") and specify the port you set in SQL Configuration Manager.

CropperCapture[20].gif

Now we want to specify who can access our server over this port. By default, everyone can (even folks over the internet), which IMO isn't super secure. So click Change Scope. You can choose My Network to allow all computers on your network to access your server, but if you want a narrower range (good if you have a large network but only a few machines should be able to access your machine), you can enter the specific IPs and/or masks in the Custom List section.

CropperCapture[22].gif

Now click OK, and click OK again, and your new firewall rule is applied.

Finishing Up

Now restart your SQL Server service, and you're ready to connect. You can also feel better that you've protected your SQL Server instance.

 

 

August 21, 2006



By default, remote connections are disabled for all versions of SQL Server 2005, including SQL Server 2005 Express. If you try to connect from a different machine, you'll get an error message like this:

An error has occurred while establishing a connection to the server when connecting to SQL server 2005, this failure may be caused by the fact that under default settings SQL server does not allow remote connection.

To resolve it, you need to enable remote connections in the SQL Server Surface Area Configuration Tool (under Start->Programs->Microsoft SQL Server 2005->Configuration Tools) and then restart the SQL Server Express service. See the full walkthrough on MSDN for how to do so.

If you then get errors like "Error Locating Service/Instance Specified" you have two options: using the SQL Server Browser, or using TCP/IP & specifying the port when you connect.

SQL Server Browser

The SQL Server Browser service tells interested parties the SQL Server instances available on the machine. It's a mild security risk (especially if you expose it over the internet), but it lets you easily connect to SQL Server instances, especially if your database server has multiple instances running. You'll need to ensure that the SQL Server Browser Service is running on the database server, and that the database server's firewall isn't blocking the SQL Server Express service or the SQL Server Browser service. See this MSDN article on how to do so.

TCP/IP Ports

This method is more secure but a little more work. You don't have to run the SQL Server Browser service, and you get to pick (and manage traffic on) the IP and port that each SQL Server instance listens on. Worth it for a public server, IMO. If you're using TCP/IP and don't want to run SQL Server Browser for security reasons, or if you have multiple SQL Server instances on your server, you'll need to check the TCP/IP ports that your SQL Server Database Engines are listening on, and ensure each running instance has its own port.

Open up SQL Server Configuration Manager (under Start->Programs->Microsoft SQL Server 2005->Configuration Tools), expand SQL Server Network Configuration, click Protocols for SQLEXPRESS, double-click TCP/IP. Note whether "Listen All" is Yes or No. Click the IP Addresses tab. If "Listen All" was Yes, then you can set the ports in the "IPAll" section below. If "Listen All" was set to No, then ensure that there's a port specified for the listed IP addresses, or disable (set Enabled to "No") any public IPs you don't want to expose SQL Server on.

Then open up Windows Firewall on the database server (or whatever your firewall program is) and allow TCP traffic in for the port you specified. Be sure to indicate who you want to access this port, too (e.g. the internet, your local network, a specific subnet, or just the local machine).

Then you should be able to connect to the server using "<ip address>,<port>" (e.g. "192.168.0.10,8000") from SQL Server Management Studio.
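
Once you're connected, a quick way to confirm which address and port your session actually came in on is to query the sys.dm_exec_connections view (new in SQL Server 2005). This is just a sanity check, and it assumes your login has the VIEW SERVER STATE permission:

```sql
-- Show how the current session is connected
SELECT net_transport,      -- e.g. TCP or Shared memory
       local_net_address,  -- server IP the connection arrived on
       local_tcp_port,     -- server port the connection arrived on
       encrypt_option,     -- whether the connection is encrypted
       auth_scheme         -- SQL or Windows authentication scheme
FROM sys.dm_exec_connections
WHERE session_id = @@SPID
```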

 

August 21, 2006



 Thursday, August 17, 2006

I wanted to share a quick & easy way to validate uploaded images in ASP.NET. One of my projects has a feature allowing users to upload a logo. But I wanted to restrict them to JPG & GIF images, and ensure that the image was within a certain height. So....asp:CustomValidator to the rescue!

Here's the code for the web page with the INPUT file tag and the CustomValidator:


<INPUT type="file" id="txtCorporateLogo" name="txtCorporateLogo" runat="server">

<asp:CustomValidator ID="valLogo" Runat=server CssClass="validator"
ErrorMessage="Logos can only be GIF or JPG images under 100 pixels high"
OnServerValidate="ValidateLogo" ClientValidationFunction="checkLogo" ControlToValidate="txtCorporateLogo"
><br>Logos can only be GIF or JPG images under 100 pixels high</asp:CustomValidator>


Here's the client-side javascript function that the CustomValidator calls to ensure that we don't post to the server if the image isn't a JPG or GIF. The function sets IsValid to true if there's no file specified, or if it ends with jpg, jpeg, or gif:


<script type="text/javascript">

    function checkLogo(sender, args)
    {
        var filename = document.getElementById('txtCorporateLogo').value.toLowerCase();

        if (filename.length < 1)
        {
            // no file selected: nothing to validate
            args.IsValid = true;
        }
        else
        {
            // valid only if the name ends with .jpg, .jpeg, or .gif
            args.IsValid = /\.(jpg|jpeg|gif)$/.test(filename);
        }
    }

</script>

And here's the server-side method that confirms the image is 100 pixels high or shorter:


        protected void ValidateLogo(object sender, ServerValidateEventArgs args)
        {
            HttpPostedFile imageFile = this.txtCorporateLogo.PostedFile;

            // no file uploaded: nothing to validate
            if (imageFile == null || imageFile.FileName == String.Empty)
            {
                args.IsValid = true;
                return;
            }

            // accept the same extensions as the client-side check
            string fileName = imageFile.FileName.ToLower();
            if (!fileName.EndsWith(".jpg") && !fileName.EndsWith(".jpeg") && !fileName.EndsWith(".gif"))
            {
                args.IsValid = false;
                return;
            }

            try
            {
                // Bitmap throws ArgumentException if the stream isn't a valid image
                using (System.Drawing.Bitmap bitmap = new System.Drawing.Bitmap(imageFile.InputStream))
                {
                    args.IsValid = bitmap.Height <= 100;
                }
            }
            catch (ArgumentException)
            {
                args.IsValid = false;
            }

            imageFile.InputStream.Position = 0;    // reset the position in the stream
        }



That's it. Just remember to check Page.IsValid in your submit method before doing anything with the image.

And I'm starting to get annoyed with FTB's code import. Look at that mess above! :)

August 17, 2006



 Saturday, August 12, 2006

SQL Server has a decent full text search engine (IMO), but if you have HTML data in your database, it can be tricky searching on it. For example, if users search on the word "strong" you don't want to bring back data like "<strong>this text is emphasized</strong>". Also, there were early problems with SQL 2000's word breaker, in that it didn't treat > or < as a word delimiter (this problem has since been resolved).

Since SQL 2005 is around I thought I'd throw out a few ways I've noticed to store & search on HTML data.

Just Store HTML data in a varchar or text column

First of all, you can go the simple route & store it in a varchar(max) or text column, and create a full text index on that column. 

The upsides

  • It's a simple approach
  • FTS Change Tracking will track the values for varchar columns. If you're using a text column, changes are tracked unless made via WRITETEXT and UPDATETEXT. That's not much of an issue with SQL 2005, though, since WRITETEXT and UPDATETEXT are now deprecated.
  • It's easy to update values in varchar or text columns
  • You'll get matches on all the words

The downside

  • You'll also get matches on words inside comments and HTML tags (e.g. "font", "arial", "body")

So this might be a place to start for an 80/20 HTML search engine approach, and you could maybe treat words like "font" "td" etc as noise words so they're ignored in searches. Not a perfect solution, though, especially if your users like to search on the word "title."

Store HTML data in an XML Column

Now that SQL 2005 has an "XML" data type column, you can store your HTML data in that instead and search on it.

Upsides:

  • Search results won't include tagnames, attribute names, or words within comments
  • Change tracking will track XML column changes

Downsides:

  • Could be hard to update values in the column; I don't know how easy it is to programmatically interact with XML column data types
  • Your HTML needs to be well-formed. "< font > hey there </ font>" will return an error. "<font> hey there </font>" won't.
  • Full Text Search won't match on tag and attribute names (good), but will match on attribute values (bad). For example, if your data is "<font face="Arial">hi there</font>", searching on "font" won't return a match, but searching on "Arial" will.

Here's a complete script to try out in your own database (SQL Server 2005 only). It creates a new database called "ftstest" and a table called "Test".

create database ftstest
go
use ftstest
go
sp_fulltext_database 'enable'
go
Create fulltext catalog FTSCatalog as default
go
CREATE TABLE Test (
ID int not null identity constraint PK_Test primary key,
Title varchar(1000),
Description XML)
go

insert into Test (Title, Description) values
('some stuff goes here', '<font face="Arial">test1</font>')
go
insert into Test (Title, Description) values
('some second row', '<font> test2 </font>')
go
insert into Test (Title, Description) values
('some stuff here', '<font> test3 foobar </font>')
go
insert into Test (Title, Description) values
('some stuff here', '<font>boogie</font>')
go

CREATE FULLTEXT INDEX ON Test(Title, Description) KEY INDEX PK_Test
GO

-- these queries return data
select * from Test where FREETEXT(*,'stuff')
select * from Test where FREETEXT(*,'test1')
select * from Test where FREETEXT(*,'test2')
select * from Test where FREETEXT(*,'boogie')
select * from Test where FREETEXT(*,'Arial')

-- these don't
select * from Test where FREETEXT(*,'face')
select * from Test where FREETEXT(*,'font')
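
On the "hard to update" concern: as far as I can tell, the XML column can be updated either by assigning a whole new value or by using the XML data type's modify() and value() methods. The sketch below runs against the Test table from the script above and assumes the identity values came out as 1 through 4; the replacement text is made up.

```sql
-- Replace the whole XML value for one row
UPDATE Test
SET Description = '<font face="Arial">test1 updated</font>'
WHERE ID = 1
GO

-- Change only the text inside the <font> element using XML DML
UPDATE Test
SET Description.modify('replace value of (/font/text())[1] with "boogie woogie"')
WHERE ID = 4
GO

-- Read the element text back out as a varchar
SELECT ID, Description.value('(/font)[1]', 'varchar(200)') AS FontText
FROM Test
```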

Store the HTML in an Image Column

This is the old standby. SQL Server can automatically ignore HTML markup in search results if you store your HTML data in a column of the image data type. You also need a second column whose value (e.g. 'htm') indicates the type of data.

Upsides:

  • All HTML markup is ignored for searches (except for a questionable feature where if you have spaces around your tags like this "< strong >" the tagname will be included in the search results).
  • You can actually use this feature to store & perform FTS searches on other types of documents, like PPT, PDF, DOC, etc. So it's good if you're doing a document management system & need to search on not only HTML documents but other kinds, too.

Downsides:

  • You have to deal with updating Image data types, which can be a huge PITA. I really wish SQL supported this for varchar or text columns.

Here's a sample script, in this case the DescriptionContentType column contains the value 'htm', telling SQL Server FTS that the Description column contains HTML data & that the indexer should use the HTML iFilter:

CREATE TABLE Test (
ID int not null identity constraint PK_Test primary key,
Title varchar(1000),
Description image,
DescriptionContentType char(3) default 'htm'
)
go

CREATE FULLTEXT INDEX ON Test(Title, Description TYPE COLUMN DescriptionContentType)
KEY INDEX PK_Test
go

INSERT INTO Test (Title, Description) VALUES ('hi','<strong>test</strong>')
go

SELECT * FROM Test WHERE CONTAINS(*,'hi')    -- returns results
SELECT * FROM Test WHERE CONTAINS(*,'test')    -- returns results
SELECT * FROM Test WHERE CONTAINS(*,'strong')    -- no results

Use a Separate Keywords Column

This is a complex but common approach. Basically, you store your HTML in a varchar or text column, then strip out the HTML markup & store the resulting text in a separate keyword column. You then perform searches on the keyword column. 

Upsides:

  • You get to avoid working with Image columns
  • HTML markup is avoided in search results
  • Change tracking will handle the keyword column (provided it's varchar or text w/o using WRITETEXT)

Downsides

  • You need to find or write a function to strip HTML from your column (easy enough with RegEx)
  • Extra storage space is consumed since you're storing a lot of the data twice
  • You have to maintain two columns for HTML data: the HTML column, and the keyword column. Thus more work, more risk of bugs, & possibly more confusion. Plus all code that interacts with that table may need to be aware of both columns & use them correctly.

Here's a SQL blurb illustrating the concept.

CREATE TABLE Test (
ID int not null identity constraint PK_Test primary key,
Title varchar(1000),
Description varchar(max),
DescriptionKeywords varchar(max)
)
go

CREATE FULLTEXT INDEX ON Test(Title, DescriptionKeywords)
KEY INDEX PK_Test
go

-- scalar user-defined functions must be called with their schema name (dbo.)
INSERT INTO Test (Title, Description, DescriptionKeywords)
VALUES ('hi','<strong>test</strong>', dbo.fnStripHtmlFromText('<strong>test</strong>'))
go

SELECT * FROM Test WHERE CONTAINS(*,'hi')    -- returns results
SELECT * FROM Test WHERE CONTAINS(*,'test')    -- returns results
SELECT * FROM Test WHERE CONTAINS(*,'strong')    -- no results

Notice the fnStripHtmlFromText function -- that's the function you'd need to write to strip HTML from incoming data. For better protection, you could restrict access to the table to stored procedures only, and only expose the Description column, like this:

CREATE PROCEDURE spInsertTest (
    @Title varchar(1000),
    @Description varchar(max)
)
AS
BEGIN
    INSERT INTO Test (Title, Description, DescriptionKeywords)
    VALUES (@Title, @Description, dbo.fnStripHtmlFromText(@Description))

    -- SCOPE_IDENTITY() ignores identity values generated by triggers
    RETURN SCOPE_IDENTITY()
END

Alternately, if you needed to use raw SQL instead of stored procedures, you could use INSERT and UPDATE triggers to maintain the DescriptionKeywords column, and your SQL could just interact with the Description column. Sorta like this:

CREATE TRIGGER dbo.tuTest
ON dbo.Test
AFTER UPDATE
AS
BEGIN
    SET NOCOUNT ON;

    -- update keyword column with keywords from html column
    -- (scalar functions must be called with their schema name)
    UPDATE t SET
        DescriptionKeywords = dbo.fnStripHtmlFromText(i.Description)
    FROM Test t INNER JOIN inserted i ON t.ID = i.ID
END
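
The examples in this section assume a fnStripHtmlFromText function exists. Here's a deliberately naive sketch of one, just to make the idea concrete: it treats everything between a "<" and the next ">" as a tag and replaces it with a space. It is my own illustration, not a robust HTML parser, and it won't cope with comments or scripts that contain ">", or with HTML entities.

```sql
-- Hypothetical implementation of the fnStripHtmlFromText helper.
CREATE FUNCTION dbo.fnStripHtmlFromText
(
    @html varchar(max)
)
RETURNS varchar(max)
AS
BEGIN
    DECLARE @result varchar(max)
    DECLARE @start int
    DECLARE @end int

    SET @result = @html
    SET @start = CHARINDEX('<', @result)
    SET @end = CHARINDEX('>', @result, @start + 1)

    -- keep going while there is a '<' followed somewhere by a '>'
    WHILE @start > 0 AND @end > 0
    BEGIN
        -- replace the tag with a space so adjacent words don't run together
        SET @result = STUFF(@result, @start, @end - @start + 1, ' ')
        SET @start = CHARINDEX('<', @result)
        SET @end = CHARINDEX('>', @result, @start + 1)
    END

    RETURN LTRIM(RTRIM(@result))
END
GO

-- Example call; for this input the tags are removed and 'test' remains
SELECT dbo.fnStripHtmlFromText('<strong>test</strong>')
```

Replacing tags with a space rather than an empty string is a small design choice: it keeps "one</td><td>two" from collapsing into a single word that the full-text indexer would treat as one token.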

Use a Separate Image Column

Like the separate keyword column solution, above, except that instead of parsing out the keywords, you just store a second copy of your HTML data in an Image column along with a content type column. The upside is you don't need to write an HTML keyword parser, but the downside is your keywords are in an image column (which may be a non issue since you shouldn't interact with it directly). Here's a sample script

CREATE TABLE Test (
ID int not null identity constraint PK_Test primary key,
Title varchar(1000),
Description varchar(max),
FTSDescription image,
FTSDescriptionContenttype char(3) default 'htm'
)
go

CREATE FULLTEXT INDEX ON Test(Title, FTSDescription TYPE COLUMN FTSDescriptionContenttype)
KEY INDEX PK_Test
go
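
To keep the image copy in sync without touching it from application code, a trigger along these lines might work. This is a sketch under my own assumptions: the trigger name is made up, and it relies on an explicit CAST from varchar(max) to varbinary(max), which is then stored in the image column.

```sql
-- Hypothetical trigger: copies Description into the full-text image column
CREATE TRIGGER dbo.tiuTestFts
ON dbo.Test
AFTER INSERT, UPDATE
AS
BEGIN
    SET NOCOUNT ON;

    -- only the varchar(max) column is read from "inserted";
    -- image columns themselves can't be referenced there
    UPDATE t SET
        FTSDescription = CAST(i.Description AS varbinary(max))
    FROM dbo.Test t INNER JOIN inserted i ON t.ID = i.ID
END
GO
```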

Conclusion

Well that was a long post. The solution depends on what your goals are, but I'd recommend architecting your application in such a way that if you start out with the first, simplest solution, you can enhance your system later to a more sophisticated implementation without breaking everything. That means you should ideally be interacting with your system either via stored procedures or objects. Then if you need to change the underlying database schema in order to handle a better search feature, you can do it in your DAL and/or procedures.

Since I'm doing this for an existing project (building a light CMS), I'm personally leaning towards a separate keyword or separate image column approach. I don't want to directly interact with Image columns at all if I can help it. :)

August 12, 2006



 Friday, August 11, 2006

If this were an ordinary post I'd show you a bunch of code illustrating how to send multipart MIME emails using .NET. But yesterday I ran across DotNetOpenMail, an open-source mail component for .NET. And I don't believe in reinventing the wheel too much.

As a reminder, multipart MIME emails allow you to embed multiple content parts with different MIME types (e.g. HTML and TEXT) into a single email. That way, recipients with HTML-capable email clients will see the HTML version of your email, while older email programs will display the text version.

In .NET 1.1 (which is what I was developing in yesterday), multipart MIME emails aren't really supported, although if System.Web.Mail uses CDO.Message behind the scenes, you'll automatically get a multipart MIME email generated.

So anyhow, I happily found this open-source component & it appears to work fine for my purposes. And so I thought I'd pass along the tip.

August 11, 2006



 Wednesday, August 09, 2006

Had a few issues running a 1.1 site on Windows 2003. Things I did to resolve the issues:

  • Made sure v1.1 was selected in the ASP.NET tab in IIS Manager for that site. That fixed the issue with ASP.NET not sending the aspnet_client files to the browser.
  • Made sure the \aspnet_client\system_web\1_1_4322 files were in the wwwroot directory for that site. Also copied the latest versions of the js files from C:\WINDOWS\Microsoft.NET\Framework\v1.1.4322\ASP.NETClientFiles into the \aspnet_client\system_web\1_1_4322 wwwroot folder. That resolved the issue where no postbacks were occurring due to an old bug w/ client side validation, discussed on Thomas Freudenberg's blog.
  • Was getting a weird error "CS0016: Could not write to output file 'c:\WINDOWS\Microsoft.NET\Framework\v1.1.4322\Temporary ASP.NET Files\xxxxx'. The directory name is invalid." Turns out the TEMP & TMP environment values were set to a user-specific account. KB825791 gives the fix .. basically changing the environment values and ensuring that the ASPNET and NETWORK SERVICE accounts have full rights to the temp directory.

Now it works. :)

August 9, 2006



 Monday, August 07, 2006

So I was having a little trouble getting full text search to work with the GUI in SQL Server Express with Advanced Services (formerly SQL Server 2005 Express SP1), so I had to do things manually. It was probably a permissions or setup issue with SQL Server Express or the tools. In addition to setting up FTS, I wanted a search query to weight columns differently in the search rankings -- something that SQL Server FTS doesn't really support.

Setting up Full Text Search

First I had to download and install SQL Server Express with Advanced Services. It's big, but comes with the goodies I wanted.

Then I connected to my SQL Server Express database using SQL Server Management Studio so I could type in some queries. If your SQL Server Express database is in your Visual Studio Project's App_Data folder, you may be out of luck -- I wasn't able to get full text search to work on those, although maybe adjusting permissions would do it.

Once connected to the database, I created a full text catalog

CREATE FULLTEXT CATALOG MyFTCatalog

Next I needed to get the name of a unique index for my table. You can only create full-text indexes on tables with a single-key unique index (e.g. an autonumber primary key index). Remember that your unique index doesn't have to be on the columns that you want to perform full text searches on.

I had a table called Listing with a primary key of IdListing and three varchar fields I wanted to search on: Address, Realtor, and Notes. My table already had a unique index called PK_Listing_IdListing, so it was time to create a full-text index on the three columns I wanted to be able to search on:

CREATE FULLTEXT INDEX ON Listing (Address, Realtor, Notes)
KEY INDEX PK_Listing_IdListing
ON MyFTCatalog
WITH CHANGE_TRACKING AUTO

What the above query did is create a full-text index on those three Listing table columns and store it in the full-text catalog named MyFTCatalog. I indicated PK_Listing_IdListing as the index to help uniquely identify rows on the Listing table, and I told the Full Text Search engine to automatically update the full-text catalog if values in the table change.

Lastly I did a quick check to confirm the catalog existed and wasn't still building

SELECT FULLTEXTCATALOGPROPERTY('MyFTCatalog', 'Populatestatus')
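
If you'd rather not re-run that check by hand, a small polling loop like this one waits until the catalog reports idle (a PopulateStatus of 0). The five-second delay is an arbitrary choice on my part.

```sql
-- Wait until the full-text catalog is idle (PopulateStatus = 0)
WHILE FULLTEXTCATALOGPROPERTY('MyFTCatalog', 'PopulateStatus') <> 0
BEGIN
    WAITFOR DELAY '00:00:05'   -- check again in five seconds
END
PRINT 'Full-text catalog is idle'
```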

And we're set up. Now it was time to query. And man is it hot in here. I guess overclocking your PC makes for a sweaty summer. Anyhow...moving on.

Performing Weighted Queries

There are plenty of pages about performing full-text queries in SQL Server. Here's a place to start.

So my first query looked like this

SELECT IdListing, Address, Realtor, Notes
FROM Listing
WHERE FREETEXT(*,'some keywords')

The * tells FTS to perform the search on all columns in the full-text index. But the query wasn't going to work for me, since it doesn't give more weight to one column over the other. Plus, in order to sort results by ranking, I needed to use the *TABLE full-text queries. I'm partial to FREETEXTTABLE because it already does all the stemming/etc for me.

Then I did a UNION query like this

SELECT TOP 100 Rank, Address, Realtor, Notes
FROM
(
    SELECT f.Rank, l.Address, l.Realtor, l.Notes
    FROM listing l INNER JOIN
    FREETEXTTABLE(listing, Address, 'some keywords') as f
    ON l.idListing = f.[KEY]
    UNION
    SELECT f.Rank, l.Address, l.Realtor, l.Notes
    FROM listing l INNER JOIN
    FREETEXTTABLE(listing, Realtor, 'some keywords') as f
    ON l.idListing = f.[KEY]
    UNION
    SELECT f.Rank, l.Address, l.Realtor, l.Notes
    FROM listing l INNER JOIN
    FREETEXTTABLE(listing, Notes, 'some keywords') as f
    ON l.idListing = f.[KEY]
) as myTable
ORDER BY Rank DESC

which I quickly rewrote to

SELECT TOP 100 f.Rank, l.Address, l.Realtor, l.Notes
FROM Listing l INNER JOIN
(
SELECT Rank, [KEY] from FREETEXTTABLE(listing, Address, 'some keywords')
UNION
select Rank, [KEY] from FREETEXTTABLE(listing, Realtor, 'some keywords')
UNION
select Rank, [KEY] from FREETEXTTABLE(listing, Notes, 'some keywords')
) as f
ON l.IdListing = f.[KEY]
ORDER BY f.Rank DESC

and then added some weights to the rankings, like so.

SELECT TOP 100 f.WeightedRank, l.Address, l.Realtor, l.Notes
FROM listing l INNER JOIN
(
        SELECT Rank * 5.0 as WeightedRank, [KEY] from FREETEXTTABLE(listing, Address, 'some keywords')
        UNION
        select Rank * 3.0 as WeightedRank, [KEY] from FREETEXTTABLE(listing, Realtor, 'some keywords')
        UNION
        select Rank * 1.0 as WeightedRank, [KEY] from FREETEXTTABLE(listing, Notes, 'some keywords')
) as f
ON l.idListing = f.[KEY]
ORDER BY f.WeightedRank DESC

Pretty good. You have the column weighting, and you could wrap it up in a nice little stored procedure and be good to go.

However, there was one last thing I needed. I really wanted a query that would combine column rankings, so that if there were hits in multiple columns, the rank would be higher than a hit in a single column. So this is what I came up with.

SELECT TOP 100 f.WeightedRank, l.Address, l.Realtor, l.Notes
FROM listing l INNER JOIN
(
    SELECT [KEY], SUM(Rank) AS WeightedRank
    FROM
    (
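        -- UNION ALL keeps both rows when two columns yield the same weighted rank for a [KEY];
        -- a plain UNION would drop one of them before the SUM
        SELECT Rank * 5.0 as Rank, [KEY] from FREETEXTTABLE(listing, Address, 'some keywords')
        UNION ALL
        select Rank * 3.0 as Rank, [KEY] from FREETEXTTABLE(listing, Realtor, 'some keywords')
        UNION ALL
        select Rank * 1.0 as Rank, [KEY] from FREETEXTTABLE(listing, Notes, 'some keywords')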
    ) as x
    GROUP BY [KEY]
) as f
ON l.idListing = f.[KEY]
ORDER BY f.WeightedRank DESC

Notice how I'm grouping the inner UNION query by [KEY] (in this case, Listing.IdListing) and SUMming the weighted ranks. That allows us to push results with hits in multiple columns higher up in the search rankings. Obviously it's not going to perform as well as a simpler query, but the ranking was important for this project.
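
To wrap the combined-rank query in a stored procedure, as mentioned earlier, something like the sketch below should do. The procedure name spSearchListings and the @keywords parameter are names I made up; the weights are the same 5/3/1 used above. It uses UNION ALL so that two columns producing the same weighted rank for a listing are both counted in the SUM.

```sql
-- Hypothetical procedure wrapping the weighted, combined-rank search
CREATE PROCEDURE dbo.spSearchListings
(
    @keywords nvarchar(200)
)
AS
BEGIN
    SET NOCOUNT ON;

    SELECT TOP 100 f.WeightedRank, l.Address, l.Realtor, l.Notes
    FROM Listing l INNER JOIN
    (
        -- one row per listing, summing the weighted ranks of every column that matched
        SELECT [KEY], SUM(WeightedRank) AS WeightedRank
        FROM
        (
            SELECT Rank * 5.0 AS WeightedRank, [KEY] FROM FREETEXTTABLE(Listing, Address, @keywords)
            UNION ALL
            SELECT Rank * 3.0 AS WeightedRank, [KEY] FROM FREETEXTTABLE(Listing, Realtor, @keywords)
            UNION ALL
            SELECT Rank * 1.0 AS WeightedRank, [KEY] FROM FREETEXTTABLE(Listing, Notes, @keywords)
        ) AS x
        GROUP BY [KEY]
    ) AS f
    ON l.IdListing = f.[KEY]
    ORDER BY f.WeightedRank DESC
END
GO

-- Example call
EXEC dbo.spSearchListings @keywords = N'some keywords'
```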

Conclusion

So, there ya go. Installing SQL Server Express isn't too bad, although it's a big download. Setting up Full Text Search seemed to work best for me from the command line. And, now you have a way to rank matches with different columns having different weights.

Update: An Alternate Approach

Hilary Cotter (SQL MVP & FTS guru) provided an alternate query. I did a few tests & both seemed comparable in performance, although I didn't test using very large data sets. I made a slight change to his query and added a WHERE clause so that only matches are returned.

select TOP 100
    idListing, Address, Realtor, Notes,
    RankTotal=isnull(RankAddress,0)+isnull(RankRealtor,0)+isnull(RankNotes,0)
from listing
left join (SELECT Rank * 5.0 as RankAddress, [KEY] from
    FREETEXTTABLE(listing, Address, 'Street')) as k
    on k.[key]=Listing.idListing
left join (select Rank * 3.0 as RankRealtor, [KEY] from
    FREETEXTTABLE(listing, Realtor, 'Street')) as l
    on l.[key]=Listing.idListing
left join (select Rank * 1.0 as RankNotes, [KEY] from
    FREETEXTTABLE(listing, Notes, 'Street')) as m
    on m.[key]=Listing.idListing
WHERE RankAddress IS NOT NULL OR RankRealtor IS NOT NULL OR RankNotes IS NOT NULL
ORDER BY RankTotal DESC

Hilary also provided a script (run it in Query Analyzer or in a Query Tab in SQL Mgmt Studio) to set up a test database so you can try the query out yourself. I modified it to seed the test database with a bunch of records (since with only a few records, even LIKE is faster than FTS):

create database realtor
go
use realtor
GO
sp_fulltext_database 'enable'
GO
Create fulltext catalog realtor as default
GO
create table Listing(
    idListing int not null identity constraint ListingPK primary key,
    Address varchar(200), Realtor varchar(200), Notes varchar(200))
GO
-- add initial seed records
insert into Listing(Address, Realtor, Notes)
values('123 Any Street','John Street','the word on the street is good')
insert into Listing(Address, Realtor, Notes)
values('123 Any Road','John Street','the word of mouth is good')
insert into Listing(Address, Realtor, Notes)
values('123 Any Road','John Smith','the word on the street is good')
insert into Listing(Address, Realtor, Notes)
values('123 Any Street','John Smith','the word of mouth is good')
GO
-- multiply seed records, get up over 1M rows
-- might take a while
PRINT 'Please wait a few minutes while the database is seeded'
DECLARE @i int
SET @i = 0
WHILE (@i < 18)
BEGIN
    -- each pass doubles the table: 4 rows * 2^18 = 1,048,576 rows
    insert into Listing(Address, Realtor, Notes)
    select Address, Realtor, Notes from Listing

    SET @i = @i + 1
    PRINT convert(varchar,@i)
END
PRINT 'Database has been seeded'
GO
PRINT 'Please wait a few minutes while the fulltext index is built'
GO
create fulltext index on listing(Address, Realtor, Notes)
key index ListingPK
GO
-- check the below query. When it returns zero, the FT index is done building.
SELECT FULLTEXTCATALOGPROPERTY('realtor', 'Populatestatus')
GO


 

August 7, 2006



 Wednesday, August 02, 2006

AKA "working virtual, or virtually working?"

My friend Griffin Caprio blogged about the virtues of being a virtual worker and finding wifi hotspots. I thought I'd chime in with a few tips of my own.

Insure your stuff

Make sure your computer equipment is covered. Many homeowner policies DON'T cover computers at all, or not if they're used for business. You may want to get a small umbrella business insurance policy to cover your equipment at home & on the road (think dropped laptop at the airport). Ask around for referrals, or pick a few insurance agents out of the phone book.

Host a Web Server

If you have a static IP address from your ISP, then you can configure DNS to point to a web server on your network, and host away. If you have a dynamic IP, however, then you need to use dynamic DNS to ensure that when your IP address changes, your DNS entry (www.yourcooldomain.com) points to the right IP. There are several providers. I've used DNSExit for years and it works well, but you can also check out No-IP, TZO, or DynDNS. Or others. Some routers come with built-in support for certain dynamic DNS providers, meaning a simple config change in your router is all that's needed to keep your DNS up to date.

Back up your stuff

What would you do if your computer crashed or your hard drive blew out? Would you lose any work? How long would it take you to recover? Backups are important for any IT professional, and I'd suggest an automated approach. You can go with a service like Mozy that runs on your PC and backs stuff up in the background. Or, if you have a place you can FTP files to (e.g. your ISP or an inexpensive host like e-rice or dreamhost) you can pick up a copy of WinZip 10 Pro which can regularly zip up & upload files via FTP. Remember to not only back up documents, but emails, code, and database dumps. Having an organized directory structure where your important files are makes it easier. Then, if disaster strikes, you'll be in a better position to recover. And the silver lining is maybe you'll now have a reason to get a shiny new PC.

August 2, 2006