home · blog · groups · about us · contact us
DevelopmentNow Blog
 Tuesday, January 17, 2006
 
 

One of my current projects involves creating a data warehouse from multiple datafeeds. For cost reasons we're going with MySQL Community Edition, but we're sticking with ASP.NET, C#, and Windows on the code end.

I'll admit that Windows, C#, and MySQL are an unusual combination, but I love C#, and MySQL recently came out with version 5.0, which features stored procedures, transactions, triggers, foreign keys -- lots of good stuff. How could I resist? If you've never installed MySQL before, don't worry...the installer is pretty easy to follow, and the install guide helps you through. Just make sure you also download the tools and the ODBC connector driver.

Since I'm going with an ELT approach (bulk load into an unconstrained temporary table, perform cleanup & transformations, then import into the "real" tables), I needed a way to review the import process and check the errors. A lot of the examples on the web have you doing everything from the command line, calling the MySQL command line tools from PHP, piping stuff out to text files, etc. I wanted to instead wrap the import process into a nice little C# app. So let's jump into the code!

First we need to connect to the database. I'm using the ODBC driver instead of the new ADO.NET provider, because the ODBC overhead isnt' an issue with so few queries, and the ADO.NET provider is so new that I don't quite trust it yet. :)


string conString = "DRIVER={MySQL ODBC 3.51 Driver}; SERVER=localhost; DATABASE=test; UID=theuser; PASSWORD=thepassword; OPTION=3";
OdbcConnection mysqlConnection = new OdbcConnection(conString);
mysqlConnection.Open();

Next we're going to bump the max error count to something big. The default of 64 just isn't enough when you're importing a lot of data. You want to see all the problems encountered during the import.


OdbcCommand updateErrorCount = new OdbcCommand("set max_error_count = 10000;", mysqlConnection);
updateErrorCount.ExecuteNonQuery();

Now let's import data from a flat file into a staging table using the MySQL LOAD DATA bulk import command. This staging table should have a schema that matches the flat file, but shouldn't have any constraints other than a primary key. If there are any constraints (foreign key, etc), and even one record violates them, then the whole import will roll back. And if you have 1 bad row out of 100,000, you would probably prefer to get the 99,999 good records instead of zero. If you designate a primary key, then you can write a LOAD DATA statement that will only import records whose primary key isn't already in the table. This allows you to import multiple files (or even the same file) repeatedly without worrying about duplicate keys.


string importData =
"LOAD DATA LOCAL INFILE 'F:/imports/mls_import_file.txt' "+
"IGNORE "+
"INTO TABLE mls_staging "+
"FIELDS TERMINATED BY '|' "+
"LINES STARTING BY '' "+
"TERMINATED BY '\r\n' "+
"IGNORE 1 LINES; ";

Let me quickly go over the different lines:

  • LOAD DATA LOCAL INFILE 'F:/imports/mls_import_file.txt' causes a bulk load from the F:/imports/mls_import_file.txt file. The LOCAL keyword has the client program read the file and send it to the server. If you omit the LOCAL keyword, then MySQL will try to read the file directly, which is faster, but may have permissions issues.
  • IGNORE tells MySQL to not import any records whose primary key already exists in the destination table
  • INTO TABLE mls_staging is the destination table. It needs to have the same columns as the flat file, otherwise you need to specify which columns to import.
  • FIELDS TERMINATED BY '|' says that the fields are pipe-delimited
  • LINES STARTING BY '' means the records don't start with any special character. Those are two single-quotes, btw. 
  • TERMINATED BY '\r\n' means each record is terminated by a CRLF
  • IGNORE 1 LINES; tells MySQL to skip the first line (e.g. if it contains column header information). And the semicolon indicates the end of a statement, just like in C#.

Now we're actually execute the import statement:


OdbcCommand importCommand = new OdbcCommand(importData, mysqlConnection);
int importRecords = importCommand.ExecuteNonQuery();
Debug.WriteLine(importRecords + " records imported.");

When the LOAD DATA command runs, it'll pull all those rows and load as many into the table as possible, skipping any duplicates. Since there are no foreign key constraints, the main errors we'll see are if the file can't be read or the server is having problems. However, we might see a number of warnings, e.g. missing columns, too-large field values that get truncated, etc. We want to know about all that stuff, so we're first going to get the number of warnings and errors.


OdbcCommand warningCommand = new OdbcCommand("select @@warning_count;", mysqlConnection);
int warningCount = Convert.ToInt32(warningCommand.ExecuteScalar());
OdbcCommand errorCommand = new OdbcCommand("select @@error_count;", mysqlConnection);
int errorCount = Convert.ToInt32(errorCommand.ExecuteScalar());
Debug.WriteLine(String.Format("{0} warnings, {1} errors", warningCount, errorCount));

Now we can inspect the number of warnings and errors, and if there are any, we can log them, email someone, etc. The below code just loops through the warnings and errors and displays them in the output window.


if (warningCount > 0)
{
// show any warnings
OdbcCommand warningListCommand = new OdbcCommand("show warnings;", mysqlConnection);
OdbcDataReader warningList = warningListCommand.ExecuteReader();
while (warningList.Read())
{
Debug.WriteLine(warningList.GetString(0) + " " + warningList.GetString(1) + " " + warningList.GetString(2));
}
warningList.Close();
}

if (errorCount > 0)
{
// show any errors
OdbcCommand errorListCommand = new OdbcCommand("show errors;", mysqlConnection);
OdbcDataReader errorList = errorListCommand.ExecuteReader();
while (errorList.Read())
{
Debug.WriteLine(errorList.GetString(0) + " " + errorList.GetString(1) + " " + errorList.GetString(2));
}
errorList.Close();
}

And lastly we close up shop by calling

mysqlConnection.Close();

Hopefully the above will help you out when importing data in the unholy(?) alliance of MySQL and C#. MySQL's new version bring along a ton of goodies, so don't be surprised if you see MySQL databases popping up more frequently.

 

 

January 17, 2006    Bookmark to Digg or other social bookmarking
#    Disclaimer  |  Comments [0]



 Friday, January 13, 2006
 
 

Microsoft has two VM (virtual machine) solutions: Virtual PC 2004, and Microsoft Virtual Server. Since an MSDN subscription comes with both, how do you know which one to use?

Microsoft has published a white paper describing the differences, but in a nutshell the main differences are

  • The "officially supported" operating systems are different. Client OSes are supported on Virtual PC, while server OSes are supported on Virtual Server. In theory, both products should run the same OSes (see the What Works in Virtual PC site), so it's just the "official" support that differs.
  • Virtual PC has sound, Virtual Server doesn't.
  • Virtual PC makes it easier to drag & drop files between the host computer and VMs.
  • Virtual PC has the "shared networking" mode to make it easy for your VMs to connect to the web.
  • Virtual Server has support for multiple CPUs, hyperthreading, 3.6 of RAM, and SCSI drives. It also offers better control over VM CPU & memory utiliization.
  • Virtual Server offers remote administration via WMI, MOM, RDC, the Virual Machine Remote Control (VMRC), and other server management tools.
  • Virtual Server can be controlled via a COM API.
  • Virtual Server offers more networking and multi-server options, including clustering.

So...if you need to emulate a number of servers talking to each other and let them use your beefy dual-CPU box with 4gb of ram, you might want to use Virtual Server. Otherwise you can probably use Virtual PC.

January 13, 2006    Bookmark to Digg or other social bookmarking
#    Disclaimer  |  Comments [1]



 Monday, January 09, 2006
 
 

I recently got an MSDN Universal subscription via the Microsoft Empower for ISV program ($375 for MSDN Universal), which comes with only all Windows OSes, Visual Studio, SQL Server, Biztalk, Office, and lots of other software. One such program is Microsoft's Virtual PC 2004, which lets you create any number of virtual machines (VMs) each of which acts like its own computer.

Since I'm currently working on 3 different projects (one in .NET 1.1 & SQL Server 2000, one in .NET 2.0 & MySQL, and one in ASP.NET 2.0 & SQL Server 2005), I wanted to have separate development environments for each project to reduce the likelihood of cross-environment contamination (SQL 2000 affecting SQL 2005, .NET 2.0 messing up .NET 1.1, etc.). Not to mention that keeps it cleaner.

Virtual PC Benefits for Developers

Three big benefits for developers from Virtual PC (along with other VM solutions like VMWare and Virtual Server) are

  1. you can easily whip up a VM to try out beta software, etc without messing up your main machine
  2. you can keep different dev environments separate so you can switch between projects without having to uninstall or pollute your files
  3. you can have multiple VMs running at the same time in order to simulate an entire network of computers. For example you could have a VM emulating Windows 2003 & IIS 6, and have that VM communicate with another VM running SQL Server 2005 and Biztalk.

So for me, I'll start by creating a base VM with XP Pro SP2, Firefox, and Office. No dev tools yet. Using that XP Pro VM as a base, I'm then going to make at least three different VMs: one with .NET 1.1 & SQL 2000, one with .NET 2.0 and MySQL, and one with .NET 2.0 and SQL 2005.

When you create a VM based on an existing one, the advantage is you don't have to install anything that's already in the base VM (e.g., I won't have to install XP Pro, Firefox, or Office for my three dev environments). The decision you have to make is whether you want to just make a plain ol' copy of your base VM, or create a new "differencing disk" VM.

Copying your Base VM

Making a copy of a base VM is easy and performs well. You're literally making a copy of the VM files. The downside is it can take up a lot of disk space. To make a copy of a VM:

  • go into My Documents\My Virtual Machines (or whatever your VMs live) and create a new folder for the new VM, e.g. My Documents\My Virtual Machines\NewVMCopy.
  • find the .VHD file of the base VM, and make a copy of it into My Documents\My Virtual Machines\NewVMCopy. Rename it to something like NewVMCopy Hard Disk.VHD.
  • start up Virtual PC 2004, click
    • New->Next
    • Use default settings to create a virtual machine->Next
    • Click Browse, go into BaseVMCopy and enter a name for a new VMC file (e.g. NewVMCopy.VMC)->Next
    • Click Finish
    • The settings dialog will come up, click Hard Disk 1 in the left, check "Virtual Hard disk file:" on the right and browse to the .VHD file you put into the NewVMCopy. Bump up the Memory setting to 256MB RAM or more, and you may want to ensure the Networking option is set to a valid NIC or Shared (NAT).
    • Click OK
  • Your copy is now ready to start up, but before you do, make sure the VM you copied it from isn't running.
  • Now fire up NewVMCopy. Once you get it started, you'll want to give it a new machine name and SID so that it doesn't conflict with any other VMs on the network. To do this, download Sysinternal's NewSID and run it from the command line as "newsid /s <new name>" where "<new name>" should be a new, unique name for this VM. It will run a bit and then automatically restart your VM.

Congratulations, you've made a copy of a VM! Now you can back this VM up (by just backing up the VHD file), install beta software on it, lots of fun stuff. It's "virtually" hassle-free. ;)

Making a Differencing Disk VM

The other way to make a copy of a VM is to create a "differencing disk." The big disadvantage with copying a VM like we did above is that it can take up a lot of space. XP Pro and Office can make a 2gb .VHD file. Copy that .VHD file four more times and you've now got 10 gigs of space taken up.

A differencing disk VHD only stores what's different between itself and the base VHD. So if you create a differencing disk and only install a small program, the VHD file is going to be small, too. The downside is differencing disks can perform more slowly (although that seems to be debated), and if you change anything in a VHD you'll break any differencing disks based on it.

Rather than reinvent the wheel, I'll link to a few good articles on differencing disks. BTW, some sites refer to VMs created in Virtual PC 2004 as VPCs (Virtual PCs).

http://blogs.msdn.com/donsmith/archive/2005/06/16/429700.aspx
http://weblogs.asp.net/cumpsd/archive/2005/03/08/389738.aspx
http://andrewconnell.com/blog/articles/UseVirtualPCsDifferencingDisksToYourAdvantage.aspx

And a special bonus link: Top Ten Tips for Virtual PC 2004

And two more links to Griffin Caprio's thoughts on VMs: Why he likes VMs and Just in time development. And dammit Griffin you should turn on SEO-friendly URLs. :) 

 

January 9, 2006    Bookmark to Digg or other social bookmarking
#    Disclaimer  |  Comments [0]



 Friday, January 06, 2006
 
 

Sourcegear has an SCM solution called Vault that has a few big things going for it as a VSS alternative:

  • You can import VSS projects
  • It can work alongside VSS, so you don't have to import all your VSS projects
  • It's easy to use and familiar to VSS users
  • It integrates into the Visual Studio IDE (if you're into that)
  • You can connect to it remotely over HTTP or HTTPS
  • It's cheap (free for a single user, $300 per user after that. VSS is ~$500/user)

I'm working on a small project now where the other developer is using Vault for source control, so this gives me a good chance to check it out. I'll also probably use Subversion and VSS 2005 for other projects.

Code | Tools
January 6, 2006    Bookmark to Digg or other social bookmarking
#    Disclaimer  |  Comments [0]



 
 

Subversion 1.3 was released a few days ago with a number of enhancements. The biggest news for me is the official support for the Windows _svn hack. In a nutshell, Subversion used to create working directories that started with a dot, but some versions of ASP.NET didn't work with those directories. The "hack" was to start those directories with an underscore instead. You can read more about the hack here. Any utilities will need to call the new Subversion API in order to work with the hack. TortoiseSVN (popular Windows explorer shell plugin for Subversion) has a release candidate that works with the new version -- I'd suggest waiting a week or two for a final version of TortoiseSVN.

What this all means is it will soon be much easier to use Subversion for all .NET development. Which is good, since everyone is trying to dump VSS. :)

* When I say "supports" I mean it officially supports the hack that lets Subversion handle ASP.NET projects and let ASP.NET keep working.

Update: TortoiseSVN 1.3.1 is now released and supports Subversion 1.3.

ASP.NET | Code | Tools
January 6, 2006    Bookmark to Digg or other social bookmarking
#    Disclaimer  |  Comments [0]



 Thursday, December 08, 2005
 
 

The released version of SQL Server 2005 is missing the Import/Export Wizard that was available during beta. Now Microsoft wants you to practice using the Business Intelligence Development Studio to create an Integration Services Project. There's even an Integration Services ETL tutorial on MSDN about it.

However, you can still get to the Import/Export (aka DTS) Wizard by typing dtswizard at the command prompt. :)

December 8, 2005    Bookmark to Digg or other social bookmarking
#    Disclaimer  |  Comments [0]



 Friday, December 02, 2005
 
 
I haven't blogged for a while because I'm in the middle of moving cross country to Portland, and my computer is packed. ;/ But I found a laptop & will write a quick blurb about some basic search engine optimization techniques and web marketing. In my case, I was asked by my dad to soup up countrysidemanor.net, the web site of Countryside Manor, a retirement home our family runs. I haven't started yet (oops) but here's the basic plan...

Make the site useful

If the site isn't useful then your other marketing efforts are wasted. Some basic tips:
  • Keep the page size small. Don't clutter it (especially the home page) with huge graphics, flash movies, etc.
  • Make it easy for people to contact you. A prominent "Contact Us" link that takes the user to a page with address, phone number, business hours, and a link to maps & directions (e.g. mapquest.com) is helpful. To get a maps & directions link, go to your favorite map site, enter in the address, copy the URL from the browser address bar, & link to that from your site in a new window.
  • Have a site menu with easy to understand links. Don't make people scroll over icons or colored boxes or otherwise have to guess how to navigate your site. Clearly labeled links and sections work best.
  • Don't leave "under construction" or "coming soon" lying around. If you don't have time to build a page, don't link to it. Also, get rid of broken links.
  • Spend some time thinking about what information you want to provide and how it should be organized. You can start with a bulleted outline (just like you might have done for high school essays) to organize your ideas. Your web site shouldn't look messy or cluttered.

Optimize for spiders

Next you'll want to ensure that spiders and search engines are able to crawl your site and understand what it's about. Some suggestions:
  • Make sure that you can reach every page on your site just by clicking links. A good test is to turn off javascript on your browser, and just use your mouse to get around your site. If you can't reach a page without filling in a form or using javascript, then spiders can't either.
  • Don't use querystrings if possible. Spiders a less likely to go to pages wuth querystring information. You can read up on mod_rewrite or "url rewriting" if you want to use querystrings but not have them appear in your URLs.
  • Make sure the HTML in your pages accurately reflects what's in the page. Specifically
    • Ensure each page has a descriptive TITLE tag
    • Add a META Keywords and META Description tag to the HEAD section of your page
    • Make proper use of header tags (H1, H2, etc.) to group the information on your page.
    • Your title, headers, and meta tags should contain action verbs, nouns, and other relevant words that people may search on.
    • The more content (aka text, not images) that's on your pages, the more likely spiders are to index it.
    • All images should have ALT tags.
  • Make sure your domain name isn't going to expire. ;) That happened with the Countryside Manor retirement home site ... a squatter grabbed the .com domain name when it expired (the expiration notices got lost in the mail during a registrar transfer snafu). Unfortunately, the squatter wanted $3000 to give it back, so we bought the .net domain name instead, and paid it up for 10 years.

Help spiders find your site

Lastly, you'll want to help spiders find your site, usually by getting other sites to link to it. Some ways to do that are
  • Submt your site to search engines directly. For example, you can submit to Yahoo, or submit your site to MSN.
  • Add your site to web directories. DMOZ (the Open Directory Project) is a large, free directory you can list your site with. Also try Yahoo Directory. Plus, do a web search on relevant sites (e.g. "retirement home directory") to get a list of other sites you can submit your web site to. In my case I found 5-10 different sites that I could get countrysidemanor.net listed on. Some of them cost money, but many were free.
  • Contact other web masters and ask them to link to you. This is called a "link exchange." Look for related web sites and contact the web master, asking if they'd be willing to link to your site and in exchange you'll update your site to link to theirs.
  • Start a blog. You can sign up on Blogger, Typepad, Wordpress, or some other blogging service, and have the blog hosted on your web site. Your blog will usually have a RSS URL, which you can then submit to numerous blog directories such as those listed by Robin Good at the RSS Top 55. If you write something interesting or useful in your blog, you might even get people to link to it for free and say something nice about you. :)
So, the above is a basic strategy you can employ when someone asks you "Hey, can you put my site on Google?" I didn't bother going into paid placement like with Google's AdWords, because you can get a lot of mileage out of good ol' SEO. Remember, there's more to SEO than keywords and META tags ... it's about getting lots links to your site from other sites, and then making your site a place worth visiting in the first place.
December 2, 2005    Bookmark to Digg or other social bookmarking
#    Disclaimer  |  Comments [1]



 Wednesday, October 26, 2005
 
 

Yes, free. Microsoft is doing a 200-city US tour over the next two months to promote Visual Studio 2005, SQL Server 2005, and BizTalk Server 2006. Not only is it an all-day event with speakers, classes, Q&A sessions, and more, but they're giving away a free copy of VS 2005 & SQL Server 2005 to all attendees (while supplies last). Best of all, it costs nothing to register.

So check out the Visual Studio 2005 Launch Tour site and register. And get free stuff. And then start developing -- believe me, the new tools are sweet.

October 26, 2005    Bookmark to Digg or other social bookmarking
#    Disclaimer  |  Comments [0]



 Wednesday, October 19, 2005
 
 

It's often handy to have your web site or application send emails. "Thanks for registering." "Thanks for your order." "Hey tech support, the web site is down."

.NET 2.0 provides a few handy classes for outbound emails. Below is some code you can use in your ASP.NET application to send an email. Notice there is some authentication code you can use if your SMTP server requires a username and password in order to send emails. If you aren't sure, try sending an email without a username or password. If you get an SmtpException with a 550 or 553 Relay denied error (SmtpException.StatusCode == SmtpStatusCode.ClientNotPermitted), then you'll need to authenticate. :)

// create the message object
System.Net.Mail.MailMessage myMessage = new System.Net.Mail.MailMessage();

myMessage.To[0] = "some.guy@hotmail.com";
myMessage.From = "some.girl@hotmail.com";
myMessage.Subject = "this is a test message";
myMessage.Body = "hello there!\ntest message!";

// create the SmtpClient object. Specify the SMTP server in the Host property
System.Net.Mail.SmtpClient mySMTPClient = new System.Net.Mail.SmtpClient();
mySMTPClient.Host = "smtpserver.emailfarm.com";

if (SMTP_SERVER_REQUIRES_AUTHENTICATION)
{
	// create the NetworkCredential object with the username & password 
	// to authenticate against the SMTP server
	System.Net.NetworkCredential myCredentials = 
		new System.Net.NetworkCredential("username", "password");
	mySMTPClient.UseDefaultCredentials = false;
	mySMTPClient.Credentials = myCredentials;
}

// send the message
try
{
	mySMTPClient.Send(myMessage);
}
catch (SmtpException e)
{
	Response.Write(e.StatusCode);
}

If you feel the need to email a file to someone, use code like the below

// Attach a file to the email message. 
// The second parameter is the content type (text, binary file, etc.)
System.Net.Mail.Attachment myAttachment = new Attachment(strFilename, MediaTypeNames.Text.Plain);
myMessage.Attachments.Add(myAttachment);

Lastly, you can send an email asynchronously via the SmtpClient.SendAsync(MailMessage, CallbackToken) method. You can attach an event handler to the SmtpClient.SendCompleted event to receive notification when your email has been sent. One big downer is you can only send one asynchronous email at a time -- if you're currently waiting for a previous SendAsync call to finish, other Send or SendAsync calls will fail. Which sortof sucks.

// Send an email without waiting for completion
mySMTPClient.SendCompleted += new SendCompletedEventHandler(MyHandler);
mySMTPClient.SendAsync(myMessage, null);

void MyHandler(System.Object sender, AsyncCompletedEventArgs e)
{
	// code goes here to say "sending complete!" or something
}

Related links:
SmtpClient: http://msdn2.microsoft.com/en-us/library/4971yhhc(en-US,VS.80).aspx
SendAsync: http://msdn2.microsoft.com/en-us/library/x5x13z6h(en-US,VS.80).aspx
Attachment: http://msdn2.microsoft.com/en-us/library/e02kz1ak(en-US,VS.80).aspx

 

October 19, 2005    Bookmark to Digg or other social bookmarking
#    Disclaimer  |  Comments [0]