One of my current projects involves creating a data warehouse from multiple datafeeds. For cost reasons we're going with MySQL Community Edition, but we're sticking with ASP.NET, C#, and Windows on the code end.
I'll admit that Windows, C#, and MySQL are an unusual combination, but I love C#, and MySQL recently came out with version 5.0, which features stored procedures, transactions, triggers, foreign keys -- lots of good stuff. How could I resist? If you've never installed MySQL before, don't worry...the installer is pretty easy to follow, and the install guide helps you through. Just make sure you also download the tools and the ODBC connector driver.
Since I'm going with an ELT approach (bulk load into an unconstrained temporary table, perform cleanup & transformations, then import into the "real" tables), I needed a way to review the import process and check the errors. A lot of the examples on the web have you doing everything from the command line, calling the MySQL command line tools from PHP, piping stuff out to text files, etc. I wanted to instead wrap the import process into a nice little C# app. So let's jump into the code!
First we need to connect to the database. I'm using the ODBC driver instead of the new ADO.NET provider, because the ODBC overhead isnt' an issue with so few queries, and the ADO.NET provider is so new that I don't quite trust it yet. :)
string conString = "DRIVER={MySQL ODBC 3.51 Driver}; SERVER=localhost; DATABASE=test; UID=theuser; PASSWORD=thepassword; OPTION=3";OdbcConnection mysqlConnection = new OdbcConnection(conString);mysqlConnection.Open();
Next we're going to bump the max error count to something big. The default of 64 just isn't enough when you're importing a lot of data. You want to see all the problems encountered during the import.
OdbcCommand updateErrorCount = new OdbcCommand("set max_error_count = 10000;", mysqlConnection);updateErrorCount.ExecuteNonQuery();
Now let's import data from a flat file into a staging table using the MySQL LOAD DATA bulk import command. This staging table should have a schema that matches the flat file, but shouldn't have any constraints other than a primary key. If there are any constraints (foreign key, etc), and even one record violates them, then the whole import will roll back. And if you have 1 bad row out of 100,000, you would probably prefer to get the 99,999 good records instead of zero. If you designate a primary key, then you can write a LOAD DATA statement that will only import records whose primary key isn't already in the table. This allows you to import multiple files (or even the same file) repeatedly without worrying about duplicate keys.
string importData = "LOAD DATA LOCAL INFILE 'F:/imports/mls_import_file.txt' "+"IGNORE "+"INTO TABLE mls_staging "+"FIELDS TERMINATED BY '|' "+"LINES STARTING BY '' "+"TERMINATED BY '\r\n' "+"IGNORE 1 LINES; ";
Let me quickly go over the different lines:
Now we're actually execute the import statement:
OdbcCommand importCommand = new OdbcCommand(importData, mysqlConnection);int importRecords = importCommand.ExecuteNonQuery();Debug.WriteLine(importRecords + " records imported.");
When the LOAD DATA command runs, it'll pull all those rows and load as many into the table as possible, skipping any duplicates. Since there are no foreign key constraints, the main errors we'll see are if the file can't be read or the server is having problems. However, we might see a number of warnings, e.g. missing columns, too-large field values that get truncated, etc. We want to know about all that stuff, so we're first going to get the number of warnings and errors.
OdbcCommand warningCommand = new OdbcCommand("select @@warning_count;", mysqlConnection);int warningCount = Convert.ToInt32(warningCommand.ExecuteScalar());OdbcCommand errorCommand = new OdbcCommand("select @@error_count;", mysqlConnection);int errorCount = Convert.ToInt32(errorCommand.ExecuteScalar());Debug.WriteLine(String.Format("{0} warnings, {1} errors", warningCount, errorCount));
Now we can inspect the number of warnings and errors, and if there are any, we can log them, email someone, etc. The below code just loops through the warnings and errors and displays them in the output window.
if (warningCount > 0){// show any warningsOdbcCommand warningListCommand = new OdbcCommand("show warnings;", mysqlConnection);OdbcDataReader warningList = warningListCommand.ExecuteReader();while (warningList.Read()){Debug.WriteLine(warningList.GetString(0) + " " + warningList.GetString(1) + " " + warningList.GetString(2));}warningList.Close();}if (errorCount > 0){// show any errorsOdbcCommand errorListCommand = new OdbcCommand("show errors;", mysqlConnection);OdbcDataReader errorList = errorListCommand.ExecuteReader();while (errorList.Read()){Debug.WriteLine(errorList.GetString(0) + " " + errorList.GetString(1) + " " + errorList.GetString(2));}errorList.Close();}
And lastly we close up shop by calling
mysqlConnection.Close();
Hopefully the above will help you out when importing data in the unholy(?) alliance of MySQL and C#. MySQL's new version bring along a ton of goodies, so don't be surprised if you see MySQL databases popping up more frequently.
Powered by: newtelligence dasBlog 2.0.7226.0
© Copyright 2010, Ben Strackany
E-mail