Ok, FYI, I think we figured it out.
If your root element has attributes, e.g.:
<rootElem attrib1="xyz">
or direct sub-elements of the form:
<subElem>whatever</subElem>
then a DataTable will get created within the DataSet with your root
element as its name. The DataSetName property will remain
"NewDataSet" (or whatever you passed to the DataSet constructor).
Why? Because the DataSet class represents such attributes and sub-
elements as COLUMNs in a DataTable. Therefore the root element
must be created as a DataTable containing these columns.
However, if the root element has no attributes or such sub-elements
then there are no columns to create and therefore no need for the
DataSet class to create a DataTable for the root element. In these
cases, the DataSetName property is just set to the root element name
and any direct sub-elements are created as DataTables contained in
the DataSet.
So, e.g., if your XML looks like this:
<rootElem>
<subElem>
<value>whatever</value>
</subElem>
</rootElem>
The DataSetName will be "rootElem" and there will just be one DataTable
called "subElem" contained within the DataSet.
The fact that it works this way causes some interesting problems:
1) If the root element gets created as a DataTable, then all child
DataTables of the root element DataTable will have a new column added
called "rootElem_Id". This contains the root element DataTable row index
for the row with which the child DataTable entry is associated.
However, this column is created transparently so you don't even know it's
there. So when a new row gets added to the child table, this value has to
be initialized to 0. If you don't do this you'll get the following
exception:
Unhandled Exception: System.InvalidOperationException: Token StartElement
in state Epilog would result in an invalid XML document.
which is not very useful in telling you what the real problem is and
it thus was very hard to debug.
2) When you create an XML schema in VS.NET from an XML file whose
root element has attributes or the previously described sub elements,
the schema gets created with a schema id of "NewDataSet" and there
is a sub-element with the "correct" root element name. At least in
our case, this is not what we wanted. We wanted the schema id to be
our root element name. The only way we could do this was to change
our XML so there were no attributes on the root element and no
sub-elements of the form <subElem>whatever</subElem>
Thanks for the help.
Bill
[quoted text, click to view] "Chris Lovett" <chris@!nospam!.net> wrote in message news:<vru5dpi0l063cf@corp.supernews.com>...
> XML is hierarchical. DataSet is not. So there are some "hieristics" that
> are used to "normalize" the XML into a relational shape. One of those
> hieristics is that if an element contains children then it is mapped to a
> table, just in case those children are repeated somewhere.
>
> You can make the DataSet behave much more predictably by providing it with
> an XML Schema. To see how this is done call WriteXml with
> XmlWriteMode.WriteSchema. Then be sure to read this file back in using
> XmlReadMode.ReadSchema.
>
> You are using XmlReadMode.InferSchema which is one of those "let's help
> people get started" kinds of features. Not really something you should use
> in a "production" application.
>
>
> "Bill C." <billybobc1210@yahoo.com> wrote in message
> news:1e75a833.0311210833.14494667@posting.google.com...
> > Hi,
> >
> > I've got a simple console app that just reads an XML file into a DataSet
> > then prints out a description of each table in the DataSet, including
> > column names and row values for each column. I'm getting some strange
> > results depending the input XML file I use. I was wondering if somebody
> > could help me understand what is going on or point me to a good reference.
> >
> > The code for my program looks like this:
> >
> > using System;
> > using System.Data;
> >
> > static void Main(string[] args)
> > {
> > DataSet ds = new DataSet();
> > ds.ReadXml("test.xml");
> > PrintDataSet(ds);
> > }
> >
> > public static void PrintDataSet(DataSet ds)
> > {
> > Console.WriteLine("DataSet name: " + ds.DataSetName);
> > foreach (DataTable dt in ds.Tables)
> > {
> > int rowCount = dt.Rows.Count;
> > Console.WriteLine("\nTable: " + dt.ToString() + " (" + rowCount +
> " rows)");
> > foreach (DataColumn dc in dt.Columns)
> > {
> > Console.WriteLine("Column: " + dc.ColumnName);
> > for (int i = 0; i < rowCount; i++)
> > {
> > Console.WriteLine("Row " + i + ": " +
> dt.Rows[i][dc].ToString());
> > }
> > }
> > }
> > }
> >
> > When "test.xml" looks like this:
> >
> > <?xml version="1.0" standalone="yes"?>
> > <test>
> > <product>Product 1</product>
> > <customer>
> > <name>Bill</name>
> > <company>Bill's Co.</company>
> > </customer>
> > <customer>
> > <name>Sue</name>
> > <company>Sue's Co.</company>
> > </customer>
> > </test>
> >
> > The output looks like this:
> >
> > DataSet name: NewDataSet
> >
> > Table: test (1 rows)
> > Column: product
> > Row 0: Product 1
> > Column: test_Id
> > Row 0: 0
> >
> > Table: customer (2 rows)
> > Column: name
> > Row 0: Bill
> > Row 1: Sue
> > Column: company
> > Row 0: Bill's Co.
> > Row 1: Sue's Co.
> > Column: test_Id
> > Row 0: 0
> > Row 1: 0
> >
> > However when "test.xml" looks like this (NOTE: the only difference
> > is in the <product> element):
> >
> > <?xml version="1.0" standalone="yes"?>
> > <test>
> > <product>
> > <name>Product 1</name>
> > </product>
> > <customer>
> > <name>Bill</name>
> > <company>Bill's Co.</company>
> > </customer>
> > <customer>
> > <name>Sue</name>
> > <company>Sue's Co.</company>
> > </customer>
> > </test>
> >
> > The output looks like this:
> >
> > DataSet name: test
> >
> > Table: product (1 rows)
> > Column: name
> > Row 0: Product 1
> >
> > Table: customer (2 rows)
> > Column: name
> > Row 0: Bill
> > Row 1: Sue
> > Column: company
> > Row 0: Bill's Co.
> > Row 1: Sue's Co.
> >
> > Questions:
> >
> > 1) I think I see why the test_Id column gets created "on the fly" for
> > the "customer" table in the first XML example. A "test" table got
> > created which might have multiple rows. So each row of the "customer"
> > table has to have some way of relating back to its parent table row,
> > which is what "test_Id" contains. But why does the "test" table
> > itself need a "test_Id" column?
> >
> > 2) In the first example a "test" table got created and the name of the
> > DataSet stayed at the default "NewDataSet". In the second example there