"Mark" <mmodrall@nospam.nospam> wrote in message news:A3B4FAD3-6EF0-4815-AB6C-DF0EA56DDBA5@microsoft.com...
> As David noted, the reader encoding appears to be defaulting to unicode, so
> that seems to be the problem.
When the encoding of the XMLDecl and the encoding of the content
presented to the reader are different, then you will have problems.
> If you use File.OpenText() to get a StreamReader and then construct
> XmlTextReader with a StreamReader, it appears to lock the encoding
> in place and XmlTextReader will not respect the processing directive.
The documentation for File.OpenText( ) is clear about interpreting
the file as UTF-8,
http://msdn.microsoft.com/library/en-us/cpref/html/frlrfsystemiofileclassopentexttopic.asp
Even though the file MAY be encoded as iso-8859-1, doing this will
"present" the file's contents as UTF-8.
The encoding of the StreamReader is paramount because XmlTextReader
depends upon the StreamReader's Read( ) method(s). The StreamReader
is responsible for decoding whatever bytes are in the file into
characters using its own encoding (it knows nothing about the
XMLDecl).
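
To make that concrete, here is a minimal sketch of the same bytes going
through a StreamReader twice, once with the matching encoding and once
with UTF-8. The class name DecodeDemo, the Decode helper and the sample
text are my own inventions for illustration, not anything from your
code.

```csharp
using System;
using System.IO;
using System.Text;

public static class DecodeDemo
{
    // Decode raw bytes through a StreamReader using the given encoding.
    // A TextReader-backed XmlTextReader only ever sees the resulting characters.
    public static string Decode(byte[] bytes, Encoding encoding)
    {
        using (StreamReader reader = new StreamReader(new MemoryStream(bytes), encoding))
        {
            return reader.ReadToEnd();
        }
    }

    public static void Main()
    {
        Encoding latin1 = Encoding.GetEncoding("iso-8859-1");

        // In iso-8859-1 the character U+00E9 is the single byte 0xE9.
        byte[] bytes = latin1.GetBytes("caf\u00E9");

        string asLatin1 = Decode(bytes, latin1);
        string asUtf8 = Decode(bytes, Encoding.UTF8);

        // The matching encoding round-trips; the UTF-8 decode cannot,
        // because a lone 0xE9 byte is not a valid UTF-8 sequence.
        Console.WriteLine(asLatin1 == "caf\u00E9");
        Console.WriteLine(asUtf8 == "caf\u00E9");
    }
}
```

By the time XmlTextReader calls Read( ), the damage in the second case
is already done, whatever the XMLDecl says.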
> If you use File.OpenRead() to get a simple FileStream and use *that*
> to construct XmlTextReader, the XmlTextReader is more responsive
> to what's in the stream it's reading.
FileStreams can be binary, therefore choosing a FileStream gives the
XmlTextReader the option to read *bytes* instead of characters. It
then has something to say about what encoding it uses to perform this
translation.
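
As a rough sketch of the Stream route, assuming an in-memory document
in place of your file: the class name StreamDemo, both helper methods
and the sample content are hypothetical, made up for this example.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

public static class StreamDemo
{
    // Build iso-8859-1 bytes for a document whose XMLDecl states that encoding.
    public static byte[] BuildLatin1Document()
    {
        string xml = "<?xml version='1.0' encoding='iso-8859-1'?><root>caf\u00E9</root>";
        return Encoding.GetEncoding("iso-8859-1").GetBytes(xml);
    }

    // Give XmlTextReader the raw bytes so it chooses the decoder itself,
    // based on what it finds in the XMLDecl.
    public static string ReadRootFromStream(byte[] bytes)
    {
        XmlTextReader reader = new XmlTextReader(new MemoryStream(bytes));
        try
        {
            reader.MoveToContent();
            return reader.ReadElementString();
        }
        finally
        {
            reader.Close();
        }
    }

    public static void Main()
    {
        string text = ReadRootFromStream(BuildLatin1Document());
        Console.WriteLine(text == "caf\u00E9");
    }
}
```

No Encoding is passed anywhere on the reading side here; the only hint
is the XMLDecl inside the bytes themselves.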
> It is a step you can do manually (check to see if the first node
> you have is a processing directive and grab the encoding yourself)
That's the XmlDeclaration's Encoding property. It won't appear as
an XmlProcessingInstruction. You could use this code to inject an
XMLDecl if one isn't already present,
    if ( xml.FirstChild.NodeType != XmlNodeType.XmlDeclaration )
    {
        XmlDeclaration decl = xml.CreateXmlDeclaration( "1.0", "iso-8859-1", null);
        xml.InsertBefore( decl, xml.FirstChild);
    }
To read the encoding off of an XmlDocument's XMLDecl,
    string encodingStr = null;
    if ( xml.FirstChild.NodeType == XmlNodeType.XmlDeclaration )
        encodingStr = ((XmlDeclaration)xml.FirstChild).Encoding;
    // Encoding is an empty string when the XMLDecl has no encoding attribute.
    if ( encodingStr == null || encodingStr.Length == 0 )
        encodingStr = "UTF-8";
If the XmlDocument's FirstChild isn't of XmlNodeType.XmlDeclaration
then it doesn't have an XMLDecl. If there is no XMLDecl, or there is
one without an Encoding, then the encoding is UTF-8 by default.
In my experience, when the encoding on the XMLDecl matches the
encoding of the content, there are no problems.
I've tried producing a file to match your example like this,
- - - WriteOut.cs
using System;
using System.IO;
using System.Text;
using System.Xml;

public class WriteOutIso8859_1
{
    public static void Main( )
    {
        FileStream fs = new FileStream( "iso8859_1.xml", FileMode.CreateNew);
        StreamWriter writer = new StreamWriter( fs, Encoding.GetEncoding( "iso-8859-1"));
        writer.WriteLine( "<?xml version='1.0' encoding='iso-8859-1'?>");
        writer.WriteLine( "<root>");
        writer.WriteLine( "\t<first>Hello World</first>");
        writer.Write( "\t<second><![CDATA[");
        // Three characters that iso-8859-1 writes as the bytes ED B3 A8.
        writer.Write( new char[] { (char)0xED, (char)0xB3, (char)0xA8} );
        writer.WriteLine( "]]></second>");
        writer.WriteLine( "</root>");
        writer.Flush( );
        writer.Close( );
    }
}
- - -
When I read this file in with the following code I have no problems.
    XmlDocument xmlDoc = new XmlDocument( );
    FileStream fs = new FileStream( "iso8859_1.xml", FileMode.Open);
    // Decode with the same encoding the XMLDecl declares.
    StreamReader sr = new StreamReader( fs, Encoding.GetEncoding( "iso-8859-1"));
    XmlTextReader reader = new XmlTextReader( sr);
    reader.MoveToContent( );
    XmlNode node = xmlDoc.ReadNode( reader);
    reader.Close( );
Derek Harmon