
dotnet xml : Writing to file mangles special characters


rbowley NO[at]SPAM lycos-europe.com
8/21/2003 9:58:38 AM
A guy called Yuri brought this up a while ago but no one got back to
him and now I have the same problem.

I have an XML file which contains special characters such as é
etc. The DTD declares these entities and assigns them the correct
Unicode code points (such as é).

Now, when I've loaded the XML, converted it using the DOM and then
printed it with Console.WriteLine(), it all looks good.

Then I write it to a file thus:

// FileInfo.CreateText() returns a StreamWriter that writes UTF-8 text.
FileInfo myFile = new FileInfo(filePath);
StreamWriter sw = myFile.CreateText();
sw.Write(html);
sw.Close();

When I then open the file, it has replaced the characters with rubbish
such as "Ã‰".

The xml doc is utf-8 encoded (changing encoding does not appear to
make a difference)

Bjoern Hoehrmann
8/21/2003 11:10:42 PM
* nextman wrote in microsoft.public.dotnet.xml:

That's probably the UTF-8 representation of the character, but you
are interpreting the file as if it were in a different encoding, ISO-8859-1
or Windows-1252 for example. If you want to use a different encoding,
the .NET Framework provides means for transcoding; take a look at the
System.Text.Encoding class. If you are fine with UTF-8, you need
to tell your file viewer that the document is UTF-8 encoded (and ensure
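
To illustrate the point above, here is a minimal sketch of how UTF-8 bytes turn into "rubbish" when they are read as a single-byte encoding, and how Encoding.Convert can transcode them. The class and method names (MojibakeDemo, MisreadUtf8AsLatin1, Utf8ToLatin1) are made up for this example, and ISO-8859-1 is used as the single-byte encoding; Windows-1252 behaves the same way for é.

```csharp
using System;
using System.Text;

// Hypothetical helper class, for illustration only.
public static class MojibakeDemo
{
    // Encodes the text as UTF-8, then decodes those bytes as ISO-8859-1,
    // mimicking a viewer that assumes the wrong encoding.
    public static string MisreadUtf8AsLatin1(string text)
    {
        byte[] utf8Bytes = Encoding.UTF8.GetBytes(text);
        Encoding latin1 = Encoding.GetEncoding("iso-8859-1");
        return latin1.GetString(utf8Bytes);
    }

    // Transcodes UTF-8 bytes into ISO-8859-1 bytes.
    public static byte[] Utf8ToLatin1(byte[] utf8Bytes)
    {
        Encoding latin1 = Encoding.GetEncoding("iso-8859-1");
        return Encoding.Convert(Encoding.UTF8, latin1, utf8Bytes);
    }

    public static void Main()
    {
        // "\u00E9" is é; UTF-8 stores it as the two bytes 0xC3 0xA9.
        string misread = MisreadUtf8AsLatin1("\u00E9");
        Console.WriteLine(misread.Length); // two characters instead of one

        byte[] transcoded = Utf8ToLatin1(new byte[] { 0xC3, 0xA9 });
        Console.WriteLine(transcoded[0].ToString("X2")); // single byte for é
    }
}
```

Under these assumptions, MisreadUtf8AsLatin1("\u00E9") returns the two-character string "Ã©", which is the kind of output described in the question.
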
rbowley NO[at]SPAM lycos-europe.com
8/22/2003 1:48:16 AM
Bjoern is correct :)

Rather than doing this:

FileInfo myFile = new FileInfo(filePath);
StreamWriter sw = myFile.CreateText();
sw.Write(html);
sw.Close();

I did this:

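// Code page 1252 is Windows-1252 (Western European "ANSI").
Encoding eAnsi = System.Text.Encoding.GetEncoding(1252);
// Note: the second argument (true) appends to an existing file; pass false to overwrite.
StreamWriter sw = new StreamWriter(filePath, true, eAnsi);
sw.Write(html);
sw.Close();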
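
For comparison, here is a hedged sketch of the same idea wrapped in a small helper that overwrites the file and disposes the writer with a using block. The names EncodedFileWriter and WriteText, the sample text, and the temp-file path are assumptions for illustration, not part of the original post. It also shows the other route mentioned in the thread: staying with UTF-8 but writing a byte order mark, which many viewers use to detect the encoding. Note that on .NET Core and later, Encoding.GetEncoding(1252) needs the code pages provider registered first, so ISO-8859-1 stands in for it here.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper class, for illustration only.
public static class EncodedFileWriter
{
    // Writes the text to the path in the given encoding,
    // replacing any existing file (append is false).
    public static void WriteText(string path, string text, Encoding encoding)
    {
        using (StreamWriter writer = new StreamWriter(path, false, encoding))
        {
            writer.Write(text);
        }
    }

    public static void Main()
    {
        // Assumed sample data: a temp file and a short string containing é.
        string path = Path.Combine(Path.GetTempPath(), "encoding-demo.html");
        string html = "<p>caf\u00E9</p>";

        // Option 1: a single-byte encoding, one byte per character.
        WriteText(path, html, Encoding.GetEncoding("iso-8859-1"));
        Console.WriteLine(File.ReadAllBytes(path).Length);

        // Option 2: UTF-8 with a byte order mark (3 extra bytes at the start).
        WriteText(path, html, new UTF8Encoding(true));
        Console.WriteLine(File.ReadAllBytes(path).Length);
    }
}
```

Whichever encoding is written, the program that later reads the file has to assume the same one, which was the root of the original problem.
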
