Group: dotnet xml
Can XmlDocument.Load() method handle unicode characters?



Can XmlDocument.Load() method handle unicode characters? lamxing NO[at]SPAM gmail.com
1/30/2007 12:51:35 PM
Dear all,

I've spent a long time trying to get the XmlDocument.Load method to handle UTF-8 characters, but with no luck. Every time it loads a document that contains European characters (such as the one below, output from the Google Maps API), it says "invalid character" at position 229, which I believe is the "ß" character.

Can anyone point me in the right direction on how to load such documents using the XmlDocument.Load() method, or suggest a better way to do this?

Thanks!

Thanks!

---------------sample XML file------------------
<?xml version="1.0" encoding="UTF-8" ?>
<kml xmlns="http://earth.google.com/kml/2.0">
  <Response>
    <name>germaniastr 134, berlin berlin</name>
    <Status>
      <code>200</code>
      <request>geocode</request>
    </Status>
    <Placemark>
      <address>Germaniastraße 134, 12099 Tempelhof, Berlin, Germany</address>
      <AddressDetails Accuracy="8" xmlns="urn:oasis:names:tc:ciq:xsdschema:xAL:2.0">
        <Country>
          <CountryNameCode>DE</CountryNameCode>
          <AdministrativeArea>
            <AdministrativeAreaName>Berlin</AdministrativeAreaName>
            <SubAdministrativeArea>
              <SubAdministrativeAreaName>Berlin</SubAdministrativeAreaName>
              <Locality>
                <LocalityName>Berlin</LocalityName>
                <DependentLocality>
                  <DependentLocalityName>Tempelhof</DependentLocalityName>
                  <Thoroughfare>
                    <ThoroughfareName>Germaniastraße 134</ThoroughfareName>
                  </Thoroughfare>
                  <PostalCode>
                    <PostalCodeNumber>12099</PostalCodeNumber>
                  </PostalCode>
                </DependentLocality>
              </Locality>
            </SubAdministrativeArea>
          </AdministrativeArea>
        </Country>
      </AddressDetails>
      <Point>
        <coordinates>13.399486,52.464476,0</coordinates>
      </Point>
    </Placemark>
  </Response>
</kml>
Re: Can XmlDocument.Load() method handle unicode characters? lamxing NO[at]SPAM gmail.com
1/30/2007 11:47:33 PM
Thanks for your reply, Björn. Since this file comes from a dynamic URL, I just used the XmlDocument.Load(URL) method to load it. In that case, how do I tell the XML processor what encoding the file uses before I load the document? I've saved the sample XML file (dynamically generated by Google Maps) from IE's File- [...] geo.xml. It seems to open fine in the browser; does that mean anything?




Re: Can XmlDocument.Load() method handle unicode characters? Bjoern Hoehrmann
1/31/2007 7:07:07 AM
* lamxing@gmail.com wrote in microsoft.public.dotnet.xml:

Then your document is most likely not UTF-8 encoded. You will have to check which bytes are actually at that position, for example with a hex editor (in Visual Studio, use File.OpenFile ... /e:Binary). If the ß is encoded as the two bytes C3 9F, then either it is not the offending character or you have other encoding problems (for example, you might have told the XML processor that the document is US-ASCII encoded).
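The hex-editor check can also be scripted. The following is a minimal Python sketch of the idea rather than a .NET tool, and the helper names are hypothetical. It finds the first byte that is not valid UTF-8 and dumps the bytes around it. In UTF-8, ß is the two bytes C3 9F. In Windows-1252 or ISO-8859-1 it is the single byte DF, which is not valid UTF-8 when followed by an ASCII letter. The parser's "position" is a character column on a line, so it may not equal the byte offset exactly.

```python
def first_invalid_utf8(data: bytes):
    """Return the byte offset of the first invalid UTF-8 sequence, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        return err.start  # offset of the offending byte
    return None


def hex_context(data: bytes, offset: int, radius: int = 8) -> str:
    """Hex dump of the bytes around `offset` (up to `radius` bytes on each side)."""
    start = max(0, offset - radius)
    return " ".join(f"{b:02X}" for b in data[start:offset + radius])


if __name__ == "__main__":
    good = "Germaniastraße".encode("utf-8")   # ß -> C3 9F
    bad = "Germaniastraße".encode("cp1252")   # ß -> DF
    print(first_invalid_utf8(good))           # None
    pos = first_invalid_utf8(bad)
    print(pos, hex_context(bad, pos, 2))      # 12 72 61 DF 65
```

To use it on a real response, read the saved bytes with `open(path, "rb").read()` and pass them in.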

Note that loading XML documents in Internet Explorer and copying and pasting the result does not help at all in debugging this kind of problem, because by then the browser has already decoded the bytes. Compressing the document and uploading it to some web server is a more sensible approach.
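One way to capture the exact bytes without a browser is sketched below in Python. This is an illustration only: `save_raw_compressed` and the output file name are hypothetical. The sketch downloads the response without decoding it and writes it gzip-compressed, so the file can be shared without any program re-encoding it along the way.

```python
import gzip
import urllib.request


def save_raw_compressed(url: str, out_path: str) -> int:
    """Fetch `url` byte-for-byte and write it gzip-compressed to `out_path`.

    Nothing is decoded, so the saved bytes are exactly what the server sent.
    Returns the number of uncompressed bytes saved.
    """
    with urllib.request.urlopen(url) as resp:
        data = resp.read()  # raw bytes, no charset handling
    with gzip.open(out_path, "wb") as out:
        out.write(data)
    return len(data)
```

For example, call it with the geocoding URL (including your own key) and an output name such as `geo.xml.gz`, then upload that file.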
--
Björn Höhrmann · mailto:bjoern@hoehrmann.de · http://bjoern.hoehrmann.de
Weinh. Str. 22 · Telefon: +49(0)621/4309674 · http://www.bjoernsworld.de
Re: Can XmlDocument.Load() method handle unicode characters? lamxing NO[at]SPAM gmail.com
1/31/2007 8:46:23 AM
Hi Martin,

Thanks for the test result. It seems that if I load the file I saved earlier using XmlDocument.Load(), it works fine. But when I try to load the dynamically generated file directly from Google Maps' server, I get the "invalid character in the given encoding, line 1, position 228" error. Does that mean Google Maps uses the wrong encoding for that XML file? I don't think I can post the complete Google Maps link here, as the URL contains my Google Maps API key, but it looks something like this:
http://maps.google.com/maps/geo?q=germaniastr%20134,%20berlin%20berlin&output=xml&key=GOOGLEKEY

Any thoughts?


Chris


Re: Can XmlDocument.Load() method handle unicode characters? Martin Honnen
1/31/2007 2:09:53 PM

You don't have to specify the encoding. Pass the URL to the Load method, and the XML parser will check for a byte order mark and for the encoding declared in the XML declaration. It then decodes the bytes served into characters based on that information. If that is not possible, you get an error.
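To make the detection order concrete, here is a deliberately simplified Python sketch. It uses a hypothetical helper and is not the .NET implementation. It checks the BOM first, then the encoding declaration, and otherwise falls back to UTF-8. It ignores UTF-32, EBCDIC and the other cases covered by Appendix F of the XML 1.0 specification. The declared name is only a claim: if the bytes do not match it, decoding fails, which is the error seen in this thread.

```python
import codecs
import re

# encoding="..." inside a leading <?xml ... ?> declaration (ASCII-compatible bytes only)
_DECLARED = re.compile(rb"""^<\?xml[^>]*?encoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']""")


def sniff_xml_encoding(data: bytes) -> str:
    """Guess the encoding an XML parser would use for `data` (simplified)."""
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8"
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    match = _DECLARED.match(data)
    if match:
        return match.group(1).decode("ascii").lower()
    return "utf-8"  # XML default when there is no BOM and no declaration
```

For the Google response this returns "utf-8", so a lone DF byte in the body makes decoding fail.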


It also loads fine with the Load method of System.Xml.XmlDocument (tested with .NET 1.x and 2.0), so that file is properly encoded.



--

Martin Honnen --- MVP XML
Re: Can XmlDocument.Load() method handle unicode characters? lamxing NO[at]SPAM gmail.com
1/31/2007 2:16:56 PM

Martin,

Do you have any suggestions on how I can load this dynamic file, or how to make the XML document properly encoded?

Thanks!
Re: Can XmlDocument.Load() method handle unicode characters? Martin Honnen
1/31/2007 5:58:31 PM

It means that the XML is not properly encoded.


Re: Can XmlDocument.Load() method handle unicode characters? Bjoern Hoehrmann
2/1/2007 1:15:44 AM
* lamxing@gmail.com wrote in microsoft.public.dotnet.xml:

If the XML document is really not properly encoded, you should contact
Google to have their service fixed. Until then all you can do is try to
fix the XML document before parsing. For example, you could remove all
non-ASCII octets or you could transcode the document from Windows-1252
to UTF-8 using System.Text.Encoding.
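Both workarounds are sketched below in Python. The helper names are hypothetical; in .NET the transcoding step would go through System.Text.Encoding, for example Encoding.GetEncoding(1252). The sketch assumes the stray bytes really are Windows-1252, which is a guess based on ß arriving as DF. It transcodes only when the input is not already valid UTF-8, so a correctly encoded response passes through unchanged instead of being double-encoded. Stripping non-ASCII bytes is lossy: "Germaniastraße" becomes "Germaniastrae".

```python
import xml.etree.ElementTree as ET


def strip_non_ascii(data: bytes) -> bytes:
    """Drop every byte >= 0x80 (lossy: ß disappears)."""
    return bytes(b for b in data if b < 0x80)


def cp1252_to_utf8_if_needed(data: bytes) -> bytes:
    """Return UTF-8 bytes, transcoding from Windows-1252 only if `data` is not valid UTF-8."""
    try:
        data.decode("utf-8")
        return data  # already valid UTF-8: leave it alone
    except UnicodeDecodeError:
        # errors="replace" covers the few byte values Windows-1252 leaves undefined
        return data.decode("cp1252", errors="replace").encode("utf-8")


def parse_repaired(data: bytes) -> ET.Element:
    """Parse after repairing; the declaration says UTF-8, which is now true."""
    return ET.fromstring(cp1252_to_utf8_if_needed(data))
```

With the raw Google response, `parse_repaired(raw)` should yield an `<address>` text of "Germaniastraße 134", whereas `ET.fromstring(raw)` raises a parse error.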
Re: Can XmlDocument.Load() method handle unicode characters? lamxing NO[at]SPAM gmail.com
2/1/2007 10:25:12 AM


Hi Björn, can you provide an example of how to save an online XML document and transcode it to UTF-8 with System.Text.Encoding? Thanks!
Re: Can XmlDocument.Load() method handle unicode characters? Helena Kotas [MSFT]
2/5/2007 6:10:02 PM
First you have to find out which encoding the dynamic document uses. XmlDocument/XmlTextReader uses UTF-8 by default unless there is a BOM or an encoding attribute in the XML declaration that says something else. Once you know the encoding, create a StreamReader over the input stream and specify the document's encoding in its constructor. Then create an XmlReader over this StreamReader and use XmlDocument.Load to load the document.
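The same "decode first, then parse" idea looks like this as a Python sketch. It is an analogue, not the .NET API, and `load_with_encoding` is hypothetical. The caller must already know the real encoding; Windows-1252 is only an assumption for the Google response. An io.TextIOWrapper plays the StreamReader role. The parser is handed text that has already been decoded, so it no longer decodes the bytes itself.

```python
import io
import xml.etree.ElementTree as ET


def load_with_encoding(byte_stream, encoding: str) -> ET.Element:
    """Decode `byte_stream` with a caller-supplied encoding, then parse the text."""
    # The TextIOWrapper does the decoding (the StreamReader role).
    text = io.TextIOWrapper(byte_stream, encoding=encoding).read()
    # Parsing a str: the characters are already decoded, so the parser
    # does not apply its own byte-level decoding.
    return ET.fromstring(text)
```

Here `byte_stream` is any binary file-like object, such as `io.BytesIO(raw)` or a file opened with `"rb"`.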

If you are sure that the document's encoding is indeed UTF-8 and there is an invalid character in it, you can create an instance of UTF8Encoding that does not throw on invalid bytes (see the UTF8Encoding constructor, whose throwOnInvalidBytes parameter controls this).
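In Python, the closest counterpart to a non-throwing UTF8Encoding is decoding with errors="replace". This substitutes U+FFFD for each invalid sequence. The following is a minimal sketch with a hypothetical helper name. The document then parses, but the original character is lost.

```python
import xml.etree.ElementTree as ET


def load_utf8_lenient(raw: bytes) -> ET.Element:
    """Decode as UTF-8, replacing invalid byte sequences with U+FFFD, then parse."""
    text = raw.decode("utf-8", errors="replace")
    return ET.fromstring(text)
```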

-Helena


Re: Can XmlDocument.Load() method handle unicode characters? Tim Heap
3/22/2007 5:33:58 AM
Help!
I have the same problem and need to remove funny characters from my source XML file. Could someone please supply an example?

Tim Heap
Software & Database Manager
POSTAR Ltd
www.postar.co.uk
tim@postar.co.uk
