
dotnet xml : illegal character in xml file



Andy Fish
2/6/2007 4:52:03 PM
Hi,

I have an XML file that was created as a DOM tree in .Net 1.1 and serialized
to disk. If I try to put character code 1 inside one of the attributes
(don't ask why), it seems to serialize perfectly ok and I get a file that
looks like this:

<element attribute="&#1;" />

which looks perfectly valid, but an XML viewer refuses to open it and
reports an illegal character reference.

What am I missing here? Surely it's legal to put any character reference in
an XML file as long as it's correctly encoded? And if it's not, how come the
framework serialized it for me without complaining?

TIA

Andy

Bjoern Hoehrmann
2/6/2007 6:00:02 PM
* Andy Fish wrote in microsoft.public.dotnet.xml:

No, that's not legal; see http://www.w3.org/TR/xml for which characters
are allowed. In XML 1.0, U+0001 is not one of them. I think the API docs
warn you that some of the serialization functions do not check for
well-formedness.
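
To see the rule in action outside .NET, here is a minimal sketch in Python (not the framework from the question), using the standard library's expat-based parser. The helper name is_well_formed is made up for this illustration. It only shows that a conforming XML 1.0 parser rejects a reference to U+0001 while accepting one to U+0009 (tab), which is an allowed character.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the standard-library parser accepts the document."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        # Raised for any well-formedness error, including a character
        # reference to a code point that XML 1.0 does not allow.
        return False


if __name__ == "__main__":
    print(is_well_formed('<element attribute="&#9;" />'))
    print(is_well_formed('<element attribute="&#1;" />'))
```

The second document is the one from the original post; the parser refuses it for the same reason the XML viewer does.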
--
Björn Höhrmann · mailto:bjoern@hoehrmann.de · http://bjoern.hoehrmann.de
Weinh. Str. 22 · Telefon: +49(0)621/4309674 · http://www.bjoernsworld.de
Martin Honnen
2/6/2007 6:09:26 PM

In XML 1.0, &#1; is not well-formed. The XML parser and serializer in
.NET 1.x allow it nevertheless, but that is a known flaw. In .NET 2.0
(which still supports only XML 1.0), parsing and serialization are stricter
by default, although there is a setting that turns off the checking of
character references.
See
<http://msdn2.microsoft.com/en-us/library/system.xml.xmlreadersettings.checkcharacters.aspx>

I don't think there is anything you can do with .NET 1.x to have the
parser or serializer throw an error on e.g. &#1;.
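
If the framework will not complain, one possible workaround is to scan the serialized text yourself before passing the file on. The following is a rough sketch in Python rather than .NET, and the names CHAR_REF, is_xml10_char and illegal_char_refs are invented for this example. It assumes the Char production of XML 1.0 (#x9, #xA, #xD, #x20-#xD7FF, #xE000-#xFFFD, #x10000-#x10FFFF) and uses a plain regular expression, so it will also flag text that merely looks like a reference inside a comment or CDATA section.

```python
import re

# Matches decimal (&#1;) and hexadecimal (&#x1;) character references.
CHAR_REF = re.compile(r"&#(?:x([0-9A-Fa-f]+)|([0-9]+));")


def is_xml10_char(cp):
    """True if the code point matches the XML 1.0 Char production."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def illegal_char_refs(xml_text):
    """Return every character reference that points at a disallowed code point."""
    bad = []
    for match in CHAR_REF.finditer(xml_text):
        hex_digits, dec_digits = match.group(1), match.group(2)
        cp = int(hex_digits, 16) if hex_digits else int(dec_digits)
        if not is_xml10_char(cp):
            bad.append(match.group(0))
    return bad


if __name__ == "__main__":
    print(illegal_char_refs('<element attribute="&#1;" />'))
```

An empty list means no offending references were found; anything else names the references a strict parser would reject.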


--

Martin Honnen --- MVP XML