dotnet xml : End-of-line Handling in the XmlReader (.NET Framework Version 1.1)


Matthew Heironimus
6/21/2004 11:41:10 AM
According to the XML 1.0 (Third Edition) W3C Recommendation
(http://www.w3.org/TR/2004/REC-xml-20040204/#sec-line-ends), all #xD,
#xA, and #xD#xA character combinations should be converted to a single
#xA character.
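
The rule itself is small enough to write down. What follows is a minimal sketch in Python (not .NET code), assuming the normalization is applied to the raw text of the entity; the function name normalize_line_ends is hypothetical and used only for illustration.

```python
def normalize_line_ends(text: str) -> str:
    """Translate CR LF pairs and lone CRs to a single LF (XML 1.0, section 2.11)."""
    # Handle the two-character sequence first so that it yields one LF, not two.
    return text.replace("\r\n", "\n").replace("\r", "\n")


raw = "12\r\n34 12\r34 12\n34"
print(repr(normalize_line_ends(raw)))
```

All three line-break forms in raw come out as a single LF, so "12" + line break + "34" is always 5 characters long after normalization.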

According to the "Reading XML with the XmlReader" section of the ".NET
Framework Developer's Guide" on-line help, the XmlReader will not
perform this normalization by default. You can cause the XmlReader to
perform this normalization by setting the Normalization property to
true. However, this does not appear to hold in every situation. The test
below was performed using the .NET Framework Version 1.1.

Sample XML File:
<?xml version="1.0"?>
<test>
<input>12345</input>
<input>12
3</input>
<input>12
34</input>
<input>12&#xD;&#xA;3</input>
<input>12&#xD;&#xA;34</input>
<input>12&#xD;3</input>
<input>12&#xD;34</input>
<input>12&#xA;3</input>
<input>12&#xA;34</input>
</test>

Sample XSD Schema File:
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="test">
    <xsd:complexType>
      <xsd:choice minOccurs="0" maxOccurs="unbounded">
        <xsd:element name="input">
          <xsd:simpleType>
            <xsd:restriction base="xsd:string">
              <xsd:maxLength value="5"/>
            </xsd:restriction>
          </xsd:simpleType>
        </xsd:element>
      </xsd:choice>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>

If the XML file above is loaded using an XmlReader and an
XmlValidatingReader object with the XmlReader.Normalization property set
to false, the following two errors are generated:

Error 1:
The 'input' element has an invalid value according to its data type. An
error occurred at file: Test Case.xml, (7, 5).
<input>12
34</input>
^

Error 2:
The 'input' element has an invalid value according to its data type. An
error occurred at file: Test Case.xml, (9, 25).
<input>12&#xD;&#xA;34</input>
^
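
The two errors above are consistent with a simple character count. The sketch below is Python rather than .NET code and rests on two assumptions: the file on disk uses CR LF line endings, and each character reference expands to the character it names. The list of values is written out by hand from the sample file; nothing here calls the XmlReader.

```python
MAX_LENGTH = 5  # from <xsd:maxLength value="5"/> in the schema

# Character data of each <input> element when no line-end normalization
# is applied. Assumption: literal line breaks in the file are CR LF.
values = [
    "12345",
    "12\r\n3",   # literal line break
    "12\r\n34",  # literal line break
    "12\r\n3",   # &#xD;&#xA;
    "12\r\n34",  # &#xD;&#xA;
    "12\r3",     # &#xD;
    "12\r34",    # &#xD;
    "12\n3",     # &#xA;
    "12\n34",    # &#xA;
]

# Collect (element number, length) for every value longer than MAX_LENGTH.
too_long = [(i, len(v)) for i, v in enumerate(values, start=1) if len(v) > MAX_LENGTH]
print(too_long)  # [(3, 6), (5, 6)]
```

Under those assumptions, the third and fifth <input> elements are the only ones with 6 characters, and they are the ones ending on lines 7 and 9 of the sample file.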

These errors are expected, since the input file was not normalized and
the <input> element can be at most 5 characters long. One would assume
that setting the XmlReader.Normalization property to true would
eliminate both errors, but that is not the case. The following error
still occurs even with the XmlReader.Normalization property set to true:

Error 1:
The 'input' element has an invalid value according to its data type. An
error occurred at file: Test Case.xml, (9, 25).
<input>12&#xD;&#xA;34</input>
^

It appears that the XmlReader does not perform normalization when the
CR-LF is written as &#xD;&#xA;. Am I misinterpreting the XML
specification, or is the XmlReader not handling this case properly?

----------------------------------------------------------------------------
Excerpt from the XML 1.0 (Third Edition) W3C Recommendation
(http://www.w3.org/TR/2004/REC-xml-20040204/#sec-line-ends):

2.11 End-of-Line Handling
XML parsed entities are often stored in computer files which, for
editing convenience, are organized into lines. These lines are typically
separated by some combination of the characters CARRIAGE RETURN (#xD)
and LINE FEED (#xA).

To simplify the tasks of applications, the XML processor MUST behave as
if it normalized all line breaks in external parsed entities (including
the document entity) on input, before parsing, by translating both the
two-character sequence #xD #xA and any #xD that is not followed by #xA
to a single #xA character.

modico
7/2/2004 3:35:52 PM
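You are wrong.
The spec says the translation occurs "before parsing".
So, before parsing, &#xD;&#xA; is not a line-break sequence: it is still
two character references, which are expanded to #xD #xA only during
parsing, after line-end normalization has already taken place.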
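
The distinction can be checked with any conforming parser. The sketch below uses Python's standard library xml.etree.ElementTree (built on expat) as a stand-in; it is an assumption that this is a fair comparison, and it says nothing about how the .NET reader is implemented. The MAX_LENGTH check only imitates the schema's maxLength facet, since no real XSD validation is performed.

```python
import xml.etree.ElementTree as ET

MAX_LENGTH = 5  # mirrors <xsd:maxLength value="5"/>

# The first element contains literal CR LF bytes; the second contains
# the character references &#xD;&#xA; instead.
document = (
    b"<test>"
    b"<input>12\r\n34</input>"
    b"<input>12&#xD;&#xA;34</input>"
    b"</test>"
)

root = ET.fromstring(document)
values = [element.text for element in root.iter("input")]

for value in values:
    status = "ok" if len(value) <= MAX_LENGTH else "too long"
    print(repr(value), len(value), status)
```

The literal CR LF is normalized to a single LF (5 characters), while the character references survive as CR LF (6 characters). That matches the single error remaining at (9, 25) when Normalization is set to true.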