dotnet xml : Bad XML formatting from XmlTextWriter


isbat1 NO[at]SPAM yahoo.com
6/14/2004 10:45:40 AM
Here is a function, more or less exactly as I found it somewhere on
the internet.

static string BeautifyXML(string sXML)
{
    string result = "";

    System.IO.MemoryStream ms = new System.IO.MemoryStream();
    System.Xml.XmlTextWriter w = new System.Xml.XmlTextWriter(ms, System.Text.Encoding.Unicode);
    System.Xml.XmlDocument d = new System.Xml.XmlDocument();

    try
    {
        //load the xml into the document object
        d.LoadXml(sXML);

        w.Formatting = System.Xml.Formatting.Indented;

        //copy the xml into a formatting XmlTextWriter
        d.WriteContentTo(w);
        w.Flush();
        ms.Flush();

        //rewind the memory stream before reading from it
        ms.Position = 0;

        //extract the formatted text
        result = new System.IO.StreamReader(ms).ReadToEnd();
    }
    catch (System.Xml.XmlException)
    {
    }
    finally
    {
        try {ms.Close();} catch (Exception) {}
        try {w.Close();} catch (Exception) {}
    }

    return result;
}

It works pretty well, most of the time. Given this string:

"<root><childNode1><deepestNode>I am the deepest
node!</deepestNode></childNode1><childNode2><deepestNode>I am also
quite deep!</deepestNode></childNode2></root>"

It produces this nicely formatted output:

<root>
  <childNode1>
    <deepestNode>I am the deepest node!</deepestNode>
  </childNode1>
  <childNode2>
    <deepestNode>I am also quite deep!</deepestNode>
  </childNode2>
</root>

But when a child node has a value and also has a child node of its
own, as childNode1 does here:

"<root><childNode1>I have a value<deepestNode>I am the deepest
node!</deepestNode></childNode1><childNode2><deepestNode>I am also
quite deep!</deepestNode></childNode2></root>"

This is the unfortunate result (deepestNode is inline with
childNode1):

<root>
  <childNode1>I have a value<deepestNode>I am the deepest node!</deepestNode></childNode1>
  <childNode2>
    <deepestNode>I am also quite deep!</deepestNode>
  </childNode2>
</root>

I had expected this:

<root>
  <childNode1>I have a value
    <deepestNode>I am the deepest node!</deepestNode>
  </childNode1>
  <childNode2>
    <deepestNode>I am also quite deep!</deepestNode>
  </childNode2>
</root>

Is the unexpected inlining of childNode1's child a bug in the

Oleg Tkachenko [MVP]
6/14/2004 9:48:19 PM

This is by design. In mixed content (text and elements mixed), indenting
is suppressed, because it's impossible to format such data without adding
significant whitespace characters and thus changing the document's data.
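
For illustration, a minimal sketch of that behaviour, assuming the same
XmlTextWriter setup as in the question but writing to a StringWriter for
brevity (the class and method names here are made up for the example):

```csharp
using System;
using System.IO;
using System.Xml;

static class MixedContentDemo
{
    // Pretty-prints an XML string with an indenting XmlTextWriter.
    // Hypothetical helper: a simplified variant of BeautifyXML, not the original.
    public static string Indent(string xml)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml);

        using (StringWriter sw = new StringWriter())
        {
            XmlTextWriter w = new XmlTextWriter(sw);
            w.Formatting = Formatting.Indented;
            doc.WriteContentTo(w);
            w.Flush();
            return sw.ToString();
        }
    }

    static void Main()
    {
        // Element-only content: each nested element goes on its own indented line.
        Console.WriteLine(Indent("<root><a><b>text</b></a></root>"));

        // Mixed content: <a> holds text and an element, so the writer
        // leaves everything inside <a> on one line, untouched.
        Console.WriteLine(Indent("<root><a>value<b>text</b></a></root>"));
    }
}
```

In the second call only <a> itself is indented under <root>; its content
is written exactly as it was loaded.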


In your expected output you have changed the first text node from "I have
a value" to "I have a value\n\t\t". Formatters/pretty-printers are not
allowed to modify significant parts of a document. Adding whitespace
between tags is safe, since that is insignificant whitespace, but adding
whitespace to text means changing that text.
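
The difference can be checked with a small sketch like the one below. It
assumes XmlDocument with its default PreserveWhitespace = false; the
class and method names are invented for the example:

```csharp
using System;
using System.Xml;

static class WhitespaceDemo
{
    // Loads the XML and returns the value of the first child node of the
    // element selected by xpath (hypothetical helper for this example).
    public static string FirstChildValue(string xml, string xpath)
    {
        XmlDocument doc = new XmlDocument(); // PreserveWhitespace is false by default
        doc.LoadXml(xml);
        return doc.SelectSingleNode(xpath).FirstChild.Value;
    }

    // Loads the XML and serializes it back without any formatting.
    public static string Roundtrip(string xml)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml);
        return doc.OuterXml;
    }

    static void Main()
    {
        // Whitespace only between tags: dropped on load, so both forms are equal.
        string compact = "<root><a><b>x</b></a></root>";
        string indented = "<root>\n\t<a>\n\t\t<b>x</b>\n\t</a>\n</root>";
        Console.WriteLine(Roundtrip(compact) == Roundtrip(indented));

        // Whitespace appended to text: it becomes part of the text node.
        string mixed = "<root><a>I have a value<b>x</b></a></root>";
        string mixedIndented = "<root>\n\t<a>I have a value\n\t\t<b>x</b>\n\t</a>\n</root>";
        Console.WriteLine(FirstChildValue(mixed, "/root/a")
                          == FirstChildValue(mixedIndented, "/root/a"));
    }
}
```

The first comparison holds and the second does not, because the line
break and tabs after "I have a value" are now part of that text node.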
--
Oleg Tkachenko [XML MVP]