dotnet xml : XmlValidatingReader DTD validation


Vlad
6/7/2004 12:04:40 PM
I have a local copy of a DTD that I need to validate incoming XML
documents against.
When the XML document contains a <!DOCTYPE myname SYSTEM "myfile.dtd">
declaration, the code below resolves the DTD through my XmlResolver and the
document is correctly validated against the locally stored DTD file.
The problem occurs when the incoming XML contains no DOCTYPE declaration:
the resolver code never gets called and no validation occurs at all.
What am I doing wrong here? Is this a bug, or is there no way to enforce
the DTD when the incoming XML file does not specify one?

XmlDocument doc = new XmlDocument();
XmlValidatingReader reader = new XmlValidatingReader(new XmlTextReader(stream));

reader.XmlResolver = new MyDTDResolver();
reader.ValidationType = ValidationType.DTD;

doc.Load(reader);

The XML document is supplied in the stream variable.

Here is the MyDTDResolver declaration:

private class MyDTDResolver : XmlUrlResolver
{
    public override object GetEntity(Uri absoluteUri, string role, Type ofObjectToReturn)
    {
        return (returns a stream of the local copy of the DTD document that I
                validate XML against)
    }
}

Martin Honnen
6/7/2004 6:58:00 PM



It is a known shortcoming of the whole DTD approach that parsers look for a
DOCTYPE declaration to decide what to validate the XML against. I have not
looked exhaustively at the .NET classes that perform DTD validation, but it
may well be that you need to run your incoming XML stream through a filter
that mirrors all nodes and inserts the DOCTYPE declaration at the beginning
before you can validate against the DTD.
--

Martin Honnen
http://JavaScript.FAQTs.com/
Yan Leshinsky
6/8/2004 12:16:10 AM
Your document has to have a DOCTYPE for DTD validation to be performed.
XmlResolver is used to resolve external entities, including the external DTD
of course. There is no bug here.
Yan


Oleg Tkachenko [MVP]
6/8/2004 11:52:44 AM

No Doctype - no validation. You can add Doctype to a document using a
variety of methods. XSLT is the simplest one - the following stylesheet
adds Doctype to a source document, while preserving the rest as is:

<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output doctype-system="myfile.dtd"/>
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>

With big documents, though, this could be quite inefficient, so I'd use a
custom XmlReader that exposes a synthetic Doctype node if the document
doesn't have one.

--
Oleg Tkachenko [XML MVP]
Vlad
6/11/2004 12:12:21 PM
That defeats the purpose of the ValidatingReader and makes it useless since
I have no control over the incoming files. That means that someone could
just submit an XML file without a DOCTYPE and render the whole validation
process useless.


As for using the XSLT code to insert the DOCTYPE, I can't use it because
the XML files that I receive can be very large and I do not want to
regenerate the whole file just to check its validity.
As for creating a custom XmlReader, I initially tried it, but it is not that
simple: in order to fake the DOCTYPE declaration one first has to determine
that it is missing or invalid and then simulate it only in that case. Do you
have any code that could do it?

Here is what I ended up doing:
Because the ValidatingReader does not throw any exceptions if the DOCTYPE is
missing, I just check the XmlDocument.DocumentType property after the
document has been read in. If it is null, I reject the file as if it had
failed validation. This is unfortunate, since the file might actually be
fully compliant with the DTD except for the missing DOCTYPE declaration.
If you have sample code for the custom XmlReader, or any other workaround
that would ensure that a missing DOCTYPE does not affect the
ValidatingReader functionality, I would really appreciate it.
Thank you!
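
The check described above (load through the validating reader, then test XmlDocument.DocumentType) could be sketched roughly as follows. This is only an illustration under assumptions: the DoctypeGate class and the LoadIfValidated method are hypothetical names that do not appear in the thread, and the caller is assumed to pass in a resolver such as MyDTDResolver.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical helper, not from the thread.
static class DoctypeGate
{
    // Loads the XML through a DTD-validating reader. Returns null when the
    // document carries no DOCTYPE, because in that case nothing was validated.
    public static XmlDocument LoadIfValidated(Stream xml, XmlResolver resolver)
    {
        XmlValidatingReader reader = new XmlValidatingReader(new XmlTextReader(xml));
        reader.XmlResolver = resolver;
        reader.ValidationType = ValidationType.DTD;

        XmlDocument doc = new XmlDocument();
        try
        {
            // No ValidationEventHandler is attached, so a DTD violation
            // surfaces as an exception from Load().
            doc.Load(reader);
        }
        finally
        {
            reader.Close();
        }

        // A missing DOCTYPE is treated like a failed validation.
        return doc.DocumentType == null ? null : doc;
    }
}
```

A caller would treat a null result the same way as a validation failure, for example by rejecting the upload. The file is still read only once, which is the property Vlad is after, at the cost of rejecting documents that lack only the DOCTYPE.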


v-kevy NO[at]SPAM online.microsoft.com
6/15/2004 8:21:41 AM
Hi Vlad,

Based on my experience, I think you can first check the DocumentType
property of the XmlDocument. If it is null, you can add a DOCTYPE node to
the document using XSLT and then validate the XmlDocument. Although this
might hurt performance when the XML file is very big, it's better than
rejecting a file that might be valid.

HTH.

Kevin Yu
=======
"This posting is provided "AS IS" with no warranties, and confers no
rights."
Vlad
6/20/2004 10:11:10 PM
While it's an interesting idea to process the file twice when the
DocumentType is null, I would rather see if there is any other way to do it.
As Oleg's reply showed, a custom reader could be one solution, since the
XSLT would give me a substantial performance hit for large files (which I
get a lot of). Do you have any sample custom reader code that creates a
DOCTYPE if one is not present?

However, the MS documentation does not specify that XML documents MUST have
a DOCTYPE or the ValidatingReader won't validate. Because of that I would
like to see MS actually provide some workaround for this issue.

Thanks a lot,
--
Vlad
P.S. Please remove "REMOVETHIS" from my email address to reply...

Oleg Tkachenko [MVP]
6/21/2004 12:59:17 PM

You know, actually I'm not sure it works. The problem is the tight
coupling between XmlValidatingReader and XmlTextReader -
XmlValidatingReader requires an XmlTextReader as its input reader (which is
a well-known bug). That by itself is not a problem - you can extend
XmlTextReader to expose a synthetic Doctype. The problem is that
XmlValidatingReader asks XmlTextReader for the Doctype via an *internal*
property of XmlTextReader. I don't see how that can be worked around.
Another alternative would be adding the Doctype while passing the XML
through an XmlReader-XmlWriter pipe. This way you can modify the document in
a streaming way - without loading it into memory as a whole (as the
XmlDocument or XSLT approach does). Something like this:

XmlReader r = new XmlTextReader("foo.xml");
XmlWriter w = new XmlTextWriter("foo2.xml", Encoding.UTF8);
bool hasDoctype = false;
r.Read();
// WriteNode() itself advances the reader to the next node, so the loop
// must not call Read() again or nodes would be skipped.
while (!r.EOF)
{
    if (r.NodeType == XmlNodeType.DocumentType)
        hasDoctype = true;
    else if (!hasDoctype && r.NodeType == XmlNodeType.Element)
    {
        // Root element is about to be written - insert Doctype here
        w.WriteDocType(r.Name, null, "foo.dtd", null);
        hasDoctype = true;
    }
    w.WriteNode(r, false);
}
r.Close();
w.Close();
XmlValidatingReader vr = new XmlValidatingReader(new XmlTextReader("foo2.xml"));
vr.ValidationType = ValidationType.DTD;
while (vr.Read());
vr.Close();

--
Oleg Tkachenko [XML MVP]
Vlad
6/21/2004 1:40:56 PM
Thanks Oleg for your suggestion. There is still a performance hit though,
because I still have to read the whole source file into a new file before
reading the new file again using the validating reader.
I guess I could change your code to generate the new file in memory using a
MemoryStream, which might improve the performance somewhat, but the main
theme stays the same: double processing.
Still... I wish MS would list this as a bug and fix it.
Thanks!
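
The MemoryStream variant mentioned above might look roughly like the sketch below. It is an illustration under assumptions, not code from the thread: InMemoryDtdResolver and DoctypePipe are hypothetical names, the resolver serves one DTD held in memory for every request, and the input is assumed to be UTF-8 (encoding declarations are not handled). The document is still processed twice, as Vlad points out; only the temporary file is avoided.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical resolver: answers every external entity request with one
// DTD held in memory.
class InMemoryDtdResolver : XmlUrlResolver
{
    private readonly byte[] dtd;

    public InMemoryDtdResolver(string dtdText)
    {
        dtd = Encoding.UTF8.GetBytes(dtdText);
    }

    public override object GetEntity(Uri absoluteUri, string role, Type ofObjectToReturn)
    {
        return new MemoryStream(dtd);
    }
}

// Hypothetical helper wrapping the reader-writer pipe from Oleg's post.
static class DoctypePipe
{
    // Copies source into a MemoryStream, adding a DOCTYPE before the root
    // element when the source has none, then validates the buffered copy.
    // With no ValidationEventHandler attached, a DTD violation is thrown
    // as an exception.
    public static void Validate(Stream source, string systemId, XmlResolver resolver)
    {
        MemoryStream buffer = new MemoryStream();
        XmlTextReader r = new XmlTextReader(source);
        r.XmlResolver = resolver;   // used if the source already names a DTD
        XmlTextWriter w = new XmlTextWriter(buffer, Encoding.UTF8);

        bool hasDoctype = false;
        r.Read();
        while (!r.EOF)
        {
            if (r.NodeType == XmlNodeType.DocumentType)
            {
                hasDoctype = true;
            }
            else if (!hasDoctype && r.NodeType == XmlNodeType.Element)
            {
                w.WriteDocType(r.Name, null, systemId, null);
                hasDoctype = true;
            }
            w.WriteNode(r, false);  // writes the node and advances the reader
        }
        w.Flush();
        buffer.Position = 0;

        XmlValidatingReader vr = new XmlValidatingReader(new XmlTextReader(buffer));
        vr.XmlResolver = resolver;
        vr.ValidationType = ValidationType.DTD;
        try
        {
            while (vr.Read()) { }
        }
        finally
        {
            vr.Close();
        }
    }
}
```

For very large inputs the buffer holds a full copy of the document in memory, so a temporary file may still be the better choice there.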


Oleg Tkachenko [MVP]
6/22/2004 1:34:33 PM

Yeah, a MemoryStream or a file or whatever is appropriate for your
application, but unfortunately I don't see how you can avoid this step.

--
Oleg Tkachenko [XML MVP]
Tom Goff
3/2/2005 12:03:53 PM
I recently wanted to accomplish the same thing (validate an XML file
using a local DTD file and not the DTD defined in the XML file) and
could not find a solid answer. I found the answer when trying to
validate an XML file using XSD, and I wanted to ignore the DOCTYPE
because it was adding an ever-so-slight overhead to retrieve the DTD.

In short, you can write a custom XmlUrlResolver and then "ignore" the
remote DTD and load a local DTD.

ex.

XmlTextReader xtr = new XmlTextReader("myfile.xml");
xtr.XmlResolver = new LocalDocumentTypeResolver("local.dtd");

XmlValidatingReader xvr = new XmlValidatingReader(xtr);
xvr.ValidationType = ValidationType.DTD;

while (xvr.Read());

....and the custom resolver...

using System;
using System.IO;
using System.Text.RegularExpressions;
using System.Xml;

class LocalDocumentTypeResolver : XmlUrlResolver
{
    protected String systemEntry = "";

    public LocalDocumentTypeResolver(String systemEntry)
    {
        this.systemEntry = systemEntry;
    }

    // Serve the local DTD for any request whose path ends in ".dtd";
    // everything else is resolved normally.
    override public object GetEntity(Uri absoluteUri, string role, Type ofObjectToReturn)
    {
        Regex re = new Regex(@"\.dtd$");
        if (re.IsMatch(absoluteUri.AbsolutePath))
        {
            return new FileStream(systemEntry,
                                  FileMode.Open,
                                  FileAccess.Read,
                                  FileShare.Read);
        }

        return base.GetEntity(absoluteUri, role, ofObjectToReturn);
    }
}


name
3/3/2005 4:38:35 AM
Look.

Publish something that makes sense.

==============

No matter which whores you screw, or where.

Or whatever Faustian pact you have with MS.

===========

With your post you are an ass-licker.