reading small XML file with HUGE DTD (MathML / entities)


reading small XML file with HUGE DTD (MathML / entities) Michel de Becdelièvre
6/28/2004 2:24:15 PM
dotnet xml:
I'm having *performance* trouble reading MathML files in my application (in
ASP.NET).

- I have small MathML files (2-3k) as input.
- Like (almost) all MathML files, these use entities. I have no way to restrict
the entities used.
- To read an XML file that uses entities into a document, you need a DTD, or
you get an exception (is there any other way?).
- The MathML DTD is HUGE (2400+ entities, ~300k of files); loading it into a
document is a big CPU and file-access hog, especially for an ASP.NET
application. As you can see, the DTD is easily a hundred times bigger than the
file to load.

I tried to pay the price only once by caching an empty XmlDocument and
reusing it as a template, but doc.Clone() is also a big CPU hog. Initialising
the DTD this way:
doc = docIn.Implementation.CreateDocument();
XmlNode n = doc.ImportNode( docIn.DocumentType, true );
is better, but still intensive.

Any ideas on a better way to handle an XmlDocument with a large number of
entities? Pointers? Suggestions?

Thanks in advance.




Re: reading small XML file with HUGE DTD (MathML / entities) Chris Lovett
7/2/2004 5:24:37 PM
Unfortunately, re-using "cached" DTDs is a tough problem, because the
instance document can always override any part of the DTD, including
parameter entities, by providing an internal subset, so we have not
optimized this case.

If you really need your MathML documents to be validated, you could convert
the DTD to XSD (using the Visual Studio 2005 XML editor) and then validate
using a cached XmlSchemaSet. But alas, XSD does not have entities.
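A minimal sketch of the "cached XmlSchemaSet" idea, assuming a MathML XSD has already been produced (the class and method names are hypothetical, and the schema text you pass in is up to you): the schema is parsed and compiled once into an XmlReaderSettings object that every request reuses, so only the small instance document is parsed per call. As Chris says, instance documents validated this way still cannot use named entities.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Schema;

// Hypothetical helper illustrating "validate using a cached XmlSchemaSet".
public static class CachedSchemaValidation
{
    // Parse and compile the schema once. Callers are expected to keep the
    // returned settings (for example in a static field) and reuse them for
    // every request, so the schema cost is paid only once per process.
    public static XmlReaderSettings CreateSettings(string xsdText)
    {
        var schemas = new XmlSchemaSet();
        using (XmlReader schemaReader = XmlReader.Create(new StringReader(xsdText)))
        {
            // null = take the target namespace from the schema document itself
            schemas.Add(null, schemaReader);
        }
        schemas.Compile();

        return new XmlReaderSettings
        {
            ValidationType = ValidationType.Schema,
            Schemas = schemas,
        };
    }

    // Only the small instance document is parsed here. With no
    // ValidationEventHandler attached, a validation error is thrown
    // as an XmlSchemaValidationException.
    public static XmlDocument LoadValidated(string xml, XmlReaderSettings settings)
    {
        var doc = new XmlDocument();
        using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
        {
            doc.Load(reader);
        }
        return doc;
    }
}
```

In an ASP.NET application the result of CreateSettings would typically live in a static readonly field so it is built once per AppDomain rather than once per request.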

So it sounds like we need to work on a better solution for caching DTD
entities, specifically character entity sets. I filed this as a work item
for us to consider.



Re: reading small XML file with HUGE DTD (MathML / entities) Michel de Becdelièvre
7/3/2004 10:21:50 AM

"Chris Lovett" <clovett@microsoft.com.no_spam> wrote in message
news:eAbGgQJYEHA.1764@TK2MSFTNGP10.phx.gbl...

I do not *really* need to validate (I'm confident in the quality of the
MathML emitter), but I need to parse the MathML into an XmlDocument tree
(I need to be able to backtrack, for instance), and I have found no way (even
with Validation.None) to avoid parsing the entities.


Thanks. That won't solve my problem now, but it will be needed for MathML and
maybe for XHTML.


Re: reading small XML file with HUGE DTD (MathML / entities) Chris Lovett
7/6/2004 5:22:20 PM
In the meantime you could subclass XmlTextReader and turn off general entity
expansion, and expand the XmlEntityReference nodes yourself based on a
hashtable when they are returned from XmlTextReader.
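The hashtable part of that suggestion can be sketched as a text pre-pass instead of an XmlTextReader subclass (a swapped-in technique, not the reader-based one described above): named references are rewritten to numeric character references from a cached table, and the DOCTYPE is dropped so the DTD is never loaded. The class name, method names and table contents are hypothetical; the sketch also assumes the input has no internal DTD subset and no entity-like text inside CDATA sections or comments. A real table would be built once from the MathML entity files.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;
using System.Xml;

// Hypothetical pre-pass: expand named entities from a hashtable, then parse
// without any DTD.
public static class EntityPrePass
{
    // Hypothetical, abbreviated table (entity name -> Unicode code point).
    // Static, so it is built once per AppDomain and shared by all requests.
    public static readonly Dictionary<string, int> MathEntities = new Dictionary<string, int>
    {
        { "alpha", 0x03B1 },
        { "InvisibleTimes", 0x2062 },
        { "rarr", 0x2192 },
    };

    // A named general entity reference such as &alpha; (numeric refs like
    // &#945; do not match because '#' is not a name start character here).
    static readonly Regex EntityRef = new Regex(@"&([A-Za-z_][A-Za-z0-9._-]*);");

    // A DOCTYPE declaration without an internal subset.
    static readonly Regex Doctype = new Regex(@"<!DOCTYPE[^>\[]*>");

    public static string ExpandEntities(string xml, IDictionary<string, int> table)
    {
        // Drop the DOCTYPE so the parser never tries to fetch the huge DTD.
        string withoutDoctype = Doctype.Replace(xml, string.Empty, 1);

        return EntityRef.Replace(withoutDoctype, m =>
        {
            // Names not in the table (including the predefined amp, lt, gt,
            // quot, apos) are left untouched for the XML parser to handle.
            return table.TryGetValue(m.Groups[1].Value, out int codePoint)
                ? "&#x" + codePoint.ToString("X") + ";"
                : m.Value;
        });
    }

    public static XmlDocument LoadMathML(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(ExpandEntities(xml, MathEntities));
        return doc;
    }
}
```

An unknown entity name is deliberately left as-is, so the parser still throws on it rather than silently dropping it.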
