dotnet xml : Implement sorting in an XML file?


Cerebrus99
2/6/2006 12:01:16 AM
Hi all,

I am confused about how to sort an XML file. I mean how to *actually*
sort the data in the physical file, not how to display sorted data. I
am using a large XML file as a back-end database, and am making many
inserts and updates using the XmlDocument class. But I need to make the
XML file human-readable too, and want to physically sort the data in
the file every time an insert is made. At present I have to use a
tool like Stylus Studio to sort the data manually. Is there a way to do
it programmatically?

My XML file is something like:

<BOOKDATA>
<BOOK>
<NAME>Book 1</NAME>
<AUTHOR>Tom</AUTHOR>
<PRICE>20.00</PRICE>
</BOOK>
<BOOK>
<NAME>Book 2</NAME>
<AUTHOR>Fred</AUTHOR>
<PRICE>30.00</PRICE>
</BOOK>
</BOOKDATA>

Thanks in advance,
Regards,
--------------------------------
From: Cerebrus99
Martin Honnen
2/6/2006 1:29:07 PM
But there are databases which give you a lot of power, like indexing
columns.

XSLT is a programming language designed to transform XML into another XML
structure, and sorting is one restructuring it supports directly through
the xsl:sort element.
So you could write an XSLT stylesheet to sort your XML as needed and
then save the transformation result back.
.NET 1.x has XSLT support with XslTransform, .NET 2.0 with
XslCompiledTransform.
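
To make the idea of a sort key concrete outside XSLT, here is a rough Python analogue using the standard-library ElementTree module; it is a sketch, not .NET code. It mirrors <xsl:sort select="PRICE" data-type="number" order="descending"/>, assumes every BOOK has a numeric PRICE child, and the function name is made up for illustration.

```python
import xml.etree.ElementTree as ET

def sort_books_by_price_desc(root):
    """Reorder the BOOK children of root by numeric PRICE, highest first.

    Hypothetical helper; assumes every BOOK has a numeric PRICE child.
    """
    books = root.findall("BOOK")
    for book in books:
        root.remove(book)
    # float() plays the role of data-type="number": compared as text,
    # "100.00" would sort before "20.00". reverse=True ~ order="descending".
    books.sort(key=lambda b: float(b.findtext("PRICE")), reverse=True)
    root.extend(books)
    return root

doc = ET.fromstring(
    "<BOOKDATA>"
    "<BOOK><NAME>Book 1</NAME><PRICE>20.00</PRICE></BOOK>"
    "<BOOK><NAME>Book 2</NAME><PRICE>100.00</PRICE></BOOK>"
    "<BOOK><NAME>Book 3</NAME><PRICE>30.00</PRICE></BOOK>"
    "</BOOKDATA>"
)
sort_books_by_price_desc(doc)
print([b.findtext("NAME") for b in doc])  # ['Book 2', 'Book 3', 'Book 1']
```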


--

Martin Honnen --- MVP XML
Cerebrus99
2/6/2006 5:23:38 PM
Hi Peter,

Thanks for that awesome explanation of the intricacies of XSLT. The good
news is that I managed to write an XSL file that sorts my data exactly as I
wanted. Thanks for your pointers! I was able to use the XslTransform class
to transform my XML file into another, sorted version.

However, I came up against another couple of problems:
1. My new "sorted.xml" file had its XML declaration missing!

The first two lines in my original XML doc were:
<?xml version="1.0" encoding="utf-8" ?>
<?xml-stylesheet type="text/xsl" href="HtmlView.xslt" ?>

In sorted.xml, the first line now is:
<?xml-stylesheet type="text/xsl" href="HtmlView.xslt" ?>

My XSLT file starts with the following lines:
<?xml version="1.0" encoding="utf-8" ?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="1.0">
<xsl:output method="xml" indent="yes" omit-xml-declaration="no"/>

However, this happens only when I pass my XmlDocument object to the
XslTransform.Transform() method and write the result to an XmlTextWriter:
-> xslt.Transform(MyXmlDoc, Nothing, MyXmlTextWriter)

If I simply pass the file names:
-> xslt.Transform("Main.xml", "sorted.xml")
then I don't get this problem.

Upon googling for this problem, I found this:
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/cpguide/html/cpconinputsoutputstoxsltransform.asp

It does contain a discouraging statement: "The <xsl:output> statement is
ignored when the output of the XslTransform.Transform method is an XmlReader
or XmlWriter." !!!
But it offers no solution or workaround for this. Any ideas?
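
One possible workaround on the .NET side, not tested here, is to call WriteStartDocument() on the XmlTextWriter yourself before calling Transform(), since xsl:output is ignored for XmlWriter output. The general point, that a DOM serializer only emits the declaration when it is asked to, can be shown with Python's xml.dom.minidom as a sketch: it sorts the BOOK elements by AUTHOR, keeps the xml-stylesheet processing instruction that sits before the root element, and writes the declaration explicitly. The helper names are hypothetical.

```python
from xml.dom import minidom

def _strip_blank_text(node):
    """Drop whitespace-only text nodes, roughly like xsl:strip-space elements="*"."""
    for child in list(node.childNodes):
        if child.nodeType == child.TEXT_NODE and not child.data.strip():
            node.removeChild(child)
        else:
            _strip_blank_text(child)

def sort_keeping_prolog(xml_text, key="AUTHOR"):
    """Sort BOOK elements by the text of their `key` child (hypothetical helper)."""
    doc = minidom.parseString(xml_text)
    _strip_blank_text(doc)
    root = doc.documentElement
    books = [n for n in root.childNodes
             if n.nodeType == n.ELEMENT_NODE and n.tagName == "BOOK"]

    def key_text(book):
        found = book.getElementsByTagName(key)
        if found and found[0].firstChild is not None:
            return found[0].firstChild.data
        return ""

    for book in books:
        root.removeChild(book)
    for book in sorted(books, key=key_text):
        root.appendChild(book)
    # Passing encoding makes minidom write <?xml version="1.0" encoding="utf-8"?>;
    # the xml-stylesheet PI in the prolog is a child of the Document, so it is kept.
    return doc.toprettyxml(indent="  ", encoding="utf-8").decode("utf-8")

src = """<?xml version="1.0" encoding="utf-8" ?>
<?xml-stylesheet type="text/xsl" href="HtmlView.xslt" ?>
<BOOKDATA>
  <BOOK><NAME>Book 1</NAME><AUTHOR>Tom</AUTHOR><PRICE>20.00</PRICE></BOOK>
  <BOOK><NAME>Book 2</NAME><AUTHOR>Fred</AUTHOR><PRICE>30.00</PRICE></BOOK>
</BOOKDATA>"""
print(sort_keeping_prolog(src))
```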

Thanks again,
Warm regards,
Cerebrus.

----------------------------------------------------------



Cerebrus99
2/6/2006 5:40:30 PM
Oops! I forgot to answer your questions:

I'm only implementing the sorting feature in my application as a database
maintenance task, which would probably be run rarely anyway. What other,
more efficient ways of implementing sorting would you suggest?

> ...so that the data appears sorted when the next access comes up, that
> could be handled in many other different ways: it doesn't have to be
> done by physically re-sorting the XML on disk afresh each time.

Again, what other ways would you suggest?

BTW, the MSDN link in my previous post got broken across two lines, so
clicking it directly won't work. :-(

Thanks and Regards,

Cerebrus.

Peter Flynn
2/6/2006 10:26:04 PM
You can't sort it in-place. You have to write some code which sorts it
to another file and then copy it back (or the equivalent in whatever
environment you are using it).
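
The "sort to another file, then copy it back" step could look like this in Python (a sketch; the helper name is hypothetical, and ET.indent needs Python 3.9 or later). Writing to a temporary file in the same directory and then calling os.replace() means the original is only overwritten once the sorted copy is complete; on POSIX systems that final rename is atomic. Note that ElementTree's default parser discards processing instructions such as the xml-stylesheet line, so this version would lose it.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

def sort_file(path, key="AUTHOR"):
    """Sort BOOK elements by `key` into a temp file, then swap it in (hypothetical)."""
    tree = ET.parse(path)
    root = tree.getroot()
    books = root.findall("BOOK")
    for book in books:
        root.remove(book)
    root.extend(sorted(books, key=lambda b: b.findtext(key, default="")))
    ET.indent(tree, space="  ")  # re-indent so the file stays human-readable

    # Temp file in the same directory, so os.replace() is a rename on the
    # same filesystem rather than a cross-device copy.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(suffix=".xml", dir=directory)
    try:
        with os.fdopen(fd, "wb") as out:
            tree.write(out, encoding="utf-8", xml_declaration=True)
        os.replace(tmp_path, path)  # the "copy it back" step
    except BaseException:
        os.remove(tmp_path)  # never leave a half-written temp file behind
        raise
```

Usage would simply be sort_file("Main.xml"); the sorted result ends up under the original name.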

This sounds as if it may have serious efficiency implications if the
file really is "large". (How large is large for you? Some people would
consider a 500 GB file small. Others think 32 KB is big.)

As Martin said, there are databases which will offer you ordered access.

The following XSLT code will sort that file on AUTHOR.

<?xml version="1.0" encoding="iso-8859-1"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="1.0">

<xsl:output method="xml" indent="yes"/>
<xsl:strip-space elements="*"/>

<xsl:template match="*">
<xsl:copy>
<xsl:for-each select="@*">
<xsl:copy/>
</xsl:for-each>
<xsl:apply-templates/>
</xsl:copy>
</xsl:template>

<xsl:template match="BOOKDATA">
<xsl:copy>
<xsl:for-each select="@*">
<xsl:copy/>
</xsl:for-each>
<xsl:for-each select="BOOK">
<xsl:sort select="AUTHOR"/>
<xsl:apply-templates select="."/>
</xsl:for-each>
</xsl:copy>
</xsl:template>

</xsl:stylesheet>

But you really, *really* don't want to be running this after every
update: the overhead would be horrendous. Overnight, perhaps.

There are other, faster, non-XML ways to sort files, but they rely
on the file having a specific physical layout, and because they are
non-XML, if you break the format they expect, your data is trash.
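
As an illustration of the kind of layout-dependent, non-XML sorting described here, this Python sketch only works if every BOOK record sits on a single line with no attributes on BOOK, and it compares the raw AUTHOR text (entities are not decoded). Rather than silently trashing data when that assumption breaks, it raises ValueError. The function and pattern names are hypothetical.

```python
import re

# Assumed layout: each BOOK record sits on exactly one line, e.g.
# <BOOK><NAME>Book 1</NAME><AUTHOR>Tom</AUTHOR><PRICE>20.00</PRICE></BOOK>
BOOK_LINE = re.compile(r"^\s*<BOOK>.*<AUTHOR>(.*?)</AUTHOR>.*</BOOK>\s*$")

def sort_book_lines(lines):
    """Sort one-line BOOK records by raw AUTHOR text; other lines stay put.

    Hypothetical helper. Raises ValueError when a BOOK is split across
    lines, i.e. when the layout assumption no longer holds.
    """
    slots, records = [], []
    for i, line in enumerate(lines):
        match = BOOK_LINE.match(line)
        if match:
            slots.append(i)
            records.append((match.group(1), line))
        elif "<BOOK>" in line or "</BOOK>" in line:
            raise ValueError(f"line {i + 1}: BOOK record is not on one line")
    records.sort(key=lambda rec: rec[0])  # stable sort on the AUTHOR text
    result = list(lines)
    for slot, (_, line) in zip(slots, records):
        result[slot] = line  # sorted records refill the original BOOK slots
    return result
```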

///Peter
--
Cerebrus99
2/7/2006 12:05:37 AM
Hi Martin and Peter,

First, thanks for your prompt and helpful replies.

My reason for using XML here, instead of a database like SQL
Server, was more of a learning experiment than a performance
decision.

I get your point: I need to use XSLT to sort the XML and then copy the
result back, because I cannot sort it in place. That really clears up my
doubts. I wasn't aware that XSLT could transform XML into another XML
structure, so I will have to study how to implement that in more depth.
I am using .NET Framework 1.1. Since my XSLT isn't very strong, could you
suggest any links that give a step-by-step guide on how to do this
(copying one XML file to another)? I'm still trying to understand the
sample code you've attached, Peter. :-(

As to the size of the XML file, I expect it to grow to a maximum of
1-2 MB. I figure that would not have performance implications serious
enough to discourage the use of XML. But thanks, Peter, for reminding
me that "large file" is a relative term, and I should have been more
precise.

P.S.: Peter, I found the "WTF" section on your site very interesting!
;-)

Thanks a ton,
Regards,
Cerebrus99
Peter Flynn
2/9/2006 2:06:51 AM
That's XSLT's primary purpose: the T stands for Transformations.

I can't help with .NET, I'm afraid, but as for the code:

This is an "identity transform" template: it matches every element node
that no more specific template handles (the * is the wildcard) and makes
a shallow copy of it in the destination tree (xsl:copy copies just the
element itself, not its attributes or content). In each element thus
processed, it then loops through any attributes and copies them to the
destination tree as well. You aren't using any attributes in your
example, but just in case. Finally, xsl:apply-templates processes the
element's children: child elements are matched by this same template,
and text content is copied by XSLT's built-in template for text nodes.

This template matches the BOOKDATA element node. It again copies the
element itself (plus its attributes) to the destination tree, but then
it goes through each BOOK element within it in order of AUTHOR (you
could specify a different element type name if needed), and for each one
performs an "apply-templates" on "." ("." refers to the current context,
in this case each BOOK element node as it is handled). That makes the
processor search for a matching template, which in all cases will be
the * template above. Note that only BOOK children are processed here,
so any other content directly inside BOOKDATA is not copied.

The effect is to output everything as-is except that the BOOK elements
will be serialized (written out) in the sorted order.
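
Continuing the same analogy, the BOOKDATA template could be sketched in Python like this (hypothetical helper name, ElementTree rather than XSLT). It copies BOOKDATA and its attributes, then appends copies of only the BOOK children in AUTHOR order. Python's sorted() is stable, so books with the same author keep their original relative order; comparison is by plain code point, which may differ from an XSLT processor's collation for mixed-case or accented names.

```python
import copy
import xml.etree.ElementTree as ET

def sort_bookdata(bookdata, key="AUTHOR"):
    """Rough counterpart of the BOOKDATA template (hypothetical helper).

    Like for-each select="BOOK" with xsl:sort, only BOOK children are output.
    """
    result = ET.Element(bookdata.tag, dict(bookdata.attrib))  # copy + attributes
    books = bookdata.findall("BOOK")
    for book in sorted(books, key=lambda b: b.findtext(key, default="")):
        # deepcopy has the same net effect as handing each BOOK to the
        # identity template.
        result.append(copy.deepcopy(book))
    return result

data = ET.fromstring(
    '<BOOKDATA version="1">'
    "<BOOK><NAME>Book 1</NAME><AUTHOR>Tom</AUTHOR></BOOK>"
    "<BOOK><NAME>Book 2</NAME><AUTHOR>Fred</AUTHOR></BOOK>"
    "<BOOK><NAME>Book 3</NAME><AUTHOR>Tom</AUTHOR></BOOK>"
    "</BOOKDATA>"
)
print([b.findtext("NAME") for b in sort_bookdata(data)])  # ['Book 2', 'Book 1', 'Book 3']
```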

It's probably a lousy way to sort the data in an XML document, but it
works.

That's pretty small, but it will take a measurable number of seconds
to process if there's the overhead of running XSLT afresh each time.
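
To see what "measurable" means on your own machine, a rough harness like the following could help. It builds a synthetic BOOKDATA document (made-up data; the book count is chosen arbitrarily) and times one parse, sort and serialize cycle with Python's ElementTree. It does not run XSLT and excludes disk I/O, so it only gives a ballpark; no timings are quoted here because they depend on the machine. The function names are hypothetical.

```python
import time
import xml.etree.ElementTree as ET

def build_sample(n):
    """Make a synthetic BOOKDATA document with n books (made-up data)."""
    root = ET.Element("BOOKDATA")
    for i in range(n):
        book = ET.SubElement(root, "BOOK")
        ET.SubElement(book, "NAME").text = f"Book {i}"
        ET.SubElement(book, "AUTHOR").text = f"Author {(i * 7919) % n}"
        ET.SubElement(book, "PRICE").text = f"{i % 100}.00"
    return ET.tostring(root)

def time_parse_sort_serialize(xml_bytes):
    """Time one parse, sort and serialize cycle; returns (seconds, output)."""
    start = time.perf_counter()
    root = ET.fromstring(xml_bytes)
    # The sample has only BOOK children, so replacing all children is safe here.
    root[:] = sorted(root.findall("BOOK"), key=lambda b: b.findtext("AUTHOR"))
    output = ET.tostring(root)
    return time.perf_counter() - start, output

sample = build_sample(15000)
seconds, _ = time_parse_sort_serialize(sample)
print(f"{len(sample) / 1e6:.2f} MB processed in {seconds:.3f} s")
```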

Not to discourage XML, but perhaps to discourage using XSLT to sort a
file repetitively in real time at the kind of rate you seem to be
indicating.

The real question is, why do you want to sort the file each time? If
it's so that the data appears sorted when the next access comes up,
that could be handled in many other different ways: it doesn't have
to be done by physically re-sorting the XML on disk afresh each time.
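
One of those other ways is to keep the file sorted incrementally: after one full sort, each new BOOK is inserted at its correct position instead of re-sorting everything. A Python/ElementTree sketch follows (hypothetical helper; it assumes the root's children are all BOOK elements already ordered by AUTHOR). The file still has to be written out after each insert, but no sort or XSLT pass is needed; with XmlDocument the corresponding DOM call would presumably be XmlNode.InsertBefore.

```python
import bisect
import xml.etree.ElementTree as ET

def insert_book_sorted(root, name, author, price):
    """Insert a new BOOK so the BOOK list stays ordered by AUTHOR.

    Hypothetical helper; assumes root's children are all BOOK elements
    that are already sorted by AUTHOR (e.g. after one full sort).
    """
    book = ET.Element("BOOK")
    ET.SubElement(book, "NAME").text = name
    ET.SubElement(book, "AUTHOR").text = author
    ET.SubElement(book, "PRICE").text = price
    authors = [b.findtext("AUTHOR", default="") for b in root]
    # bisect_right places the new book after existing books by the same author.
    root.insert(bisect.bisect_right(authors, author), book)
    return book

root = ET.fromstring(
    "<BOOKDATA>"
    "<BOOK><NAME>Book 2</NAME><AUTHOR>Fred</AUTHOR><PRICE>30.00</PRICE></BOOK>"
    "<BOOK><NAME>Book 1</NAME><AUTHOR>Tom</AUTHOR><PRICE>20.00</PRICE></BOOK>"
    "</BOOKDATA>"
)
insert_book_sorted(root, "Book 3", "Jane", "25.00")
print([b.findtext("AUTHOR") for b in root])  # ['Fred', 'Jane', 'Tom']
```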

///Peter
--
Peter Flynn
2/9/2006 10:05:06 PM
Ah, OK. I had assumed from your original post that this was something
that would be happening every few seconds.

For a simple structure, the fastest is a non-XML sorter, but it means
the physical layout of the file becomes important, which goes against
the philosophy of XML, where some white-space can be irrelevant.