Tommy wrote:
> The problem is how to achieve the transformation below:
>
> The source XML contains tons of repeating structures like the one below. Each Item
> node contains a Person element and an Insurance element that refers to a
> Person element through the person id.
> <Item>
> <Person id=“p123” name=“someone1”>
> <Insurance ref=“p123” detail=“blabla1”>
> </item>
> <Item>
> <Person id=“p123” name=“someone1”>
> <Insurance ref=“p456” detail=“blabla2”>
> </item>
> <Item>
> <Person id=“p456” name=“someone1”>
> <Insurance ref=“p123” detail=“blabla3”>
> </item>
This isn't XML. It might be SGML. If you want to process it as XML, the
closing > of the Person and Insurance elements must be preceded by a /;
the typographic curly quotes must be replaced by regular " chars;
the end-tags for the Item elements must be </Item> (not lowercase i);
and there must be an outermost enclosing element.
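Here is a minimal Python sketch of Tommy's sample with those four fixes applied, using only the standard library. The names SAMPLE, BROKEN, is_well_formed and the wrapper element Items are hypothetical. Python's xml.etree.ElementTree parser accepts the fixed version and rejects a single item in the form originally posted.

```python
import xml.etree.ElementTree as ET

# Tommy's sample with the four fixes applied. "Items" is a hypothetical
# name for the outermost element; any name will do.
SAMPLE = """<Items>
  <Item>
    <Person id="p123" name="someone1"/>
    <Insurance ref="p123" detail="blabla1"/>
  </Item>
  <Item>
    <Person id="p123" name="someone1"/>
    <Insurance ref="p456" detail="blabla2"/>
  </Item>
  <Item>
    <Person id="p456" name="someone1"/>
    <Insurance ref="p123" detail="blabla3"/>
  </Item>
</Items>"""

# One item as originally posted: curly quotes, no "/", lowercase end-tag.
BROKEN = "<Item>\n<Person id=\u201cp123\u201d name=\u201csomeone1\u201d>\n</item>"


def is_well_formed(text):
    """Return True if an XML parser accepts text as a complete document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


print(is_well_formed(SAMPLE), is_well_formed(BROKEN))  # True False
```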
> The goal is to regroup them into a one (Person) to many (Insurance) structure,
> like below:
> <Item>
> <Person id=“p123” name=“someone1”>
> <Insurance ref=“p123” detail=“blabla1”>
> <Insurance ref=“p123” detail=“blabla3”>
> </Item>
> My initial idea was to load the source into memory and dissect it into
> Hashtables so that I could easily regroup. However, since the source files are
> really big (approximately 50 MB each, with 70000 repeating items), my way
> of doing it is obviously too memory-consuming. I am frustrated: after a whole
> day sitting quietly I still cannot figure out a better way, so I would really
> appreciate any help.
If you really wanted to do it in XSLT, you could write:
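<?xml version="1.0" encoding="iso-8859-1"?>
<xsl:stylesheet
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="xml" indent="yes"/>
  <!-- Drop whitespace-only text nodes so they are not copied to the output -->
  <xsl:strip-space elements="*"/>
  <!-- Index every Insurance element by the Person id it refers to -->
  <xsl:key name="ins" match="Insurance" use="@ref"/>
  <!-- Keep the outermost element's name so the result has a single root -->
  <xsl:template match="/*">
    <xsl:copy>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>
  <!-- Emit an Item only for the first occurrence of each Person id -->
  <xsl:template match="Person">
    <xsl:if test="not(preceding::Person/@id=current()/@id)">
      <Item>
        <Person id="{@id}" name="{@name}"/>
        <xsl:apply-templates mode="include" select="key('ins',@id)"/>
      </Item>
    </xsl:if>
  </xsl:template>
  <!-- Copy the Insurance elements gathered through the key -->
  <xsl:template match="Insurance" mode="include">
    <xsl:copy-of select="."/>
  </xsl:template>
  <!-- Suppress Insurance elements met in normal document order -->
  <xsl:template match="Insurance"/>
</xsl:stylesheet>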
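But for a file of that size the processing time would be rather long: the
preceding::Person test rescans every earlier Person for each one, so the
work grows roughly with the square of the number of items. And, as you
point out, it would need lots of memory, because an XSLT processor
normally builds the whole document tree before it starts. Far better to
extract it all to CSV with a very simple linear XSLT routine, load that
into a database (or use a database's XML-import facility), and do the
regrouping in {insert language of choice here}.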
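One possible shape for that pipeline is sketched below in Python, under some assumptions. The input is taken to be the corrected, well-formed form of the sample. The table names person and insurance, the functions load, write_groups and regroup_text, and the output wrapper Items are all hypothetical. The records also go straight from the parser into SQLite rather than through an intermediate CSV file. ElementTree's iterparse hands over each Item as soon as it is complete, and clearing the root afterwards keeps memory use roughly flat whatever the file size. The database then does the 1-to-many grouping with a join.

```python
import io
import sqlite3
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr


def load(source, conn):
    """Stream each Item into SQLite; source is a file name or binary file object."""
    # Hypothetical schema: one row per distinct Person, one per Insurance.
    conn.execute("CREATE TABLE IF NOT EXISTS person (id TEXT PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE IF NOT EXISTS insurance (ref TEXT, detail TEXT)")
    events = ET.iterparse(source, events=("start", "end"))
    _, root = next(events)  # the first event starts the outermost element
    for event, elem in events:
        if event != "end" or elem.tag != "Item":
            continue
        person = elem.find("Person")
        if person is not None:
            # A repeated Person id keeps the first row inserted.
            conn.execute("INSERT OR IGNORE INTO person VALUES (?, ?)",
                         (person.get("id"), person.get("name", "")))
        for ins in elem.findall("Insurance"):
            conn.execute("INSERT INTO insurance VALUES (?, ?)",
                         (ins.get("ref"), ins.get("detail", "")))
        root.clear()  # discard finished Items so memory use stays flat
    conn.execute("CREATE INDEX IF NOT EXISTS insurance_ref ON insurance (ref)")
    conn.commit()


def write_groups(conn, out):
    """Write one Item per Person, followed by every Insurance referring to it."""
    rows = conn.execute(
        "SELECT p.id, p.name, i.ref, i.detail FROM person AS p "
        "LEFT JOIN insurance AS i ON i.ref = p.id "
        "ORDER BY p.rowid, i.rowid")  # first-seen Person order, then input order
    out.write("<Items>\n")  # hypothetical wrapper element
    current = None
    for pid, name, ref, detail in rows:
        if pid != current:
            if current is not None:
                out.write("  </Item>\n")
            out.write("  <Item>\n")
            out.write(f"    <Person id={quoteattr(pid)} name={quoteattr(name)}/>\n")
            current = pid
        if ref is not None:  # ref is NULL when a Person has no Insurance
            out.write(f"    <Insurance ref={quoteattr(ref)} detail={quoteattr(detail)}/>\n")
    if current is not None:
        out.write("  </Item>\n")
    out.write("</Items>\n")


def regroup_text(xml_bytes):
    """Convenience wrapper for small inputs: regroup in memory and return the text."""
    conn = sqlite3.connect(":memory:")
    load(io.BytesIO(xml_bytes), conn)
    out = io.StringIO()
    write_groups(conn, out)
    return out.getvalue()
```

For a 50 MB file you would call load() with the file name and a connection opened on a file path rather than ":memory:", so the tables live on disk, and then call write_groups() with an output file. Like the stylesheet, it keeps the first name seen for each Person id and drops Insurance elements whose ref matches no Person.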
///Peter
--