
dotnet xml : Split large XML-file into smaller files


Niklas E
3/3/2005 9:24:19 PM
I need to split an XML file into smaller files because the receiving system
cannot handle the large file. Any ideas on how to split it? I don't want too
many files either, but perhaps 1000 nodes under the root node per file or
similar.

For example, split this into smaller files with 1000 <Customer> elements per file.

<AllCustomers>
<Customer>
....
</Customer>
<Customer>
....
</Customer>
<Customer>
....
</Customer>
<Customer>
....
</Customer>
<Customer>
....
</Customer>
</AllCustomers>

Best regards
Niklas Engfelt

BrianProgrammer
3/4/2005 1:27:49 PM
If you load it as an XML document, you can clone the document and then trim
and truncate the nodes as necessary. You can even remove whole branches of
data at once.

It is fast and simple.
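
A minimal sketch of this load-everything approach, using Python's
xml.etree.ElementTree as a stand-in for .NET's XmlDocument (an assumption;
the thread itself is about .NET). The function name split_in_memory and the
chunk_size parameter are hypothetical. Because the whole tree is parsed into
memory first, this only makes sense for files that comfortably fit in RAM.

```python
import xml.etree.ElementTree as ET


def split_in_memory(xml_text, chunk_size=1000):
    """Hypothetical helper: parse the whole document, then build new
    documents holding at most chunk_size children of the root each."""
    root = ET.fromstring(xml_text)  # the entire tree is held in memory
    children = list(root)
    chunks = []
    for start in range(0, len(children), chunk_size):
        # New root with the same tag and attributes as the original root.
        new_root = ET.Element(root.tag, dict(root.attrib))
        # ElementTree elements keep no parent pointer, so the same child
        # objects can be attached to the new root without copying them.
        new_root.extend(children[start:start + chunk_size])
        chunks.append(ET.tostring(new_root, encoding="unicode"))
    return chunks
```

Each returned string is a complete <AllCustomers> document that can be
written to its own file.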

If this is a standalone data file for one-shot use, use Notepad. :P

I can send code if needed.

Nick Malik [Microsoft]
3/4/2005 5:36:50 PM
If you load a very very very large file as an XML doc, you are likely to run
out of memory.

Just stream it in, creating a new document every 1000 records or so.
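
A minimal sketch of that streaming idea, using Python's
ElementTree.iterparse as an analogue of .NET's forward-only XmlReader (an
assumption, not Nick's code). The names split_streaming and out_dir and the
customers_NNNN.xml file names are hypothetical. It also assumes the
<Customer> records are direct children of the root and that no XML
namespaces are involved.

```python
import os
import xml.etree.ElementTree as ET


def split_streaming(source, out_dir, record_tag="Customer", chunk_size=1000):
    """Hypothetical helper: stream `source` (a path or binary file object)
    and write each run of chunk_size <record_tag> elements to a new file.
    Returns the list of file names written."""
    written = []
    batch = []
    root = None
    root_tag, root_attrib = None, {}

    def flush():
        # Wrap the current batch in a copy of the original root element.
        new_root = ET.Element(root_tag, root_attrib)
        new_root.extend(batch)
        name = os.path.join(out_dir, f"customers_{len(written) + 1:04d}.xml")
        ET.ElementTree(new_root).write(name, encoding="utf-8", xml_declaration=True)
        written.append(name)
        batch.clear()

    for event, elem in ET.iterparse(source, events=("start", "end")):
        if root is None:
            # The very first event is the start of the root element.
            root = elem
            root_tag, root_attrib = elem.tag, dict(elem.attrib)
        elif event == "end" and elem.tag == record_tag:
            batch.append(elem)
            # Detach finished records from the root so the parsed tree
            # never grows beyond the current batch.
            root.clear()
            if len(batch) == chunk_size:
                flush()
    if batch:
        flush()  # remainder that did not fill a whole chunk
    return written
```

With this shape, memory use depends on chunk_size rather than on the size of
the input file.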

--
--- Nick Malik [Microsoft]
MCSD, CFPS, Certified Scrummaster
http://blogs.msdn.com/nickmalik

Disclaimer: Opinions expressed in this forum are my own, and not
representative of my employer.
I do not answer questions on behalf of my employer. I'm just a
programmer helping programmers.
--

Nick Malik [Microsoft]
3/5/2005 10:11:32 AM
If your XML is validated by a schema, then you are probably not using the
same tag in other places. Most XML parsers throw errors on schemas that use
the same tag at different levels of the hierarchy.

So, it isn't really a problem.

I had a system that did this in reverse: it combined a large number of XML
documents into a single document for batching. We used streaming quite
effectively. The files were in the 20MB range.

Loading them into memory caused serious problems... Only streaming was a
useful thing to do.
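
The reverse case (many documents combined into one batch) can be sketched
the same way: read each input with a streaming parser and write each record
out as soon as it is complete, so no input is ever fully loaded. This is a
hypothetical Python illustration, not the system described here;
merge_streaming, the AllCustomers/Customer tag names, and the assumption
that records are direct children of each input's root are all assumptions.

```python
import xml.etree.ElementTree as ET


def merge_streaming(sources, out_file, root_tag="AllCustomers", record_tag="Customer"):
    """Hypothetical helper: copy every <record_tag> element from each
    source (path or binary file object) into one document written to the
    text stream out_file. Returns the number of records written."""
    count = 0
    out_file.write(f"<{root_tag}>")
    for source in sources:
        root = None
        for event, elem in ET.iterparse(source, events=("start", "end")):
            if root is None:
                root = elem  # first event: start of this input's root
            elif event == "end" and elem.tag == record_tag:
                # Serialize the finished record immediately.
                out_file.write(ET.tostring(elem, encoding="unicode"))
                count += 1
                root.clear()  # discard it so memory stays flat
    out_file.write(f"</{root_tag}>")
    return count
```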


Niklas E
3/5/2005 10:33:03 AM
Thanks for your suggestions. Do forward-only XML readers use a lot of memory
as well? If it is a complex structure and you are using a streaming reader,
I guess you have to watch the start and end tags yourself. If the tag below
the root that I want to split on is also used in other places, I guess you
could run into problems.
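
One way to handle a split tag that also appears deeper in the tree is to
count depth from the start and end events and split only on direct children
of the root, whatever their tag. A minimal sketch, assuming Python's
ElementTree.iterparse as the forward-only reader; iter_top_level_chunks and
_wrap are hypothetical names. A streaming parser used this way typically
keeps memory close to one chunk, provided already-processed elements are
discarded as they are here.

```python
import xml.etree.ElementTree as ET


def _wrap(tag, attrib, children):
    """Build a new root element holding the given children."""
    new_root = ET.Element(tag, attrib)
    new_root.extend(children)
    return new_root


def iter_top_level_chunks(source, chunk_size=1000):
    """Hypothetical helper: yield new root elements, each holding at most
    chunk_size direct children of the original root. An element nested
    deeper stays inside its parent even if it has the same tag."""
    depth = 0
    root = None
    root_tag, root_attrib = None, {}
    batch = []
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            if root is None:
                root = elem
                root_tag, root_attrib = elem.tag, dict(elem.attrib)
            depth += 1
            continue
        depth -= 1
        if depth == 1:
            # elem has just closed and is a direct child of the root.
            batch.append(elem)
            root.clear()  # drop finished children to keep memory bounded
            if len(batch) == chunk_size:
                yield _wrap(root_tag, root_attrib, batch)
                batch = []
    if batch:
        yield _wrap(root_tag, root_attrib, batch)
```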

The documents right now are about 50MB and they will grow.

BrianProgrammer:
We have been using Notepad so far, but we are tired of doing that all the
time... If you have some code, please send it. It's always nice to see
examples and get ideas.

Best regards
Niklas Engfelt

