
dotnet xml : Array Bounds Exception Inside system.xml.dll



William McIlroy
5/13/2004 6:41:05 PM
An array bounds exception is thrown inside system.xml.dll. The test data is a dozen GB (available for the asking on CD). Source code follows. The call into system.xml.dll happens at the while statement.

using System;
using System.Xml;
using System.Collections;

// This program reads an ASCII file of XML elements.
// The output is a list of unique NODE TYPEs.
// For example, <head> produces head in the output.
// There is no validation of the XML.

namespace xmlReaderApp
{
    /// <summary>
    /// Summary description for Class1.
    /// </summary>
    class Class1
    {
        /// <summary>
        /// The main entry point for the application.
        /// </summary>
        [STAThread]
        static void Main(string[] args)
        {
            int symbolNumber = 0;
            ArrayList arrayList = new ArrayList();
            XmlTextReader xmlReader = new XmlTextReader(@"c:\1.xml");
            try
            {
                xmlReader.MoveToContent(); // Get beyond the DTD which we don't care about
            }
            catch (Exception e)
            {
                Console.WriteLine("Exception: " + e.Message); // handle undefined DTD
            }
            try
            {
                while (xmlReader.Read())
                {
                    symbolNumber++;
                    if (symbolNumber % 100000 == 0)
                        Console.WriteLine(symbolNumber);
                    switch (xmlReader.NodeType)
                    {
                        case XmlNodeType.Element:
                            // Console.Write("<{0}>", xmlReader.Name);
                            bool bDoAdd = true;
                            foreach (String memberofArrayList in arrayList)
                                if (memberofArrayList.Equals(xmlReader.Name))
                                {
                                    bDoAdd = false;
                                    break;
                                }
                            if (bDoAdd)
                                arrayList.Add(xmlReader.Name);
                            break;
                        case XmlNodeType.Text:
                            // Console.Write(xmlReader.Value);
                            break;
                        case XmlNodeType.CDATA:
                            // Console.Write("<![CDATA[{0}]]>", xmlReader.Value);
                            break;
                        case XmlNodeType.ProcessingInstruction:
                            // Console.Write("<?{0} {1}?>", xmlReader.Name, xmlReader.Value);
                            break;
                        case XmlNodeType.Comment:
                            // Console.Write("<!--{0}-->", xmlReader.Value);
                            break;
                        case XmlNodeType.XmlDeclaration:
                            // Console.Write("<?xml version='1.0'?>");
                            break;
                        case XmlNodeType.Document:
                            break;
                        case XmlNodeType.DocumentType:
                            // Console.Write("<!DOCTYPE {0} [{1}]", xmlReader.Name, xmlReader.Value);
                            break;
                        case XmlNodeType.EntityReference:
                            // Console.Write(xmlReader.Name);
                            break;
                        case XmlNodeType.EndElement:
                            // Console.Write("</{0}>", xmlReader.Name);
                            break;
                    }
                }

                xmlReader.Close();
                foreach (String memberofArrayList in arrayList)
                    Console.WriteLine(memberofArrayList);
                Console.WriteLine("This program has terminated.");
            }
            catch (Exception e)
            {
                xmlReader.Close();
                Console.WriteLine("Exception: " + e.Message);
                Console.WriteLine(e.StackTrace);
                Console.WriteLine(e.Source);
                foreach (String memberofArrayList in arrayList)
                    Console.WriteLine(memberofArrayList);
                Console.WriteLine("This program has terminated.");
                return;
            }
        }
    }
}

Oleg Tkachenko [MVP]
5/14/2004 9:48:59 AM

Dozen GB XML document???? Holy cow! :)
Provide more info please: .NET version, exact exception message along
with full stack trace.

I should also note that the following loop:

foreach (String memberofArrayList in arrayList)
if (memberofArrayList.Equals(xmlReader.Name))

is quite inefficient; you can get a huge gain in performance by using a
Hashtable instead of an ArrayList.
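
As a rough sketch of that suggestion (not code from the original program): the linear scan quoted above can be replaced by a hashed lookup. The class name UniqueElementNames and the method Collect are hypothetical names chosen for illustration; the second ArrayList is kept only so the names print in first-seen order, as in the original program.

```csharp
using System;
using System.Collections;
using System.IO;
using System.Xml;

// Hypothetical helper, not part of the original program.
class UniqueElementNames
{
    // Returns each distinct element name once, in first-seen order.
    // Hashtable.ContainsKey is a hashed lookup, so the cost per element
    // does not grow with the number of names already collected, unlike
    // the foreach scan over an ArrayList.
    public static ArrayList Collect(XmlReader reader)
    {
        Hashtable seen = new Hashtable();
        ArrayList names = new ArrayList();
        while (reader.Read())
        {
            if (reader.NodeType == XmlNodeType.Element
                && !seen.ContainsKey(reader.Name))
            {
                seen.Add(reader.Name, null); // only the key matters here
                names.Add(reader.Name);
            }
        }
        return names;
    }
}
```

It could be called as UniqueElementNames.Collect(new XmlTextReader(@"c:\1.xml")). How much this helps depends on how many distinct element names the document contains; with only a handful, the difference is likely small.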

--
Oleg Tkachenko [XML MVP, XmlInsider]
William McIlroy
5/14/2004 10:06:03 AM
Exception: Index was outside the bounds of the array.
at System.Xml.XmlTextReader.SetElementValues()
at System.Xml.XmlTextReader.ParseElement()
at System.Xml.XmlTextReader.Read()
at xmlReaderApp.Class1.Main(String[] args)

A first chance exception of type 'System.IndexOutOfRangeException' occurred in system.xml.dll

Additional information: Index was outside the bounds of the array.

.NET version 1.1.4322

Since I wrote the program for myself, I don't care about performance. It is fast enough for me.
You heard correctly: 11 gigabytes. It is generated by a National Institutes of Health conversion program.
But thanks for the advice.
Oleg Tkachenko [MVP]
5/16/2004 11:20:15 AM

Apparently 2 GB is the limit of the XmlTextReader. See
http://groups.google.com/groups?hl=en&lr=&ie=UTF-8&threadm=10f4801c25a6f%24eea2c6b0%2435ef2ecf%40TKMSFTNGXA11&rnum=1&prev=/groups%3Fq%3DSystem.Xml.XmlTextReader.SetElementValues%2Bgroup:microsoft.public.dotnet.xml%26hl%3Den%26lr%3D%26ie%3DUTF-8%26group%3Dmicrosoft.public.dotnet.xml%26selm%3D10f4801c25a6f%2524eea2c6b0%252435ef2ecf%2540TKMSFTNGXA11%26rnum%3D1
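
A minimal sketch of a pre-check for that limit, under a stated assumption: the thread only says "2Gb", and the code below assumes this means Int32.MaxValue bytes (2,147,483,647), which is not confirmed here. The class SizeCheck and its members are hypothetical names for illustration.

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of the original program.
class SizeCheck
{
    // ASSUMPTION: the "2Gb" limit corresponds to Int32.MaxValue bytes.
    // The thread does not state the exact figure or its cause.
    public const long AssumedLimitBytes = int.MaxValue;

    // True when a file of the given length is larger than the assumed limit.
    public static bool ExceedsAssumedLimit(long fileLengthBytes)
    {
        return fileLengthBytes > AssumedLimitBytes;
    }
}
```

For the file in this thread the check would be SizeCheck.ExceedsAssumedLimit(new FileInfo(@"c:\1.xml").Length); an 11-gigabyte file is well past the assumed limit, so it would have to be split or handled another way before parsing.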

--
Oleg Tkachenko [XML MVP]
William McIlroy
5/16/2004 7:46:04 PM