Hi All. I have a small test application (a condensation of the issue from a much larger application) that recurses a directory and adds all the file names to a string collection. The app then iterates over these names, opens the file, and reads the entire file (or the first 4Mb if greater than 4Mb), and closes the file. The problem is that after 10000 or so files, the application slows down significantly. I've played with various GC and Thread.Sleep() calls as part of the loop, but the slowdown still occurs. There are 29000 files or so, in a couple hundred directories, and all the files are 100-200k in size. Can someone explain a way to fix the issue? I'm including the source to the app and some sample output. This particular code is for C# 1.1, but the same issue exists for 2.0 RTM. Thanks, m ----------------------------- using System; using System.Collections.Specialized; using System.IO; namespace FileReadTest { internal class Class1 { const string vDir = "<SOME DIR HERE>"; [STAThread] private static void Main(string[] args) { DateTime sStartTime = DateTime.Now; StringCollection sDiskFiles = new StringCollection(); RecurseDirs(new DirectoryInfo(vDir), sDiskFiles); byte[] sBuf = new byte[1024*1024*4]; int x = 0; foreach (string sFileName in sDiskFiles) { if (x++%256 == 1) Console.WriteLine("Processing {0,6:#,##0} of {1,6:#,##0} Elapsed: {2}", x, sDiskFiles.Count, (DateTime.Now-sStartTime)); using (FileStream sStream = File.OpenRead(sFileName)) { sStream.Read(sBuf, 0, sBuf.Length); sStream.Close(); } } } private static void RecurseDirs(DirectoryInfo aDir, StringCollection aList) { DirectoryInfo[] sDirArr = aDir.GetDirectories(); for (int x = 0; x < sDirArr.Length; x++) { RecurseDirs(sDirArr[x], aList); } FileInfo[] sFileArr = aDir.GetFiles(); for (int x = 0; x < sFileArr.Length; x++) { FileInfo sInfo = sFileArr[x]; aList.Add(sInfo.FullName); } } } } ----------------------------- Processing 1 of 29,041 Elapsed: 00:00:17.3281250 Processing 257 of 29,041 Elapsed: 00:00:18.5312500 << about 1.2s per 250 files. Processing 513 of 29,041 Elapsed: 00:00:19.9531250 Processing 769 of 29,041 Elapsed: 00:00:21.1562500 Processing 1,025 of 29,041 Elapsed: 00:00:22.5937500 Processing 1,281 of 29,041 Elapsed: 00:00:23.6718750 Processing 1,537 of 29,041 Elapsed: 00:00:24.8750000 Processing 1,793 of 29,041 Elapsed: 00:00:25.9843750 Processing 2,049 of 29,041 Elapsed: 00:00:27.3906250 Processing 2,305 of 29,041 Elapsed: 00:00:28.6718750 <snip> Processing 12,545 of 29,041 Elapsed: 00:01:55.1875000 Processing 12,801 of 29,041 Elapsed: 00:01:58.9218750 Processing 13,057 of 29,041 Elapsed: 00:02:02.9687500 Processing 13,313 of 29,041 Elapsed: 00:02:06.7968750 Processing 13,569 of 29,041 Elapsed: 00:02:10.7812500 Processing 13,825 of 29,041 Elapsed: 00:02:14.7343750 Processing 14,081 of 29,041 Elapsed: 00:02:18.6406250 Processing 14,337 of 29,041 Elapsed: 00:02:22.2968750 Processing 14,593 of 29,041 Elapsed: 00:02:25.9687500 << about 3.7s per 250 files <snip> Processing 20,225 of 29,041 Elapsed: 00:04:19.5156250 Processing 20,481 of 29,041 Elapsed: 00:04:26.4531250 Processing 20,737 of 29,041 Elapsed: 00:04:33.4375000 Processing 20,993 of 29,041 Elapsed: 00:04:39.3593750 Processing 21,249 of 29,041 Elapsed: 00:04:47.6562500 Processing 21,505 of 29,041 Elapsed: 00:04:56.9843750 Processing 21,761 of 29,041 Elapsed: 00:05:07.4062500 Processing 22,017 of 29,041 Elapsed: 00:05:15.7812500 << about 8.3s per 250 files <snip> Processing 27,649 of 29,041 Elapsed: 00:07:20.1250000 Processing 27,905 of 29,041 Elapsed: 00:07:24.6093750 Processing 28,161 of 29,041 Elapsed: 00:07:32.8281250 Processing 28,417 of 29,041 Elapsed: 00:07:41.6406250 << about 8.8s per 250 files.
[quoted text, click to view] "Mark Rochotte" wrote: > Hi All. > > I have a small test application (a condensation of the issue from a much > larger application) that recurses a directory and adds all the file names > to a string collection. The app then iterates over these names, opens the > file, and reads the entire file (or the first 4Mb if greater than 4Mb), and > closes the file. > > The problem is that after 10000 or so files, the application slows down significantly. > I've played with various GC and Thread.Sleep() calls as part of the loop, > but the slowdown still occurs. > > There are 29000 files or so, in a couple hundred directories, and all the > files are 100-200k in size. > > Can someone explain a way to fix the issue? I'm including the source to > the app and some sample output. This particular code is for C# 1.1, but > the same issue exists for 2.0 RTM. > > Thanks, > > m > > ----------------------------- > using System; > using System.Collections.Specialized; > using System.IO; > > namespace FileReadTest > { > internal class Class1 > { > const string vDir = "<SOME DIR HERE>"; > > [STAThread] > private static void Main(string[] args) > { > DateTime sStartTime = DateTime.Now; > > StringCollection sDiskFiles = new StringCollection(); > RecurseDirs(new DirectoryInfo(vDir), sDiskFiles); > byte[] sBuf = new byte[1024*1024*4]; > > int x = 0; > foreach (string sFileName in sDiskFiles) > { > if (x++%256 == 1) > Console.WriteLine("Processing {0,6:#,##0} of {1,6:#,##0} > Elapsed: {2}", x, sDiskFiles.Count, (DateTime.Now-sStartTime)); > > using (FileStream sStream = File.OpenRead(sFileName)) > { > sStream.Read(sBuf, 0, sBuf.Length); > sStream.Close(); > } > } > > } > > private static void RecurseDirs(DirectoryInfo aDir, StringCollection > aList) > { > DirectoryInfo[] sDirArr = aDir.GetDirectories(); > for (int x = 0; x < sDirArr.Length; x++) > { > RecurseDirs(sDirArr[x], aList); > } > FileInfo[] sFileArr = aDir.GetFiles(); > for (int x = 0; x < sFileArr.Length; x++) > { > FileInfo sInfo = sFileArr[x]; > aList.Add(sInfo.FullName); > } > } > > } > } > ----------------------------- > Processing 1 of 29,041 Elapsed: 00:00:17.3281250 > Processing 257 of 29,041 Elapsed: 00:00:18.5312500 << about 1.2s per > 250 files. > Processing 513 of 29,041 Elapsed: 00:00:19.9531250 > Processing 769 of 29,041 Elapsed: 00:00:21.1562500 > Processing 1,025 of 29,041 Elapsed: 00:00:22.5937500 > Processing 1,281 of 29,041 Elapsed: 00:00:23.6718750 > Processing 1,537 of 29,041 Elapsed: 00:00:24.8750000 > Processing 1,793 of 29,041 Elapsed: 00:00:25.9843750 > Processing 2,049 of 29,041 Elapsed: 00:00:27.3906250 > Processing 2,305 of 29,041 Elapsed: 00:00:28.6718750 > <snip> > Processing 12,545 of 29,041 Elapsed: 00:01:55.1875000 > Processing 12,801 of 29,041 Elapsed: 00:01:58.9218750 > Processing 13,057 of 29,041 Elapsed: 00:02:02.9687500 > Processing 13,313 of 29,041 Elapsed: 00:02:06.7968750 > Processing 13,569 of 29,041 Elapsed: 00:02:10.7812500 > Processing 13,825 of 29,041 Elapsed: 00:02:14.7343750 > Processing 14,081 of 29,041 Elapsed: 00:02:18.6406250 > Processing 14,337 of 29,041 Elapsed: 00:02:22.2968750 > Processing 14,593 of 29,041 Elapsed: 00:02:25.9687500 << about 3.7s per > 250 files > <snip> > Processing 20,225 of 29,041 Elapsed: 00:04:19.5156250 > Processing 20,481 of 29,041 Elapsed: 00:04:26.4531250 > Processing 20,737 of 29,041 Elapsed: 00:04:33.4375000 > Processing 20,993 of 29,041 Elapsed: 00:04:39.3593750 > Processing 21,249 of 29,041 Elapsed: 00:04:47.6562500 > Processing 21,505 of 29,041 Elapsed: 00:04:56.9843750 > Processing 21,761 of 29,041 Elapsed: 00:05:07.4062500 > Processing 22,017 of 29,041 Elapsed: 00:05:15.7812500 << about 8.3s per > 250 files > <snip> > Processing 27,649 of 29,041 Elapsed: 00:07:20.1250000 > Processing 27,905 of 29,041 Elapsed: 00:07:24.6093750 > Processing 28,161 of 29,041 Elapsed: 00:07:32.8281250 > Processing 28,417 of 29,041 Elapsed: 00:07:41.6406250 << about 8.8s per > 250 files. > > >
[quoted text, click to view] > Processing 1 of 29,041 Elapsed: 00:00:17.3281250 > Processing 257 of 29,041 Elapsed: 00:00:18.5312500 << about 1.2s per .... > Processing 28,417 of 29,041 Elapsed: 00:07:41.6406250 << about 8.8s per > 250 files.
By my reckoning, you have about 17s of startup, and I agree with 1.2s for the first batch. The last batch took 7*60+41-17 sec == 444s which yields about 3.9s per 250 files, not 8.8s. Your 29,000 filenames could take several megabytes of storage in sDiskFiles. You access it sequentially over several minutes. Maybe you are being hit by paging. One possibility is to read the files as you find them in RecurseDirs. You will still need the large sBuf, but not the large sDiskFiles. BTW, if the files are a few hundred k max, why is sBuf a few megs? Finally, I think foreach in C# is slow-ish, and an alternative loop construct may help. I use VB, so this is just a guess.
[quoted text, click to view] AMercer wrote:
<snip> [quoted text, click to view] > Finally, I think foreach in C# is slow-ish, and an alternative loop > construct may help. I use VB, so this is just a guess.
In some situations foreach is slower than other loop constructs - but it'll only when nothing of any significance is going on in the loop. Don't forget that it's only looping 29,000 times here. Based on a quick test on my machine, the loop itself may be contributing about a millisecond to the total time - in other words, about 0.0002% of the total time taken. Jon
[quoted text, click to view] Mark Rochotte wrote: > I have a small test application (a condensation of the issue from a much > larger application) that recurses a directory and adds all the file names > to a string collection. The app then iterates over these names, opens the > file, and reads the entire file (or the first 4Mb if greater than 4Mb), and > closes the file. > > The problem is that after 10000 or so files, the application slows down significantly. > I've played with various GC and Thread.Sleep() calls as part of the loop, > but the slowdown still occurs.
I very much doubt that garbage collection is relevant here. A couple of things which might be going on: 1) Depending on how the files were written out, it's possible that the earlier files are less fragmented than the later files. 2) Depending on what you'd been doing with the system before running the test, it's possible that the earlier files were cached. 3) Other system activity (virus checking etc) could have had a significant impact It's also possible that the files you read later happened to be the bigger ones. I suggest you repeat the test: 1) Defragment your disk 2) Write out the files with random content, but all the same size 3) Reboot 4) Temporarily disable any virus checkers (preferrably having disconnected from any network) 5) Run the test (with nothing else running at the same time) Jon
Don't see what you're looking for? Try a search.
|