dotnet clr:
Hi,

Has anybody ever needed to copy a very long array (more than Int32.MaxValue elements) on x64? I found this solution:

    private static void LongBufferBlockCopy<T>(T[] src, long srcOffset, T[] dst, long dstOffset, long count)
    {
        if (srcOffset < Int32.MaxValue && dstOffset < Int32.MaxValue && count < Int32.MaxValue)
        {
            // Note: Buffer.BlockCopy measures offsets and count in bytes
            // and only accepts arrays of primitive types.
            Buffer.BlockCopy(src, (int)srcOffset, dst, (int)dstOffset, (int)count);
        }
        else
        {
            for (long i = 0; i < count; i++)
            {
                dst.SetValue(src.GetValue(srcOffset + i), dstOffset + i); //TODO: check performance
            }
        }
    }

Is there a cleaner, faster solution?
Hi Marc,

I'm currently trying to implement a MemoryStream that supports addressing more than 2 GB, so I need a very large byte[] and need to copy it on resize. Any ideas?

--
Phatsoft Inc.

"Marc Gravell" wrote:
> I'm sure there is a good reason, but perhaps an option is to use a
> series of smaller arrays? In a rectangular (not jagged) formation.
> Perhaps via a wrapper class with a "this[long index]" indexer that
> identifies the correct array index and Int32 offset from the Int64
> index? It would be a little slower, but would have less overhead in
> terms of trying to find a single block of memory that size.
>
> Copying would then involve 1 block-copy (or Clone()) per inner array?
>
> Marc
Sure! I'd love to see that code! You can mail me at luca@domainsbot.com.

Thanks so much!

L.

"Marc Gravell" wrote:
> I'm not sure that necessitates a large *single* array... you just need
> the read and write methods to watch for breaks between the inner
> arrays... I have done something similar when writing (for R&D purely)
> a reader/writer (producer/consumer) stream, which used multiple
> buffers rather than a single buffer. This approach also makes it far
> more efficient to resize, as you can just expand / trim the last
> buffer, without copying everything.
>
> Let me know if you want more of an example.
>
> Marc
Thanks for your help!

--
Phatsoft Inc.

"Marc Gravell" wrote:
> i.e. (pseudocode; broad terms)
>
> int Read(byte[] buffer, int offset, int count) {
>     int totalCopied = 0;
>     while (count > 0 && data remaining) {
>         find bytesInCurrentBuffer using BUFFER_SIZE and currentBufferOffset
>         int copyCount = min of count, bytesInCurrentBuffer
>         block copy that much to "offset"
>         offset += copyCount; count -= copyCount; totalCopied += copyCount
>         if buffer now empty, move to next buffer and reset buffer offset to 0
>     }
>     return totalCopied;
> }
OK - I haven't profiled / optimised this at all - it is a little (not much) slower than MemoryStream; I'd guess that caching the current buffer (rather than going to the list each time) might help... but here's what I have:

    using System;
    using System.Collections.Generic;
    using System.IO;

    public sealed class BigMemoryStream : Stream
    {
        private long position, length;
        private readonly int bufferSize;
        private List<byte[]> buffers;
        private const int DEFAULT_BUFFER_SIZE = 2 ^ 20;

        public BigMemoryStream() : this(DEFAULT_BUFFER_SIZE, null) { }
        public BigMemoryStream(byte[] data) : this(DEFAULT_BUFFER_SIZE, data) { }
        public BigMemoryStream(int bufferSize) : this(bufferSize, null) { }
        public BigMemoryStream(int bufferSize, byte[] data)
        {
            if (bufferSize <= 0) throw new ArgumentOutOfRangeException("bufferSize", "bufferSize must be a positive integer");
            this.bufferSize = bufferSize;
            if (data != null)
            {
                int blocks = data.Length / bufferSize, rem = data.Length % bufferSize;
                if (rem > 0) blocks++;
                buffers = new List<byte[]>(blocks);
                int sourceOffset = 0;
                for (int i = 0; i < blocks; i++)
                {
                    byte[] blockData = new byte[bufferSize];
                    // Copy a full block, or whatever is left for the final one.
                    int bytes = Math.Min(bufferSize, data.Length - sourceOffset);
                    Buffer.BlockCopy(data, sourceOffset, blockData, 0, bytes);
                    sourceOffset += bytes;
                    buffers.Add(blockData);
                }
                length = data.Length;
            }
            else
            {
                buffers = new List<byte[]>();
            }
        }

        public override bool CanRead { get { return buffers != null; } }
        public override bool CanWrite { get { return buffers != null; } }
        public override bool CanSeek { get { return buffers != null; } }
        public override long Length { get { return length; } }

        public override long Position
        {
            get { return position; }
            set
            {
                if (value == Position) return;
                // Position == Length (end of stream) is valid, e.g. after Seek(0, SeekOrigin.End).
                if (value < 0 || value > Length) throw new ArgumentOutOfRangeException("Position");
                position = value;
            }
        }

        public override long Seek(long offset, SeekOrigin origin)
        {
            switch (origin)
            {
                case SeekOrigin.Begin: Position = offset; break;
                case SeekOrigin.End: Position = Length + offset; break;
                case SeekOrigin.Current: Position += offset; break;
                default: throw new ArgumentException("Seek operation not recognised", "origin");
            }
            return Position;
        }

        private void CheckDisposed()
        {
            if (buffers == null) throw new ObjectDisposedException(typeof(BigMemoryStream).Name);
        }

        public override void SetLength(long value)
        {
            CheckDisposed();
            int blocks = (int)(value / bufferSize), rem = (int)(value % bufferSize);
            if (rem > 0) blocks++;
            int blockIncrease = blocks - buffers.Count;
            if (blockIncrease > 0)
            {
                while (blockIncrease > 0)
                {
                    buffers.Add(new byte[bufferSize]);
                    blockIncrease--;
                }
            }
            else if (blockIncrease < 0)
            {
                buffers.RemoveRange(blocks, -blockIncrease);
                buffers.TrimExcess();
            }
            bool wipeExcess = value < length;
            length = value;
            if (wipeExcess)
            {
                // Zero the stale bytes after the new end of data in the last buffer.
                int wipeIndex = (int)(length % bufferSize);
                if (buffers.Count > 0 && wipeIndex > 0)
                {
                    Array.Clear(buffers[buffers.Count - 1], wipeIndex, bufferSize - wipeIndex);
                }
            }
            if (position > length) position = length;
        }

        public override int Read(byte[] buffer, int offset, int count)
        {
            CheckDisposed();
            int totalBytes = 0;
            long maxRead = length - position;
            if (count > maxRead) count = (int)maxRead;
            while (position < length && count > 0)
            {
                int bufferIndex = (int)(position / bufferSize),
                    bufferOffset = (int)(position % bufferSize);
                int bytes = bufferSize - bufferOffset;
                if (bytes > count) bytes = count;
                Buffer.BlockCopy(buffers[bufferIndex], bufferOffset, buffer, offset, bytes);
                offset += bytes;
                position += bytes;
                totalBytes += bytes;
                count -= bytes;
            }
            return totalBytes;
        }

        public override void Write(byte[] buffer, int offset, int count)
        {
            CheckDisposed();
            if (position + count > length) SetLength(position + count); // make some space...
            while (count > 0)
            {
                int bufferIndex = (int)(position / bufferSize),
                    bufferOffset = (int)(position % bufferSize);
                int bytes = bufferSize - bufferOffset;
                if (bytes > count) bytes = count;
                Buffer.BlockCopy(buffer, offset, buffers[bufferIndex], bufferOffset, bytes);
                offset += bytes;
                position += bytes;
                count -= bytes;
            }
        }

        protected override void Dispose(bool disposing)
        {
            if (disposing) { buffers = null; }
            base.Dispose(disposing);
        }

        public override void Flush() { }
    }
***** BY ECK!

Muppet warning... 2 ^ 20... is (whoops) 22 ;-p (in C#, ^ is XOR, not "to the power of"). Oddly enough, a buffer for every 22 bytes is a bit of an overhead!!!

Change the line to:

    private const int DEFAULT_BUFFER_SIZE = 1048576;

And it now out-performs MemoryStream for large streams - and that is before optimisation ;-p

Shame on me ;-(

Marc
And one other comment...

All this is well and good, and will be great when I get my 128 GB of RAM, but until then, you might give serious consideration to using a FileStream in a temp area. At the end of the day, streams are pipes, not buckets. You can carry data around in a pipe if you try hard enough, but it isn't the best tool for the job; pipes are for pumping contents through. Unless you have a glut of memory dedicated to your app, the paging from having that much data loaded in your process might outweigh the benefits of having it in-memory.

Marc
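As a rough illustration of the temp-file suggestion, here is a minimal sketch. The class and method names (ScratchFile, Create, RoundTrip) and the 64 KB buffer size are made up for the example; only the Path and FileStream calls are framework APIs.

```csharp
using System;
using System.IO;

// Hypothetical helper, not from the thread: a scratch stream backed by a
// temp file instead of process memory.
public static class ScratchFile
{
    public static FileStream Create()
    {
        // GetTempFileName creates an empty file and returns its path.
        string path = Path.GetTempFileName();
        // DeleteOnClose asks for the file to be removed when the stream is closed.
        return new FileStream(path, FileMode.Open, FileAccess.ReadWrite,
                              FileShare.None, 64 * 1024, FileOptions.DeleteOnClose);
    }

    // Writes the payload, rewinds, and reads it back.
    public static byte[] RoundTrip(byte[] payload)
    {
        using (FileStream stream = Create())
        {
            stream.Write(payload, 0, payload.Length);
            stream.Seek(0, SeekOrigin.Begin);
            byte[] result = new byte[payload.Length];
            int read = 0;
            while (read < result.Length)
            {
                int n = stream.Read(result, read, result.Length - read);
                if (n == 0) break; // end of stream
                read += n;
            }
            return result;
        }
    }
}
```

Because FileStream positions and lengths are long values, the 2 GB array limit never comes into it; whether it is fast enough depends on the disk and on how much the OS file cache can absorb.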
I'm sure there is a good reason, but perhaps an option is to use a series of smaller arrays? In a rectangular (not jagged) formation. Perhaps via a wrapper class with a "this[long index]" indexer that identifies the correct array index and Int32 offset from the Int64 index? It would be a little slower, but would have less overhead in terms of trying to find a single block of memory that size. Copying would then involve 1 block-copy (or Clone()) per inner array? Marc
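A minimal sketch of what such a wrapper might look like, assuming fixed-size inner arrays. ChunkedArray, chunkSize and the other names are invented for illustration, and nothing here has been profiled.

```csharp
using System;

// Hypothetical wrapper: one logical array addressed by a long index,
// backed by a series of equally sized inner arrays ("chunks").
public sealed class ChunkedArray<T>
{
    private readonly T[][] chunks;
    private readonly int chunkSize;
    private readonly long length;

    public long Length { get { return length; } }

    public ChunkedArray(long length, int chunkSize)
    {
        if (length < 0) throw new ArgumentOutOfRangeException("length");
        if (chunkSize <= 0) throw new ArgumentOutOfRangeException("chunkSize");
        this.length = length;
        this.chunkSize = chunkSize;
        // Round up so a partial final chunk still gets an inner array.
        int chunkCount = (int)((length + chunkSize - 1) / chunkSize);
        chunks = new T[chunkCount][];
        for (int i = 0; i < chunkCount; i++)
            chunks[i] = new T[chunkSize];
    }

    public T this[long index]
    {
        get
        {
            CheckIndex(index);
            // Int64 index -> inner array index plus Int32 offset.
            return chunks[(int)(index / chunkSize)][(int)(index % chunkSize)];
        }
        set
        {
            CheckIndex(index);
            chunks[(int)(index / chunkSize)][(int)(index % chunkSize)] = value;
        }
    }

    private void CheckIndex(long index)
    {
        if (index < 0 || index >= length) throw new IndexOutOfRangeException();
    }

    // Copying is one Array.Copy per inner array.
    public ChunkedArray<T> Clone()
    {
        ChunkedArray<T> copy = new ChunkedArray<T>(length, chunkSize);
        for (int i = 0; i < chunks.Length; i++)
            Array.Copy(chunks[i], copy.chunks[i], chunkSize);
        return copy;
    }
}
```

Usage would be along the lines of `var big = new ChunkedArray<byte>(3000000000L, 1 << 20); big[2999999999L] = 1;`, with the chunk size picked to trade per-access overhead against the difficulty of finding large contiguous blocks.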
I'm not sure that necessitates a large *single* array... you just need the read and write methods to watch for breaks between the inner arrays... I have done something similar when writing (for R&D purely) a reader/writer (producer/consumer) stream, which used multiple buffers rather than a single buffer. This approach also makes it far more efficient to resize, as you can just expand / trim the last buffer, without copying everything. Let me know if you want more of an example. Marc
i.e. (pseudocode; broad terms)

    int Read(byte[] buffer, int offset, int count) {
        int totalCopied = 0;
        while (count > 0 && data remaining) {
            find bytesInCurrentBuffer using BUFFER_SIZE and currentBufferOffset
            int copyCount = min of count, bytesInCurrentBuffer
            block copy that much to "offset"
            offset += copyCount; count -= copyCount; totalCopied += copyCount
            if buffer now empty, move to next buffer and reset buffer offset to 0
        }
        return totalCopied;
    }
I'll have a stab at a multi-buffer MemoryStream on the train... as for the producer/consumer stream - that may be lost to the depths of time ;-p But it wasn't that hard if you really want... the fun bit was the sync working. Marc
"Marc Gravell" <marc.gravell@gmail.com> wrote in message news:1180641304.151953.292250@k79g2000hse.googlegroups.com...
> ***** BY ECK!
>
> Muppet warning... 2 ^ 20... is (whoops) 22 ;-p Oddly enough, a buffer
> for every 22 bytes is a bit of an overhead!!!

Try (1 << 20) instead.

> Change the line:
>
> private const int DEFAULT_BUFFER_SIZE = 1048576;
>
> And it now out-performs MemoryStream for large streams - and that is
> before optimisation ;-p
>
> Shame on me ;-(
>
> Marc
> Try (1 << 20) instead.
Cheers - I was in a hurry when I realised my school-boy error, and didn't want to compound things by posting another error - so a literal was the obvious... The problem with dealing in multiple languages / syntax is that you forget to translate... ;-p But yes... left shift - "d'oh!" to myself... Cheers, Marc
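For anyone skimming the thread later, the difference between the two operators can be checked with a few constants. The class and member names are invented for the example.

```csharp
using System;

// Hypothetical demo class: compares the XOR typo with the intended shift.
public static class BufferSizeDemo
{
    // In C#, ^ is bitwise XOR: 2 (00010) XOR 20 (10100) gives 22 (10110).
    public const int XorResult = 2 ^ 20;

    // Left-shifting 1 by 20 bits gives 2 to the power of 20.
    public const int ShiftResult = 1 << 20;

    public static void Print()
    {
        Console.WriteLine(XorResult);   // 22
        Console.WriteLine(ShiftResult); // 1048576
    }
}
```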