copying stream - advise needed


copying stream - advise needed SharpCoderMP
4/25/2006 12:24:41 PM
hi,

I've seen two ways of copying streams in .NET:

Stream outStream = new MemoryStream();
Stream inStream = CreateSomeMemoryStream();
byte[] buffer = new byte[inStream.Length];
inStream.Read(buffer, 0, buffer.Length);
outStream.Write(buffer, 0, buffer.Length);

and the second one was more or less like this:

Stream outStream = new MemoryStream();
Stream inStream = CreateSomeMemoryStream();
int size = 2048;
byte[] buffer = new byte[size];
while (true)
{
size = inStream.Read(buffer, 0, buffer.Length);
if (size > 0)
outStream.Write(buffer, 0, size);
else
break;
}

i wonder if there are any guidelines on this issue? should i read in a
Re: copying stream - advise needed Helge Jensen
4/25/2006 5:54:02 PM



I assume you're not always reading from and writing to memory-streams.


That's just plain horrible: it only works on streams with a known length,
and it isn't guaranteed to read the entire in-stream (in the general case
streams may not return as much data as asked for, even if it is available).
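
To illustrate the short-read point, here is a minimal sketch. TrickleStream and ShortReadDemo are hypothetical names, not part of .NET; the wrapper hands back at most a few bytes per Read call, roughly the way a slow network stream might.

```csharp
using System;
using System.IO;

// Hypothetical wrapper (not a framework class): returns at most maxPerRead
// bytes per Read call, imitating a stream whose data arrives slowly.
class TrickleStream : Stream
{
    private readonly Stream inner;
    private readonly int maxPerRead;

    public TrickleStream(Stream inner, int maxPerRead)
    {
        this.inner = inner;
        this.maxPerRead = maxPerRead;
    }

    public override int Read(byte[] buffer, int offset, int count)
    {
        // Never hand back more than maxPerRead bytes, whatever was asked for.
        return inner.Read(buffer, offset, Math.Min(count, maxPerRead));
    }

    public override bool CanRead { get { return true; } }
    public override bool CanSeek { get { return false; } }
    public override bool CanWrite { get { return false; } }
    public override long Length { get { return inner.Length; } }
    public override long Position
    {
        get { return inner.Position; }
        set { throw new NotSupportedException(); }
    }
    public override void Flush() { }
    public override long Seek(long offset, SeekOrigin origin) { throw new NotSupportedException(); }
    public override void SetLength(long value) { throw new NotSupportedException(); }
    public override void Write(byte[] buffer, int offset, int count) { throw new NotSupportedException(); }
}

static class ShortReadDemo
{
    // One Read call, as in the first snippet: returns how many bytes arrived.
    public static int SingleRead(Stream input, byte[] buffer)
    {
        return input.Read(buffer, 0, buffer.Length);
    }

    // Looping read: keeps asking until the buffer is full or the stream ends.
    public static int LoopedRead(Stream input, byte[] buffer)
    {
        int total = 0;
        int read;
        while (total < buffer.Length &&
               (read = input.Read(buffer, total, buffer.Length - total)) > 0)
        {
            total += read;
        }
        return total;
    }
}
```

With a 100-byte source wrapped in new TrickleStream(source, 10), SingleRead fills only the first 10 bytes of a 100-byte buffer, while LoopedRead fills all 100.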


I still don't understand these memory-streams..., do you really *need* a
memory-copy of the content of inStream?


This is quite a lot better. I prefer a do...while loop though.

You *should* read in a loop, since Read may not return the entire
content of the stream.

Here is how I usually do it:

using System.IO;
class StreamUtil
{
    public static void CopyStream(
        Stream input, Stream output,
        byte[] buffer, int offset, int count)
    {
        int read;
        do
        {
            read = input.Read(buffer, offset, count);
            if ( read > 0 )
                output.Write(buffer, offset, read);
        } while ( read > 0 );
    }
    public static void CopyStream(Stream input, Stream output, byte[] buffer)
    { CopyStream(input, output, buffer, 0, buffer.Length); }
    public static void CopyStream(Stream input, Stream output, int bufferSize)
    { CopyStream(input, output, new byte[bufferSize]); }
    public static void CopyStream(Stream input, Stream output)
    { CopyStream(input, output, 8*1024); }
    // ... more utils here
}
class Foo
{
    void DoingStuff() {
        // ...
        Stream input; // from somewhere
        Stream output; // from somewhere
        StreamUtil.CopyStream(input, output);
        // ...
    }
}

--
Helge Jensen
mailto:helge.jensen@slog.dk
sip:helge.jensen@slog.dk
Re: copying stream - advise needed Mattias Sjögren
4/25/2006 5:55:36 PM

IMO you should use a loop. While it may not be needed when working
exclusively with MemoryStreams, it could be necessary for other types
of Streams.

That doesn't mean you have to read 2K at a time though. If the input
stream has a known length you can directly create a byte[] of the
correct size and keep reading into that. You can then construct the
output stream around that byte array with the MemoryStream(byte[])
constructor to avoid making a third copy of the data.
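
A minimal sketch of that approach, assuming the input stream is seekable so that Length and Position are meaningful. StreamBuffers and ToMemoryStream are hypothetical names chosen for illustration.

```csharp
using System;
using System.IO;

static class StreamBuffers
{
    // Reads the rest of a known-length stream into one exactly-sized array,
    // then wraps that array in a MemoryStream without copying it again.
    public static MemoryStream ToMemoryStream(Stream input)
    {
        // Assumption: input.CanSeek is true, so Length and Position work.
        byte[] data = new byte[input.Length - input.Position];
        int total = 0;
        while (total < data.Length)
        {
            // Read may return fewer bytes than requested, so keep looping.
            int read = input.Read(data, total, data.Length - total);
            if (read == 0)
                throw new EndOfStreamException("Stream ended before its reported length.");
            total += read;
        }
        // MemoryStream(byte[]) uses the given array as its backing store.
        return new MemoryStream(data);
    }
}
```

The returned stream is not resizable, since it is bound to the array it was constructed around.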


Mattias

--
Mattias Sjögren [C# MVP] mattias @ mvps.org
http://www.msjogren.net/dotnet/ | http://www.dotnetinterop.com
Re: copying stream - advise needed Greg Young [MVP]
4/25/2006 9:18:42 PM
One thing that has yet to be mentioned as a reason why it's better to use the
looping methodology is large objects ...

if you use the simple read/write methodology you end up with a byte array
the size of the data being created. If this byte size is over a certain
limit (I forget what the exact number is off the top of my head but want to
say 85k ??? someone feel free to correct me) it is treated as a large object
and put on a special heap. Large objects have differing life cycles and are
not garbage collected in the same way as normal objects .. this can lead to
a possible memory usage attack by someone forcing this operation to happen
repeatedly ... I have never tried to take an app down in this way but I
would imagine that it is feasible.
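
On the threshold: on the Microsoft CLR, objects of 85,000 bytes or more go to the large object heap. A rough way to observe this is to ask the GC which generation it reports for a fresh array. LargeObjectDemo below is a hypothetical helper, and the generation numbers are runtime-specific.

```csharp
using System;

static class LargeObjectDemo
{
    // Returns the GC generation the runtime reports for a freshly
    // allocated byte[] of the given size.
    public static int GenerationOfNewArray(int size)
    {
        byte[] data = new byte[size];
        return GC.GetGeneration(data);
    }
}
```

On the Microsoft CLR, a 100,000-byte array is expected to report generation 2 (large objects are collected together with generation 2), whereas an 8 KB copy buffer starts out in generation 0.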

Cheers,

Greg Young



Re: copying stream - advise needed SharpCoderMP
4/25/2006 11:15:52 PM
yes, you're right.

this was rather for the purpose of this example. usually i work with
streams produced by some classes or created from files and i cannot
predict their origin. i usually end up with something like this:
Stream myStream = SomeThirdPartyClass.GetStream();
and then i need to do some processing before i can use this stream. the
processing does some major changes to the data inside the stream
changing also its length. If i would like to then write myStream to
file i need to create new FileStream and copy the contents of the
myStream into that newly created FileStream.

yeah i know.

ok. thanks for the info.


Re: copying stream - advise needed Helge Jensen
4/26/2006 12:00:00 AM



The looping strategy is required for semantic reasons. A stream may
return fewer bytes than requested, even if the stream is not closed yet.

Invoking s.Read(buf, 0, buf.Length) may return as soon as any data is
available, or when the stream is closed.

In the real world, not looping will often break with, for example,
network-streams, where data arrives "slowly".


Creating a large number of large objects is also not good, but avoiding
it is not semantically required.


That can most certainly be done :) I have seen code (in effect) like:

while ( true ) {
    using ( MemoryStream s = new MemoryStream() )
    using ( Stream connection = awaitConnectionAndGetStream() ) {
        // workaround: Deserialize invokes Close() on the stream it is given
        Util.StreamCopy(connection, s);
        try {
            object o = formatter.Deserialize(s);
        }
        /// process and send reply
    }
}

After conversing with this server for any significant amount of time and
transmitting a few objects of 85k, the .NET runtime would break down with
an out-of-memory error.

If you expect (or risk receiving) large amounts of input, you should
process the data from that stream in a streaming manner, that is -- a
"little" at a time.

In the end i rewrote the above code to pass a proxy for the
connection to Deserialize. That proxy works around the Close and some
other issues, and it allows the caller to limit the amount of data that
can be read from the proxy.
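
A minimal sketch of what such a proxy could look like. This is an assumption-laden illustration, not the code described above: BoundedReadStream is a hypothetical name, and throwing IOException once the limit is reached is one possible policy among several.

```csharp
using System;
using System.IO;

// Hypothetical proxy: forwards reads to an inner stream, refuses to read
// more than maxBytes in total, and leaves the inner stream open when the
// proxy itself is closed or disposed.
class BoundedReadStream : Stream
{
    private readonly Stream inner;
    private long remaining;

    public BoundedReadStream(Stream inner, long maxBytes)
    {
        this.inner = inner;
        this.remaining = maxBytes;
    }

    public override int Read(byte[] buffer, int offset, int count)
    {
        if (count == 0) return 0;
        // Policy of this sketch: any read past the limit is an error, even
        // if the inner stream happens to end exactly at the limit.
        if (remaining <= 0)
            throw new IOException("Read limit exceeded.");
        int read = inner.Read(buffer, offset, (int)Math.Min(count, remaining));
        remaining -= read;
        return read;
    }

    protected override void Dispose(bool disposing)
    {
        // Deliberately does not close or dispose the inner stream.
        base.Dispose(disposing);
    }

    public override bool CanRead { get { return true; } }
    public override bool CanSeek { get { return false; } }
    public override bool CanWrite { get { return false; } }
    public override long Length { get { throw new NotSupportedException(); } }
    public override long Position
    {
        get { throw new NotSupportedException(); }
        set { throw new NotSupportedException(); }
    }
    public override void Flush() { }
    public override long Seek(long offset, SeekOrigin origin) { throw new NotSupportedException(); }
    public override void SetLength(long value) { throw new NotSupportedException(); }
    public override void Write(byte[] buffer, int offset, int count) { throw new NotSupportedException(); }
}
```

Handing new BoundedReadStream(connection, limit) to a consumer caps how much it can pull from the connection, and closing the proxy leaves the connection usable.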

--
Helge Jensen
mailto:helge.jensen@slog.dk
sip:helge.jensen@slog.dk
Re: copying stream - advise needed Greg Young [MVP]
4/26/2006 12:00:00 AM
I understand it is required. I was listing the large objects as an
additional benefit :)

> One thing that has yet to be mentioned as a reason why its better to use the

That is to say "in addition to" the other reasons mentioned.

Cheers,

Greg

Re: copying stream - advise needed Helge Jensen
4/26/2006 12:00:00 AM



Okay, I see now.

<splitting hairs>

What happened was: I don't really see that there is any choice in the
matter. Looping is required for correctness, not for optimization. That
made me choke a bit on the "conditional" way your reply was formulated:

"why *it's better* to use the looping..." [My emphasis]

It would have gone down a treat as something along the lines of:

"In addition to the semantics/correctness issues of not-looping,
there is the issue of large objects ..."

</splitting hairs>

--
Helge Jensen
mailto:helge.jensen@slog.dk
sip:helge.jensen@slog.dk