
c# : string to byte[] back to string + Compression Failed!


jeremyje NO[at]SPAM gmail.com
2/7/2007 5:35:46 PM
I'm writing some code that will convert a regular string to a byte[]
for compression and then be able to convert that compressed string back
into its original form.

Conceptually I have....

For compression
string ->(Unicode Conversion) byte[] -> (Compression + Unicode
Conversion) string

For Decompression
string ->(Unicode Conversion) byte[] -> (DECompression + Unicode
Conversion) string

The problem is that there's a code chunk that fails. Probably because
of some bad conversion somewhere in my code.



The key line that is constantly failing is....
int size = s.Read(write_data, 0, 8);
The GZip algorithm gives me an ArrayIndexOutOfBounds error.
Deflate gives me some data corruption error. I looked at the byte[]
right after compression and right before decompression and they DO NOT
MATCH! What is the problem in this situation?

I need this process such that I get these 2 functions...

string CompressString(string in, Algorithm.GZip or Algorithm.Deflate);
string DecompressString(string in, Algorithm.GZip or
Algorithm.Deflate);

I've seen similar code but no potential fixes on Google.com


My Code is below....






using System;
using System.Collections.Generic;
using System.Text;
using System.IO;
using System.IO.Compression;

namespace Jeremyje.Utility
{
    public class StringTransforms
    {
        public enum StringCompressionAlgorithm
        {
            GZip,
            Deflate
        }

        public static byte[] UnicodeStringToByteArray(string str)
        {
            UnicodeEncoding enc = new UnicodeEncoding();
            return enc.GetBytes(str);
        }

        public static string ByteArrayToUnicodeString(byte[] str_arr)
        {
            UnicodeEncoding enc = new UnicodeEncoding();
            return enc.GetString(str_arr);
        }

        public static bool DecompressString(string in_string, out string out_string)
        {
            return DecompressString(in_string, out out_string,
                StringCompressionAlgorithm.GZip);
        }

        public static bool DecompressString(string in_string, out string out_string,
            StringCompressionAlgorithm alg)
        {
            bool status = false;
            out_string = in_string;

            switch (alg)
            {
                case StringCompressionAlgorithm.GZip:
                    {
                        try
                        {
                            out_string = "";
                            int total_length = 0;
                            byte[] write_data = new byte[4096];
                            byte[] bData = UnicodeStringToByteArray(in_string);

                            GZipStream s = new GZipStream(new MemoryStream(bData),
                                CompressionMode.Decompress);

                            while (true)
                            {
                                int size = s.Read(write_data, 0, 8);
                                if (size > 0)
                                {
                                    total_length += size;
                                    out_string +=
                                        Encoding.Unicode.GetString(write_data, 0, size);
                                }
                                else
                                {
                                    break;
                                }
                            }
                            s.Close();
                            status = true;
                        }
                        catch (Exception e)
                        {
                            Console.WriteLine(e);
                            status = false;
                        }

                        return status;
                    }
                case StringCompressionAlgorithm.Deflate:
                    {
                        try
                        {
                            out_string = "";
                            int total_length = 0;
                            byte[] write_data = new byte[4096];
                            byte[] bData = UnicodeStringToByteArray(in_string);

                            DeflateStream s = new DeflateStream(new MemoryStream(bData),
                                CompressionMode.Decompress);

                            while (true)
                            {
                                int size = s.Read(write_data, 0, 8);
                                if (size > 0)
                                {
                                    total_length += size;
                                    out_string +=
                                        Encoding.Unicode.GetString(write_data, 0, size);
                                }
                                else
                                {
                                    break;
                                }
                            }
                            s.Close();
                            status = true;
                        }
                        catch (Exception e)
                        {
                            Console.WriteLine(e);
                            status = false;
                        }

                        return status;
                    }

                default:
                    break;
            }

            return status;
        }

        public static bool CompressString(string in_string, out string out_string)
        {
            return CompressString(in_string, out out_string,
                StringCompressionAlgorithm.GZip);
        }

        public static bool CompressString(string in_string, out string out_string,
            StringCompressionAlgorithm alg)
        {
            bool status = false;
            out_string = in_string;

            switch (alg)
            {
                case StringCompressionAlgorithm.GZip:
                    {
                        try
                        {
                            MemoryStream ms = new MemoryStream();
                            Stream s = new GZipStream(ms, CompressionMode.Compress);
                            byte[] bData = UnicodeStringToByteArray(in_string);

                            s.Write(bData, 0, bData.Length);
                            s.Close();
                            byte[] compressed_data = (byte[])ms.ToArray();
                            out_string = ByteArrayToUnicodeString(compressed_data);
                            status = true;
                        }
                        catch (Exception e)
                        {
                            Console.WriteLine(e);
                            status = false;
                        }

                        return status;
                    }
                case StringCompressionAlgorithm.Deflate:
                    {
                        try
                        {
                            MemoryStream ms = new MemoryStream();

Jon Skeet [C# MVP]
2/8/2007 6:55:36 AM

Don't try to encode arbitrary binary data as a string directly using
Encoding.Unicode. As Morten suggested, use Base64 instead.
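
To make the point concrete, here is a minimal sketch of why the bytes
stop matching. The class and method names are made up for this example
and are not part of Jeremy's code. It assumes the default behaviour of
the static Encoding.Unicode instance, which substitutes U+FFFD for byte
sequences that are not valid UTF-16 instead of preserving them.

```csharp
using System;
using System.Text;

// Hypothetical helper class, for illustration only.
public static class LossyRoundTripDemo
{
    // Compares two byte arrays element by element.
    private static bool SameBytes(byte[] a, byte[] b)
    {
        if (a.Length != b.Length) return false;
        for (int i = 0; i < a.Length; i++)
        {
            if (a[i] != b[i]) return false;
        }
        return true;
    }

    // byte[] -> string -> byte[] through UTF-16. Arbitrary binary data can
    // contain sequences that are not valid UTF-16 (for example an unpaired
    // surrogate); the decoder replaces those, so the bytes change.
    public static bool SurvivesUnicodeRoundTrip(byte[] data)
    {
        string asText = Encoding.Unicode.GetString(data);
        byte[] back = Encoding.Unicode.GetBytes(asText);
        return SameBytes(data, back);
    }

    // The same trip through Base64, which is defined for any byte sequence.
    public static bool SurvivesBase64RoundTrip(byte[] data)
    {
        string asText = Convert.ToBase64String(data);
        byte[] back = Convert.FromBase64String(asText);
        return SameBytes(data, back);
    }
}
```

With the bytes 00 D8 41 00 (an unpaired high surrogate followed by 'A')
the first method returns false and the second returns true. Compressed
output is effectively random bytes, so sequences like this turn up
sooner or later.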

--
Jon Skeet - <skeet@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
Morten Wennevik [C# MVP]
2/8/2007 7:43:46 AM
Hi Jeremy,

Your problem is converting the compressed byte[] to string. After the
compression a string can't hold the data the byte[] holds and you lose
lots of data causing an exception when you try to decompress it.

Having Compress return a byte[] and Decompress take a byte[] will solve
your problem. If you need the byte[] to be represented as string you can
use Base64.

[DecompressString]
byte[] bData = Convert.FromBase64String(in_string);

[CompressString]
out_string = Convert.ToBase64String(compressed_data);
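
Put together, the two changes could look roughly like the sketch below.
It is an illustration under assumptions, not Jeremy's final code: the
class name Base64Compression is invented, only GZip is shown, and the
text is still turned into bytes with Encoding.Unicode as in the
original post. The decompressed bytes are collected first and decoded
once at the end, so a character is never split across two reads.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

// Hypothetical class name, for illustration only.
public static class Base64Compression
{
    // string -> UTF-16 bytes -> GZip -> Base64 text.
    public static string CompressString(string text)
    {
        byte[] raw = Encoding.Unicode.GetBytes(text);
        using (MemoryStream ms = new MemoryStream())
        {
            // The GZipStream must be closed before the buffer is read,
            // otherwise the final compressed block is not flushed.
            using (GZipStream gz = new GZipStream(ms, CompressionMode.Compress))
            {
                gz.Write(raw, 0, raw.Length);
            }
            // ToArray still works after the MemoryStream has been closed.
            return Convert.ToBase64String(ms.ToArray());
        }
    }

    // Base64 text -> compressed bytes -> GZip decompress -> string.
    public static string DecompressString(string base64)
    {
        byte[] compressed = Convert.FromBase64String(base64);
        using (MemoryStream input = new MemoryStream(compressed))
        using (GZipStream gz = new GZipStream(input, CompressionMode.Decompress))
        using (MemoryStream output = new MemoryStream())
        {
            byte[] buffer = new byte[4096];
            int read;
            // Read returns 0 once the end of the compressed data is reached.
            while ((read = gz.Read(buffer, 0, buffer.Length)) > 0)
            {
                output.Write(buffer, 0, read);
            }
            return Encoding.Unicode.GetString(output.ToArray());
        }
    }
}
```

DecompressString(CompressString(s)) should give back s for ordinary
text. The same shape works with DeflateStream in place of GZipStream.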


--
Happy Coding!
jeremyje NO[at]SPAM gmail.com
2/9/2007 4:57:27 PM

Yeah, that's what I ended up doing, but that's not much of a
compression since Base64 makes the data a lot larger. Is there another
encoding other than Base64 that will yield better results?
Jon Skeet [C# MVP]
2/10/2007 7:17:37 AM

Not with the same degree of safety. If you could store the raw
compressed data instead of converting it back to a string, you'd be
okay - but to convert arbitrary binary data into text data which is
"safe" in many situations (i.e. won't be subject to Unicode
normalization, can be expressed in many encodings, etc.) Base64 is a
very good choice.
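
For a sense of how much larger "a lot larger" is, here is a small
sketch of the size arithmetic. The helper name is invented for the
example. Standard padded Base64 turns every group of up to 3 bytes into
4 characters, so the text is about a third longer than the binary data.

```csharp
using System;

// Hypothetical helper, for illustration only.
public static class Base64Overhead
{
    // Number of characters Convert.ToBase64String produces for
    // byteCount input bytes (padding included, no line breaks).
    public static int EncodedLength(int byteCount)
    {
        return 4 * ((byteCount + 2) / 3);
    }

    // Measures the real output so it can be compared with the formula.
    public static int MeasuredLength(int byteCount)
    {
        return Convert.ToBase64String(new byte[byteCount]).Length;
    }
}
```

For 100 bytes both methods give 136 characters. Whether the overall
result is still smaller than the original text depends on how well the
input compressed in the first place.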

--
Jon Skeet - <skeet@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
Arne_Vajhøj
2/10/2007 2:48:39 PM

Nothing standard.

Using a homemade Base128 together with a single-byte encoding
like ISO-8859-1 will reduce the overhead slightly.
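
One way such a homemade Base128 could look is sketched below. Everything
in it is an assumption made for illustration: there is no standard
Base128, and the alphabet chosen here (code points 0x3F-0x7E plus
0xC0-0xFF, all printable in ISO-8859-1) is arbitrary. Each character
carries 7 bits, so 7 bytes become 8 characters, roughly 14% overhead
instead of Base64's 33%, provided the text is then stored with a
single-byte encoding.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical, non-standard encoding, for illustration only.
public static class Base128Sketch
{
    // Maps a 7-bit value to one of 128 printable ISO-8859-1 characters.
    private static char ToChar(int value)
    {
        return value < 64 ? (char)(0x3F + value) : (char)(0xC0 + value - 64);
    }

    // Inverse of ToChar; rejects characters outside the alphabet.
    private static int FromChar(char c)
    {
        if (c >= 0x3F && c <= 0x7E) return c - 0x3F;
        if (c >= 0xC0 && c <= 0xFF) return c - 0xC0 + 64;
        throw new FormatException("Character outside the Base128 alphabet.");
    }

    public static string Encode(byte[] data)
    {
        StringBuilder sb = new StringBuilder((data.Length * 8 + 6) / 7);
        int acc = 0;  // bit accumulator
        int bits = 0; // number of pending bits in acc
        foreach (byte b in data)
        {
            acc = (acc << 8) | b;
            bits += 8;
            while (bits >= 7)
            {
                bits -= 7;
                sb.Append(ToChar((acc >> bits) & 0x7F));
            }
            acc &= (1 << bits) - 1; // keep only the leftover bits
        }
        if (bits > 0)
        {
            // Pad the final group with zero bits on the right.
            sb.Append(ToChar((acc << (7 - bits)) & 0x7F));
        }
        return sb.ToString();
    }

    public static byte[] Decode(string text)
    {
        List<byte> output = new List<byte>(text.Length * 7 / 8);
        int acc = 0;
        int bits = 0;
        foreach (char c in text)
        {
            acc = (acc << 7) | FromChar(c);
            bits += 7;
            if (bits >= 8)
            {
                bits -= 8;
                output.Add((byte)((acc >> bits) & 0xFF));
            }
            acc &= (1 << bits) - 1;
        }
        // Fewer than 8 leftover bits are padding and are dropped.
        return output.ToArray();
    }
}
```

The saving only materialises if the result is written out as one byte
per character (for instance with an ISO-8859-1 encoder); held in memory
as a .NET string, every character still occupies two bytes.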
