dotnet performance : What would be a faster alternative to boxing/unboxing?


Eric Gunnerson [MS]
11/20/2003 12:58:25 PM
Mountain (can I call you Mountain...),

The overhead from boxing should be roughly equivalent to the overhead of
creating the wrapper class, as they're doing nearly exactly the same thing.
I'm not sure how to contrast the overhead of the two options in #2 - my
guess is that virtual dispatch is roughly as expensive as an unbox, but if
you do the virtual every time and the unbox only on value types, it could
explain your results.

You should also probably spend some time using the CLR profiler to examine
your application.

http://www.microsoft.com/downloads/details.aspx?FamilyId=86CE6052-D7F4-4AEB-9B7A-94635BEEBDDA&displaylang=en




--
Eric Gunnerson

Visit the C# product team at http://www.csharp.net
Eric's blog is at http://blogs.gotdotnet.com/ericgu/

This posting is provided "AS IS" with no warranties, and confers no rights.

100
11/20/2003 1:19:09 PM
Hi Paul,
Not quite true.
If you have

struct MyStruct
{
......
}

MyStruct [] arr = new MyStruct[XXX];
arr[0] = new MyStruct() ; //that won't cause any boxing.
MyStruct ms = arr[0]; //won't cause any unboxing because nothing has been boxed

Anyway, it might not cause boxing, but the value will be copied, which
introduces some overhead compared with using a reference type instead of a
value type.
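
To make the copy-versus-box distinction concrete, here is a minimal sketch. The thread elides the members of MyStruct, so the single field X below is a hypothetical stand-in, and BoxingDemo is just an illustrative name.

```csharp
using System;

public struct MyStruct
{
    public int X; // hypothetical field, for illustration only
}

public static class BoxingDemo
{
    // Reading an element out of a MyStruct[] copies the value; nothing is boxed.
    public static int CopyLeavesArrayUntouched()
    {
        MyStruct[] arr = new MyStruct[4];
        arr[0] = new MyStruct();   // stored inline in the array, no box
        MyStruct ms = arr[0];      // copies the value out of the array
        ms.X = 5;                  // changes only the local copy
        return arr[0].X;           // still 0
    }

    // Storing the same struct in an object[] boxes it.
    public static bool ObjectArrayHoldsBox()
    {
        object[] objs = new object[1];
        objs[0] = new MyStruct();            // boxing: heap allocation + copy
        MyStruct back = (MyStruct)objs[0];   // unboxing: type check + copy
        back.X = 7;                          // again only the local copy changes
        return ((MyStruct)objs[0]).X == 0;   // the boxed value is untouched
    }

    public static void Main()
    {
        Console.WriteLine(CopyLeavesArrayUntouched());
        Console.WriteLine(ObjectArrayHoldsBox());
    }
}
```

In both cases a read produces a copy; the object[] version additionally pays for a heap allocation on write and a type check on read.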

B\rgds
100

100
11/20/2003 2:13:13 PM
Hi,
What do you mean by primitives? Do you mean real primitives (int,
float, char, etc.), or are you talking about value types in general (which
could be primitives as well as structures you define yourself)?

Wrapping primitives, as far as I can see, is a process of:

1. Creating a reference type by calling its constructor, which I assume accepts
the primitive as a parameter. This constructor then saves the primitive in an
internal data member of the same type as the primitive (to avoid boxing).
Now let's see how many times we copy our primitive: once when calling the
constructor, which makes a copy on the stack, and a second time when we copy it
into the internal variable.

I asked if your primitives are real primitives because if they are, they are
not so big. Anyway, even if you use reference types you have to copy the
reference (an address, 4 bytes), which is not a big difference. In addition,
the JIT compiler will optimize the calls using the CPU registers. If you have
structures, though, they might be too big to be optimized and they will
indeed be copied. You will end up with two copy operations + all the work done
to create the wrapping object + calling the constructor. If you had used
boxing you would have: creating an object on the heap + one copy (no calling
constructors).

2. Then when you read the data again you will have at least one copy operation
of the primitive. As far as I understand, you use a virtual method to retrieve
the data, so the method is too general to be used for all the primitives you
may have. So type conversion (which means boxing + unboxing) is necessary.
Again, only one unboxing is better.

It is not surprising (at least to me) that the wrapping classes are slower
than boxing/unboxing.

" 2. wrapping each primitive forced us to introduce a new virtual method
call in place of the boxing/casting."

I don't think you can avoid boxing/casting with a virtual method. IMHO this
will lead to boxing/casting.
Maybe I am wrong. A code snippet would be of help here.
Anyway, don't worry about the performance of virtual methods. The way the CLR
works, the difference is not as big as in C++, for example. Even non-virtual
methods are called indirectly. Virtual methods need only one more step of
indirection, and the CLR doesn't adjust the pointer as C++ does.


Strongly-typed and heterogenous? IMHO those two guys are mutually exclusive.

B\rgds
100

100
11/20/2003 4:51:56 PM
Hi Eric,

I agree that using wrapping classes is probably as slow as (it could even be
slower than) boxing/unboxing, and I made my point in my previous post.
However, I'm not quite sure what you are saying here.

In .NET, dispatching virtual method calls is pretty simple and doesn't
take much more work than calling non-virtual methods. So if what you are
saying is true, it means that every function call is as expensive as an unbox
operation.

B\rgds
100

100
11/20/2003 5:30:46 PM
Hi,

Don't get me wrong. I'm not arguing against creativity. I just wanted to say
that heterogenous storage means (as far as I understand that word) storage
that contains different types of data. Strongly-typed would mean (in most
cases) storage that contains only one type of data and doesn't accept
any other.

But on second thought, it could be called that if you have storage
that accepts only certain types of data (say integers, floats and my Foo
class) and rejects the others.

So if that is what you mean, I see one possible solution which may satisfy
your need for speed.
For each value type you may have a designated array to keep the values. In
this case you won't have boxing/unboxing operations going on behind the scenes.
Reference types you may hold in one common array (or any other kind of
storage).

Provide as many overloads of the methods for adding and retrieving data as
you have different types.

Something like this (suppose we want to accept int, double and string (this
is my reference type)):

class Storage
{
    int[] intArr = new int[10];
    double[] dblArr = new double[10];
    object[] refArr = new object[10];

    public int this[int i]
    {
        get { return intArr[i]; }
        set { intArr[i] = value; }
    }

    public double this[int i]
    {
        get { return dblArr[i]; }
        set { dblArr[i] = value; }
    }

    public string this[int i]
    {
        get { return (string)refArr[i]; } // Conversion - it is not as bad as boxing/unboxing
        set { refArr[i] = value; }
    }

    .......
}
Of course we have to implement logic to expand the storage when the capacity
is exceeded and so on.
We waste a lot of memory; to be more concrete, you'll have N*M unused slots,
where N is the number of elements in the storage and M is the number of
value types which can be stored. It could be a huge amount of unused memory if
you are planning to store a lot of data, but for small quantities it could be
acceptable, and we won't have any boxing/unboxing.

B\rgds
100

Mountain Bikn' Guy
11/20/2003 5:34:50 PM
I have a situation where an app writes data of various types (primitives and
objects) into a single dimensional array of objects. (This array eventually
becomes a row in a data table, but that's another story.) The data is
written once and then read many times. Each primitive read requires
unboxing. The data reads are critical to overall app performance. In the
hopes of improving performance, we have tried to find a way to avoid the
unboxing and casting. So far, nothing has worked well.

One solution we tried was to create our own simple object to wrap the
primitives. In this object we used a public field that allowed reading of
the primitive data value without a property or unboxing. However, overall
performance went down. We are still investigating, but our current
speculations are that:
1. the overhead of creating the object was significantly greater than the
overhead of boxing and this offset any gains we might have gotten on the
reads (a guess).
2. wrapping each primitive forced us to introduce a new virtual method call
in place of the boxing/casting. Maybe the overhead of this call is what
caused the performance to go down. (Another guess.)

BTW, performance went down 15%.

Does anyone have any ideas on how to implement the following:

A 1-dimension array (with normal indexing) containing heterogenous strongly
typed data that allows reading/writing without casting and without
boxing/unboxing.

All thoughts/suggestions are appreciated.



Paul Robson
11/20/2003 5:47:56 PM
Have to excuse me as I'm new to C# :)

My understanding is you can't have an array of objects with value types
without boxing. As the instant you do obArray[x] = 32 it boxes it.

Can you use a second array in parallel with your first array ; casting
data if you need to ; this might be quicker. Wasteful of storage :)

Other thing is to hack into the system and write inline code. Don't know
if you can do this in C# even if it is a good idea.

Sorry if these ideas are dumb :)

Mountain Bikn' Guy
11/20/2003 6:34:57 PM
This suggestion is almost exactly what I referred to in my original posting.
Don't know why, but it was slower than boxing/unboxing. Copying isn't an
issue because I'm dealing with value types (primitives) which will be copied
in any case -- and this is the behavior I desire.

Mountain Bikn' Guy
11/20/2003 8:56:50 PM
Thanks for all your tips and thoughts. Your comments helped me see why some
alternatives we tried were slower than boxing/unboxing.

More responses inline.

Yes, but that doesn't mean some creative thinking can't result in a
satisfactory solution. When I was in college, I created a data structure
that allows (pseudo) random access into dynamically allocated memory without
any external indexes. My goal was to have array style random access into
dynamically allocated (non-contiguous) memory -- two goals that seem
incompatible. The result of my effort was a data structure that
out-performed height-balanced AVL trees. So I'm hoping to have some luck and
come up with something creative that lets me get the benefits of strong
typing in a heterogenous array. ... maybe it will happen, maybe not.

Mountain Bikn' Guy
11/20/2003 10:33:07 PM
Hi Eric,
You can certainly call me Mountain ;) I have a small mountain bike track
outside my back door (it's more like a BMX track). About the only time I
leave the computer is when I go ride my mountain bike (or when my wife
forces me to go to sleep).

Thanks for your input and the link to the CLR profiler. I think using the
CLR profiler is a good next step.
Regards,
Dave

Paul Robson
11/21/2003 9:24:32 AM
Hm. So structs are this sort of weird interim type between values and
objects? I find the whole thing not confusing but maybe inconsistent.

If you can assign a struct to an object reference what happens if you have

sometype somefunc(object[] a)
{
    a[1] = new MyStruct();
}

My understanding is that MyStruct[] is allocated off the stack not the
heap. So when the stack frame is removed, what does a[1] point to ?

It doesn't seem very "safe" - unless it does a struct copy (as in
struct1=struct2) and allocates that on the heap ?

Sorry if these are dumb questions :) I only started C# a couple of days ago.

Paul Robson
11/21/2003 9:26:35 AM
I thought that C# couldn't do polymorphism on return types only? Am I
wrong?

100
11/21/2003 10:12:10 AM
Hi Paul,

No, the question is not dumb at all.

No, structs are value types. If you look at the docs, value types are
primitives (Int32, Int64, Char, etc.), enums and structures. Take a look
at MSDN: Int32, for example, is declared as a *struct*. They are called
primitives because IL has special instructions to work with them directly.
That is why you won't see any overloads of the arithmetic operators defined
for them. Decimal, though, is not a primitive, so it overloads the
arithmetic operators, and this is the reason why its performance is worse
compared with double, for example.

Value types may be in two states, boxed and unboxed, as you know. When they
are in the boxed state they are always in the managed heap. When they are in
the unboxed state they might be in the heap or on the stack. In C# we can have
an unboxed value type in the heap only when it is part of a reference
object (which is always allocated in the heap). I said "in C#" because from
the CLR's point of view the operation of unboxing doesn't copy the
value type from the heap to the stack. You might have a boxed value type in
the heap, and *unbox* then gets a pointer to the data part of the
object (a pointer to the unboxed value) and pushes it onto the evaluation
stack. It doesn't copy the data from the heap. However, because in most
cases the unboxing operation is followed by copying the data, the C#
designers decided not to expose unboxing in the heap. In C#
unboxing is always followed by a copy. You can do it in VC++ or ILAsm, though.

So arrays are reference types. As reference types they reside in the heap.
But if their element type is a value type, the items are in the unboxed state
in the heap, as if they were data members of a class.
So, if you have

int[] intArr = new int[10];
intArr[0] = 10;
the value 10 won't be boxed in the array; it will be copied into the heap
in the unboxed state.

What happens in your somefunc example:

new MyStruct() will indeed create the structure on the stack. But then the
assignment will copy this value from the stack to the heap (because a is an
object[], the copy goes into a box that a[1] refers to). The original one,
which is on the stack, will go away when the stack frame gets removed, but its
verbatim copy in the heap will stay, reachable through the array.

HTH
B\rgds
100

100
11/21/2003 10:29:07 AM
Hi Paul,
You are a hundred percent right. That was my fault. I wrote this minutes
before I went home yesterday, and it's my fault that I posted it without any
revision.

Thanks for bringing this up. We can't use an indexer for that. What we can do
is use separate operations SetData and GetData, with as many overloads as
there are different types we want to accept. GetData cannot return the value,
because that would introduce the same problem that we already have with the
indexer. So we can have a prototype like

void GetData(int index, out <type> value)
{
}

this should fix the problem.
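
A minimal sketch of that SetData/GetData shape, assuming fixed-capacity parallel arrays and a string[] for the reference type (the earlier Storage snippet used object[]); the class name TypedStorage and the demo are hypothetical. C# allows overloads that differ only in the type of an out parameter, which is what makes this compile where the three indexers did not.

```csharp
using System;

public class TypedStorage
{
    // One array per supported type; each logical slot exists in all of them.
    private readonly int[] intArr;
    private readonly double[] dblArr;
    private readonly string[] strArr;

    public TypedStorage(int capacity)
    {
        intArr = new int[capacity];
        dblArr = new double[capacity];
        strArr = new string[capacity];
    }

    // Overloads are selected by the static type of the value argument.
    public void SetData(int index, int value) { intArr[index] = value; }
    public void SetData(int index, double value) { dblArr[index] = value; }
    public void SetData(int index, string value) { strArr[index] = value; }

    // Overloads are selected by the declared type of the out variable.
    public void GetData(int index, out int value) { value = intArr[index]; }
    public void GetData(int index, out double value) { value = dblArr[index]; }
    public void GetData(int index, out string value) { value = strArr[index]; }
}

public static class TypedStorageDemo
{
    public static void Main()
    {
        TypedStorage storage = new TypedStorage(10);
        storage.SetData(0, 42);
        storage.SetData(1, 2.5);
        storage.SetData(2, "text");

        int i; double d; string s;
        storage.GetData(0, out i);
        storage.GetData(1, out d);
        storage.GetData(2, out s);
        Console.WriteLine(i + " " + d + " " + s);
    }
}
```

No value is ever converted to object here, so nothing is boxed, but the caller has to know which type lives at each index.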

B\rgds
100

"Paul Robson" <autismuk@autismuk.muralichucks.freeserve.co.uk> wrote in
message news:bpkloa$6f6$2@news8.svr.pol.co.uk...
[quoted text, click to view]

100
11/21/2003 11:50:58 AM
The real problem with this idea is: how can we possibly know which type
the object at a given index is, in order to know which overload to use to
retrieve the data? We can provide a method that returns the type of an object
by index value. But then we need to check this type and call the appropriate
method, which will introduce more overhead when reading data. It may turn out
that boxing/unboxing is the best solution and the performance problem is
not there.

That's why I think we can't come up with a good idea unless Mountain gives us
an example of how he is planning to use this storage. He may not need to have
the objects ordered, and then this solution will work and we won't have the
problem of wasting memory.

B\rgds
100

Eric Gunnerson [MS]
11/21/2003 12:48:19 PM
Well, it is just a guess, but here's my reasoning.

The virtual dispatch by itself isn't a big deal - it's only an extra few
instructions. But because it's virtual (and there could be derived classes
loaded later), the JIT doesn't inline virtual calls, so you do pay a
non-trivial penalty there.

Unbox, on the other hand, is a runtime type check (fairly cheap), and then a
copy of memory from the reference location to the stack location. That's
pretty cheap, and my guess is that it's cheaper than the lost inline


Paul Robson
11/21/2003 4:02:23 PM
Thanks! It's much clearer now :)

100
11/21/2003 4:34:42 PM
Hi Eric,

Pretty wild guess, I think ;)

1. There is no guarantee that a non-virtual method will be inlined.

2. Unboxing is a cheap operation. The copy operation which C# always generates
after unboxing could be expensive, depending on the size of the value
type data. Calling a virtual function is like an indirect call with two
indirections, and if the method doesn't have any parameters (which have to be
copied onto the stack), like a simple Get method, I don't think it would be
more expensive than unboxing (even if it does have parameters, if they are
primitives or reference types the first two will be passed via CPU
registers, and the return value can be passed back through
register(s) as well).

Of course this guess is as wild as yours ;)


B\rgds
100

Frank Hileman
11/21/2003 6:41:26 PM
That's not hard. Use an array of unsigned ints; the upper 3 bits can be a type
identifier; the remaining bits index into an array of a specific type. For
example, wrap it in a struct to make it easier:

enum ArrayType
{
Int = 0x1,
Double = 0x2,...
}

struct ArrayEntry
{
const unsigned int typeMask = 0x7 << 29;
const unsigned int indexMask = ~ typeMask;

usigned int data;

public ArrayEntry(ArrayType type, int index)
{
data= 0; // for compiler warning
Type = type;
Index = index;
}

ArrayType Type
{
get { return (ArrayType)(data>> 29); }
set { value = Index | ((int)data) << 29); }
}

int Index
{
get { return data & indexMask; }
set { data = Type | (value & indexMask); }
}
}

Then if you have ArrayEntry[] a, you do a[i].Index, a[i].Type, then use that
info to index into the type-specific arrays, which I would call pools, if
you choose to use them that way.
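
The snippet above was written without a compiler, so here is a compilable sketch of the same packing idea under a few stated assumptions: C# spells the type uint rather than unsigned int; the struct is made immutable for brevity (the original has setters); and EntryTable, with its two pools and write-once slots, is a hypothetical container added only to show the lookup path.

```csharp
using System;

public enum ArrayType : uint
{
    Int = 1,
    Double = 2,
    String = 3
}

public struct ArrayEntry
{
    private const int TypeShift = 29;                       // upper 3 bits hold the type
    private const uint IndexMask = (1u << TypeShift) - 1u;  // lower 29 bits hold the index

    private readonly uint data;

    public ArrayEntry(ArrayType type, int index)
    {
        data = ((uint)type << TypeShift) | ((uint)index & IndexMask);
    }

    public ArrayType Type
    {
        get { return (ArrayType)(data >> TypeShift); }
    }

    public int Index
    {
        get { return (int)(data & IndexMask); }
    }
}

// Hypothetical container: one entry per logical slot, one pool per type.
// Assumes each slot is written once; rewriting a slot would strand a pool item.
public class EntryTable
{
    private readonly ArrayEntry[] entries;
    private readonly int[] intPool;
    private readonly double[] doublePool;
    private int intCount;
    private int doubleCount;

    public EntryTable(int capacity)
    {
        entries = new ArrayEntry[capacity];
        intPool = new int[capacity];
        doublePool = new double[capacity];
    }

    public void Set(int slot, int value)
    {
        intPool[intCount] = value;
        entries[slot] = new ArrayEntry(ArrayType.Int, intCount++);
    }

    public void Set(int slot, double value)
    {
        doublePool[doubleCount] = value;
        entries[slot] = new ArrayEntry(ArrayType.Double, doubleCount++);
    }

    public ArrayType TypeAt(int slot) { return entries[slot].Type; }
    public int GetInt(int slot) { return intPool[entries[slot].Index]; }
    public double GetDouble(int slot) { return doublePool[entries[slot].Index]; }
}

public static class EntryTableDemo
{
    public static void Main()
    {
        EntryTable table = new EntryTable(4);
        table.Set(0, 42);
        table.Set(1, 2.5);
        Console.WriteLine(table.TypeAt(0) + " " + table.GetInt(0));
        Console.WriteLine(table.TypeAt(1) + " " + table.GetDouble(1));
    }
}
```

A read costs one entry load, a shift or mask, and one pool load, with no heap allocation and no cast. Whether that beats an unbox in practice is something only a measurement can settle.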

Regards,
Frank Hileman
Prodige Software Corporation

check out VG.net: www.prodigesoftware.com/screen5.png
An animated vector graphics system integrated in VS.net
beta requests: vgdotnetbeta at prodigesoftware.com

Mountain Bikn' Guy
11/21/2003 10:01:52 PM
Here's the answer to your question (I hope):

The ordering is determined by computational dependencies. Just before
processing starts, an ordering is determined dynamically. Then during
processing, items are written to the array in that order. There is actually
an external index that is used to find items in the array.

Does that answer your question?

I do currently use 'out' parameters in the overloaded methods that retrieve
data from the array, as you figured out.

I've written code for other approaches too. I have lots of alternative
implementations that I have tested. As a side note -- which isn't really
relevant to the main topic -- when I need to have overloaded return types, I
use this approach:

<type> GetData(int index, <type> value)
{
return (type)valueList[index];//cast to <type>
}

What this does is use a parameter to choose an overload, and each overload
has a different return type. It is very similar to the 'out' approach except
the data is returned in a more familiar way.
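
A minimal sketch of that selector-parameter trick, assuming the values sit in an object[] as in the original design; ValueList and the parameter name selector are hypothetical. Note that the value-type overloads still unbox.

```csharp
using System;

public class ValueList
{
    private readonly object[] valueList;

    public ValueList(params object[] values)
    {
        valueList = values; // value types are boxed when passed as object
    }

    // The second argument is never read; its static type only picks the overload.
    public int GetData(int index, int selector) { return (int)valueList[index]; }
    public double GetData(int index, double selector) { return (double)valueList[index]; }
    public string GetData(int index, string selector) { return (string)valueList[index]; }
}

public static class ValueListDemo
{
    public static void Main()
    {
        ValueList list = new ValueList(42, 2.5, "text");
        int i = list.GetData(0, 0);        // int overload
        double d = list.GetData(1, 0.0);   // double overload
        string s = list.GetData(2, "");    // string overload
        Console.WriteLine(i + " " + d + " " + s);
    }
}
```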

I hope mentioning this doesn't divert this thread from the main topic. My
concern is still more with a design that might solve the problem more
creatively. I have an interesting approach that I tested a couple days ago.
I'll post some example code this weekend.

Thanks for your interest in the topic!
Mountain

Frank Hileman
11/22/2003 11:11:43 AM
With 5 bits you have 16 different enum values that can be put in (0-15).

That would work, but I think the switch would be faster. The one thing I
worry about with your method is possible array type conversion. I don't
think it would happen but the compiler might put some dynamic type checking
in there that is really unnecessary (to make sure the cast is fair). So I
think a switch is most likely faster (small switches optimize down to tight
machine code), since no dynamic cast is needed. Also no array bounds
checking is needed:

switch (typeNdx)
{
    case ArrayType.Int:
        return intArray[itemNdx];
    ....
}

In C of course, the array indexing would not cause typechecking or array
bounds checking. But even there the switch would probably win because it
does not require an extra memory dereference (every array index operation is
an extra dereference).

I would not precompute the sizes at all. Instead, allocate the type-specific
arrays on the fly, in the same way an ArrayList works. For example, here is
a type-specific ArrayList sort of collection. It is a bit of code, so I will
put it in the next message.

Thanks! Structs in C# wrap bitfields beautifully. I think perhaps Brad Abrams
and others should not discourage mutable structs so much, because they are
very useful when used as fields in objects. And the compiler can catch the
use of a temporary struct as an lvalue.

I thought of a way to clean up the "magic numbers" a bit (0x7 and 29 in my
sample). In the ArrayType enum:

[Flags]
enum ArrayType : unsigned int
{
Int = 0x1,
Double = Int << 1,
...
All = Int | Double | ...
}

That would give you an unshifted bit mask in the enum, called All. Then, if
you write a function which can count the bits in an unsigned int, you could
compute the constants this way:

static readonly unsigned int typeShift = 32 - BitFunctions.CountBits(ArrayType.All);
static readonly unsigned int typeMask = ArrayType.All << typeShift;
static readonly unsigned int indexMask = ~typeMask;

and in the property:

ArrayType Type
{
get { return (ArrayType)(data >> typeShift); }
set { data = Index | (((unsigned int)value) << typeShift); }
}

Note the change to the set function! I made a mistake before, and swapped
the words "value" and "data". I just wrote the code out without an editor
or compiler.

- Frank



Frank Hileman
11/22/2003 11:28:22 AM
Whoops! I meant, with 4 bits you have 16 possible enum values. And the way I
assigned the values was wrong, they are not bitflags:

enum ArrayType : unsigned int
{
Int = 1,
Double = 2,
String = 3,
...
Max= what ever the biggest one is...
}

Then in the bit shifting computation, you would use Max instead, but if it
is not all 1 bits, you have to count the position of the topmost set bit,
and not the number of bits set.

Sorry about that. Should always run these things through a compiler...


Frank Hileman
11/22/2003 11:36:26 AM
Here is the basic code for an arraylist style collection. I took out some of
the ICollection stuff, enumerator, etc. It is called PointList, but really
it is an array of Vector structs (x and y values). So you can see how it can
be used for any data type. Just swap your type name for "Vector".

public class PointList
{
    // -------- static members --------
    private const int DefaultCapacity = 4;

    // -------- instance members --------
    private Vector[] items;
    private int count;

    public PointList()
    {
        items = new Vector[DefaultCapacity];
    }

    public int Count
    {
        get { return count; }
    }

    public int Capacity
    {
        get { return items.Length; }
        set
        {
            if (items.Length == value)
                return;
            Check.Argument(value >= count, "value",
                "Capacity must be greater or equal to Count");
            int newCapacity = value > DefaultCapacity ? value : DefaultCapacity;
            Vector[] newItems = new Vector[newCapacity];
            if (count > 0)
                Array.Copy(items, 0, newItems, 0, count);
            items = newItems;
        }
    }

    public Vector this[int index]
    {
        get { return items[index]; }
        set
        {
            if (index >= count)
                throw new ArgumentOutOfRangeException("index", index,
                    "Index must be smaller than Count.");
            if (items[index] == value)
                return;
            items[index] = value;
        }
    }

    public int Add(Vector point)
    {
        if (items.Length == count)
            Capacity = items.Length * 2;
        int index = count;
        items[index] = point;
        ++count;
        return index;
    }

    public void AddRange(Vector[] points)
    {
        int minCapacity = count + points.Length;
        if (minCapacity > items.Length)
            Capacity = minCapacity > DefaultCapacity ? minCapacity : DefaultCapacity;
        Array.Copy(points, 0, items, count, points.Length);
        count += points.Length;
    }

    public virtual void Clear()
    {
        Array.Clear(items, 0, count);
        count = 0;
    }
}

Mountain Bikn' Guy
11/22/2003 3:49:12 PM
Frank,
That's awesome! Beautiful! Just what I was looking for, and the style is
consistent with other code I've written, so it's a beautiful fit! Thanks.

I think I'll use 6 bits for the types (I probably have about 15 different
types I use already). My (upper) array length will not need to be more than
1 million, so this leaves me plenty of room. I might even allocate another
bit for the types.

The obvious way to do the last step you mention, indexing into the
type-specific sub arrays (pools), is to select the type specific array via a
switch or if/else block. Do you have a better suggestion? This step could
end up negating some of the potential performance gains if it isn't done
correctly. Another indexing/dereferencing operation at this step would be
great. Maybe I can put all the type-specific sub arrays in an object[] and
cast from an object to a type[] (ie, double[], bool[], int[] etc.).

object[] poolArray = new object[numberOfTypes]; // array of type-specific arrays

// this statement would exist inside an overloaded "Get" method that requests a double:
return ((double[])poolArray[typeNdx])[itemNdx];

I have the above code because I was putting type specific arrays of length 1
inside each slot in the upper array as a way to avoid boxing and unboxing.
(In that case the 2nd index was always 0. This is the experimental solution
I previously mentioned that I would post. It's fairly simple to implement,
so I'll just skip a full post.)

Also, I will need to come up with a nice way to properly size the type
specific arrays (pools) before processing starts. My current design should
support this, with a little work. I already know the total size (ie, total
number of items to go into the array) in advance. However, I do not
currently know the size (count) needed on a per-type basis, so I'll have to
figure a way to compute this. This step isn't really as performance critical
because it only needs to be done once.

Regards,
Mountain

P.S. Your code is a perfect example of why I love C# so much.


Richard A. Lowe
11/22/2003 4:47:39 PM
I had to work up my own example since you didn't post any code (Hey, I was
bored today) so maybe my assumptions are really different from yours, but I
found one way of doing it that was 28% faster than the object array in pure
C# and 41% faster with a few easy IL modifications.

Rather than create a specific container *class*, create a container *union
struct* with all value types residing in the same space in your struct. Now,
I only tried it with one reference type and two value types, but it did
work a lot faster (a plain struct was slower; for some reason creating a
union via the StructLayout attribute was faster).

My 4 tests are downloadable from here:
http://chadich.mysite4now.com/FastestHeterogenousArray.zip

There's a PDF/Word doc that shows a snapshot of the IL I took out of EACH of
the struct constructors (should be obvious which code is redundant). But
post again if you are unsure how to use ILDasm and ILAsm to de/re-compile
it.
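
For readers without the zip, a union struct of this kind can be sketched roughly as follows. This is an illustration of the StructLayout technique only, not the code from the download; the names UnionSlot and UnionDemo, the field choice, and the offsets are assumptions. The reference field is given its own offset because the runtime does not allow an object reference to overlap value-type fields.

```csharp
using System;
using System.Runtime.InteropServices;

[StructLayout(LayoutKind.Explicit)]
public struct UnionSlot
{
    // The two value-type fields share the same bytes.
    [FieldOffset(0)] public int IntValue;
    [FieldOffset(0)] public double DoubleValue;

    // The reference must not overlap the value fields, so it starts after them.
    [FieldOffset(8)] public object RefValue;
}

public static class UnionDemo
{
    // Stores and reloads a double through an array of slots; no boxing occurs.
    public static double RoundTripDouble(double d)
    {
        UnionSlot[] slots = new UnionSlot[1];
        slots[0].DoubleValue = d;
        return slots[0].DoubleValue;
    }

    // Same for an int, which occupies the first four bytes of the shared area.
    public static int RoundTripInt(int i)
    {
        UnionSlot[] slots = new UnionSlot[1];
        slots[0].IntValue = i;
        return slots[0].IntValue;
    }

    public static void Main()
    {
        Console.WriteLine(RoundTripDouble(1.5));
        Console.WriteLine(RoundTripInt(42));
    }
}
```

Each slot is as large as the biggest value field plus the reference, so the array trades memory for the absence of boxing, and the caller still needs some way to know which field is live.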

Richard

--
Please excuse me, my French is very poor. However, if you see
bad C#, that's my fault!

Frank Hileman
11/22/2003 5:13:19 PM
Yep, sounds like you had a killer implementation. It all depends on the
assembly generated. When you start poking at the assembly level you can see
that every pointer deref has a cost that can be avoided. We are probably
making a lot of assumptions about the generated code from a high level
language. I would be curious to see if there was any speed diff between the
switch and the generic array method. The other thing mentioned, a union,
sounds interesting too, but that requires special permissions at run-time,
because that technique theoretically has security implications. I miss the
union from C. Union would have been my first thought in C++ for this
problem. Seems like they might have put union somehow in C# without this
security problem -- perhaps by forcing the CLR to clear the bits before you
cast it differently.

have fun - Frank

Mountain Bikn' Guy
11/22/2003 7:48:09 PM
Frank,
Thanks for your additional comments. My responses are inline.
Regards,
Mountain

I'll take your advice on this. My only experience in this regard is a data
structure I wrote in C++ that used dereferencing extensively and in place of
conditionals whenever possible. It was super fast, out performing the
optimum data structure in this category. (Just so this doesn't sound too
vague, I'll add that it was a binary tree that outperformed an identically
implemented height balanced AVL tree.) This experience has made me inclined
to favor dereferencing whenever possible, but obviously that was a limited
situation.

Thanks.
I will use a "type-specific arraylist sort of collection" if I can implement
it such that when I clear my arrays and prepare for new data I don't have to
reallocate anything. That way the cost would be incurred only once. I'm sure
this won't be a problem. All subsequent writes (thousands of them) of new
data (after the first full population and clear) will be identically sized
and identically indexed.

I didn't see that one, but I did see this:
"usigned int data;"

That clued me in that you wrote it without an editor/compiler -- I was
impressed. If I don't check my code samples by compiling, I typically end up
with something nonsensical in there somewhere.

Frank Hileman
11/24/2003 9:46:24 AM
If you cannot use strong typing throughout, and avoid casting to object,
then you may as well stick to an ArrayList and boxing. The only interface
that can be used to efficiently retrieve the data is a strongly typed one,
GetInt, GetDouble, etc. Otherwise the whole concept is defeated.

Regarding the overhead from the copy, shift, logical ops, etc. These are all
very fast and inlined. This is the great thing about structs. Time it and
see. Make sure the class you use is not derived from MarshalByRefObject,
which defeats inlining. Based on my own perf tuning experience strongly
typed arrays should beat the boxing alternative hands down.

regards, Frank

100
11/24/2003 10:14:20 AM
Hi Frank,
Yes, you can do that. As I said, "We can provide method that returns the
type of an object by index value. But then we need to check this type
and...". As far as I remember, Mountain was concerned about read
performance. My question now is: what is your suggestion for the interface of
the storage class as a whole?
So if you have
class MyStorage
{
.....
}

what interface do you suggest to retrieve the data? If you have one method
XXXX GetData(int index)
what is the type of XXXX? If it is 'object', we have to unbox the object
(Mountain wanted to avoid exactly this).
If your idea is not to use one class, but instead a bunch of different arrays,
one for each type + one for ArrayElements,
we'll have something like this:
ArrayElements e = arrElements[index];
switch(e.Type)
{
    case Integer:
        int i = intArr[e.Index];
        /// do sth with int
        break;
    case Double:
        double d = dblArr[e.Index];
        /// do sth with double
        break;
    ......

}


Ok, we avoided boxing/unboxing. But how many copy operations do we have?

1 for returning the ArrayElement value + 1 for extracting the index + 1 for
extracting the type + 1 for extracting the concrete value from the
type-specific array. At least 4 copy operations + supporting operations such
as shifts and logical ops + 2 array indexings (with all the bounds checks,
etc.). Mountain finds unboxing (one copy + one type check) expensive. Do you
think this is cheaper?

B\rgds
100

100
11/24/2003 1:19:44 PM
Hi Richard,
Thanks for the post. I was planning to make the suggestion with "unions",
but I didn't because I found potential problems with it. I'll try to explain
my issues.
I modified your examples a bit and got the following results:
* Example using a normal object array - reading 10M items in 0.34s
(29,069,637.13 items/s)
* UnionStruct - reading 10M items in 0.30s (33,025,715.36 items/s).
The others are way worse, so I don't want to discuss them.
Way to go "UnionStruct" ;))

IMHO we gain this performance because with unions we copy less data when we
read value items.

Now my issues.
1. In your case you have value types and one reference type. As you know, we
can make a union only with value types. You cannot map reference types to
the same memory as other references or value types.
If we try the UnionStruct example with 2 reference types and one value type,
the performance gets as bad as the ValueArray example (10M in 0.45s, or
22,381,528.52 items/s).
The ObjectArray with the same types does as well as before.

2. Even if you have only value types, with big structures the "UnionStruct"
example suffers a performance hit. ObjectArray does better.
I tried your examples with 1 ref type, 1 double and one structure defined
as
struct MyStruct
{
public int a;
public int b;
public int c;
}
The *double* and the struct share the same memory.

So, the results were UnionStruct example - 0.44s; ObjectArray - 0.4s.

IMHO "unions" could be good in some special cases, but they are not easy
to maintain (hard to extend, etc.). So I give my preference to the
ObjectArray approach.
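
For reference, a "union" in C# is a struct with explicit layout whose fields
share an offset. The sketch below is an assumption about what such a struct
could look like, not Richard's actual UnionStruct; the names UnionValue,
ValueKind, FromInt, FromDouble and AsDouble are made up for illustration.
Only value-type fields overlap here, because the CLR refuses to load a type
in which an object reference overlaps a non-object field.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical tag telling which of the overlapping fields is valid.
public enum ValueKind : byte { Int, Double }

[StructLayout(LayoutKind.Explicit)]
public struct UnionValue
{
    // Both payload fields start at offset 0 and share the same bytes.
    [FieldOffset(0)] public int IntValue;
    [FieldOffset(0)] public double DoubleValue;

    // The tag lives after the 8-byte payload.
    [FieldOffset(8)] public ValueKind Kind;

    public static UnionValue FromInt(int v)
    {
        UnionValue u = new UnionValue(); // zero-initialized
        u.Kind = ValueKind.Int;
        u.IntValue = v;
        return u;
    }

    public static UnionValue FromDouble(double v)
    {
        UnionValue u = new UnionValue();
        u.Kind = ValueKind.Double;
        u.DoubleValue = v;
        return u;
    }

    // Reads the payload according to the tag; no boxing is involved.
    public double AsDouble()
    {
        return Kind == ValueKind.Int ? (double)IntValue : DoubleValue;
    }
}
```

The struct is as large as its biggest member plus the tag, so adding a large
struct to the union makes every element larger, which fits the slowdown
described in point 2.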

B\rgds
100


100
11/25/2003 3:24:37 PM
Hi Frank,

[quoted text, click to view]
I didn't get that. Sorry ;(

Do you want to say that if we use strong typing we cannot avoid casting?


[quoted text, click to view]

That is the point. You cannot know which method overload to use without
checking the type of the item at the given index.
I haven't tested it, but I might when I have more time. At the moment
I'm very sceptical that reading data from two arrays (one array for the
items' info, which involves copying data - the expensive part of unboxing -
and one for the actual data, which copies data anyway) will be faster than
reading a reference from a reference array + type checking + unboxing. These
doubts of mine concern only the read operation; storing value-type data in
a reference array will be much slower.
Of course, using value types and arrays (not dynamic collections) as storage
gives the C# compiler and the JITter good opportunities for optimization. So
it might be faster.

[quoted text, click to view]

Yes, bringing up shift and logical ops was overkill. BTW they are not inlined
because there is nothing to inline. There are no methods behind them.
As for the copy operation not being a factor worth considering when it comes
to performance.... hmmm. Again I have doubts. I
have timed reading 10,000,000 structures from an array (structs and arrays
are good for optimization) and the size of the data has a noticeable impact
on the results.

[quoted text, click to view]

It would be really unfortunate if it does inlining ;)

B\rgds
100

Frank Hileman
11/25/2003 4:51:58 PM
Mr. Mountain,

No, actually, that did not come from Rotor. I extracted it from some common
list code we have been using a very long time now, (over a year and a half)
so it is stable. Not much to it really, except the Array calls. I think if
you review it carefully you can see how it is straightforward logic.

Search and replace is unfortunately the way to go. We have a ListBase that
encapsulates the Array stuff but the array itself must be strongly typed so
we were never able to reuse much code with a base class. In the future
generics will eliminate this problem...

By the way, I created several strongly typed collections with search/replace
and it worked fine.

regards, Frank

Frank Hileman
11/25/2003 4:54:49 PM
[quoted text, click to view]

Yes, you are damn right there! We got a great speedup once by replacing
reflection with generated, strongly typed code. Of course it has to be
something in a fairly tight loop to make a difference.

Frank Hileman
11/25/2003 5:08:09 PM
hello 100, some inline...
[quoted text, click to view]

Once you cast to object you box anyway. So if there is a cast to object
anywhere in the chain just use an object array.

[quoted text, click to view]

Not sure I understand what you mean here. With a strongly typed array, the
strongly typed value is just copied right out of the array, and through a
function call. Arrays are fast. You are hitting two or more arrays, but they
are contiguous blocks. Boxing allocates memory on the heap, messes up
locality of reference, and pressures the GC. I did not actually time it, but
I hope someone else will.
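
A small sketch of the difference, with hypothetical names (BoxingDemo,
BoxesAreDistinct, SumBoxed, SumTyped). It is not a benchmark; it only shows
that every store of an int into an object slot creates a separate heap
object, while an int[] keeps the values inline in one block.

```csharp
using System;

public static class BoxingDemo
{
    // Storing the same int twice into object slots boxes it twice,
    // so the two slots refer to two different heap objects.
    public static bool BoxesAreDistinct(int value)
    {
        object[] items = new object[2];
        items[0] = value; // box #1
        items[1] = value; // box #2
        return !Object.ReferenceEquals(items[0], items[1]);
    }

    // Reading from object[] needs a type check and an unbox per element.
    public static int SumBoxed(object[] items)
    {
        int sum = 0;
        for (int i = 0; i < items.Length; i++)
        {
            sum += (int)items[i]; // unbox
        }
        return sum;
    }

    // Reading from int[] is a plain copy out of one contiguous block.
    public static int SumTyped(int[] items)
    {
        int sum = 0;
        for (int i = 0; i < items.Length; i++)
        {
            sum += items[i];
        }
        return sum;
    }
}
```

Timing SumBoxed against SumTyped over a few million elements would be one
way to get the numbers nobody has measured yet.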

[quoted text, click to view]

I meant the struct functions.

Mountain Bikn' Guy
11/25/2003 7:34:19 PM
my comments are inline
Mountain

[quoted text, click to view]

Absolutely!
I use an overloaded GetData method that includes an 'out' parameter for each
data type. The GetData overloads simply call GetInt, GetDouble, etc.
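
A minimal sketch of that shape, assuming one backing array per type. The
class name TypedStore and its constructor are hypothetical; only GetData,
GetInt and GetDouble are names from this post.

```csharp
using System;

public class TypedStore
{
    private readonly int[] ints;
    private readonly double[] doubles;

    public TypedStore(int[] ints, double[] doubles)
    {
        this.ints = ints;
        this.doubles = doubles;
    }

    public int GetInt(int index) { return ints[index]; }
    public double GetDouble(int index) { return doubles[index]; }

    // The compiler picks the overload from the declared type of the
    // variable passed as the 'out' argument, so no cast or box is needed.
    public void GetData(int index, out int value) { value = GetInt(index); }
    public void GetData(int index, out double value) { value = GetDouble(index); }
}
```

A caller writes "int i; store.GetData(1, out i);" and the int overload is
chosen at compile time. The caller still has to know which type lives at a
given index before making the call.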

[quoted text, click to view]

Agreed. I'm doing timing tests at each step. I've often been surprised to
get results I didn't expect, which, in this case, is what led to my initial
post on this topic.

[quoted text, click to view]

Good tip. I wasn't aware of this.

[quoted text, click to view]

Frank, you've given me some great tips. I'm working on the implementation
now. As a side effect of using strongly typed arrays, at steps further down
the line I will be able to eliminate the use of reflection, which should
give even further performance gains.

Regards,
Mountain

Mountain Bikn' Guy
11/25/2003 8:58:50 PM
Frank,
I assume you've seen the Rotor source and maybe used it as a "go-by" for
your arraylist-style collection. I'm thinking about just doing a
search/replace of "object" with my type (double, etc.) to create all my
strongly typed expandable collections. I wanted to ask if you know of any
issues or problems in the Rotor ArrayList source code that