dotnet performance : String manipulation in VB.NET


Sylvain Fortin
3/23/2005 6:05:02 PM
Hi everyone,

My application is written in VB.NET and will run on a SmartPhone. My problem
is that it makes too many string copies (String.Substring(...)),
which slows it down.

Example of what the application has to do:

oSize = oGraph.MeasureString(strValue.Substring(5, 10), oFont)

Example of what could be done easily in C/C++ (very ugly, but it works):

cValue = strValue[15];
strValue[15] = '\0';
oSize = oGraph.MeasureString(&strValue[5], oFont);
strValue[15] = cValue;


The first example makes a copy when String.Substring(...) is called, which
uses memory that will have to be freed later by the garbage collector. The
second example is ugly, but I wanted to give a good example of what I meant.
I'm sure you get my point: I haven't used any additional memory...

I need a way to do this kind of job in VB.NET because I can't afford the amount
of memory my application needs at the moment, and the CPU time is completely
lost to these String.Substring(...) operations. Also, I need to be able to
call any .NET function with the string.

Any help will be greatly appreciated!

Regards,

Bob Grommes
3/23/2005 9:21:04 PM
There is a built-in indexer for any string.

Your MeasureString method could use the indexer to examine the relevant
characters without creating new strings in the process, or having to pass in
a substring of the characters to examine:

for (int i = start; i <= end; i++) {
    // evaluate the character strValue[i]
}

Just pass the start and end points and the string and let MeasureString do
the work.

Unfortunately the indexer is read-only, so you will still have the overhead
of a String.Replace() call on strValue when you want to put the size byte
back in. This will result in a completely new string being created to
replace the original one.

Two possible solutions:

(1) Instead of a string, use a StringBuilder to build up the string, using
StringBuilder.Append(string) and/or StringBuilder.Append(char) and creating
the string all at once with StringBuilder.ToString().

(2) Instead of a string, use a character array, converting it to a string
all at once when you've finished manipulating it. Or possibly, a byte
array.

The basic idea is to avoid explicitly or implicitly creating lots of strings.
Strings in .NET are immutable, so any change at all to a string creates a
new one and leaves the old one for the garbage collector.
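
A minimal VB.NET sketch of option (1), assuming the text can be built up piece by piece. The module name and sample text are made up for illustration. It shows that a StringBuilder can be changed in place through its Chars property, with a single string created at the end:

```vbnet
Imports System
Imports System.Text

Module StringBuilderSketch
    Sub Main()
        ' Build the text in one mutable buffer instead of concatenating strings.
        Dim sb As New StringBuilder()
        sb.Append("This is a ")
        sb.Append("<-->")
        sb.Append(" picture...")

        ' Unlike String.Chars, StringBuilder.Chars is read/write,
        ' so a single character can be replaced without a new string.
        sb.Chars(0) = "t"c

        ' One string is created here, when the work is finished.
        Dim result As String = sb.ToString()
        Console.WriteLine(result)
    End Sub
End Module
```

Note that ToString() still allocates a string, so this only pays off when many edits are made between calls.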

I'm assuming of course that you've accurately diagnosed your performance
bottleneck. While it's true that using StringBuilders and other techniques
can be much faster than simple string manipulation, it's only a practical
problem in tight loops, especially on fast hardware.

--Bob


Fred Hirschfeld
3/24/2005 6:59:34 AM
It seems like you are using parts of the string in various places. Why not
create a class that can parse the string into its parts (those that are
needed) so that you only use Substring once?

Is the string really large? Does the string have multiple areas that
you are looping through to get this data?

Fred


Sylvain Fortin
3/24/2005 8:21:09 AM
Bob,

Can you give me more detail about the built-in indexer you are talking
about, are you referring to String.Chars()?

Thanks for your reply

Regards,

Sylvain Fortin
Sylvain Fortin
3/24/2005 8:27:07 AM
Fred,

My application needs to find strings inside large strings and replace them
with images.

For example I need to replace "<-->" with the image cat.jpg.

I have to parse this string: This is a <--> picture...

On the screen the <--> will be replaced by the actual cat picture (cat.jpg)

Thanks for your reply,

Sylvain Fortin




Sylvain Fortin
3/24/2005 10:11:08 AM
Bob,

Could you give me an example of what you are talking about in C#? From what
I know, by using indexers I will only be able to retrieve one character at a
time, whereas I need to pass strings as parameters to regular functions.

Thanks for your reply,

Bob Grommes
3/24/2005 10:15:13 AM
I'm sorry, I was thinking in terms of C#. In VB, yes, you would use
strValue.Chars(i).

--Bob


Fred Hirschfeld
3/24/2005 10:54:43 AM
Are you using some tag besides the <--> or is that generic and you need to
replace instances of these differently? You might want to look at using
regular expressions to find all the matches and use a single replace. This
works best when you are using distinct tags for data like
<-IMG-> so that you can match exactly by a name.

Fred


Bob Grommes
3/24/2005 3:14:13 PM
My point was that rather than passing a substring, pass the whole string
plus the start and end points you want to scan and let the routine look at
each character (I am assuming that's what you need to do anyway). This way
you're not creating a string to pass to the routine, you just keep passing
the same string each time.
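
As a rough VB.NET sketch of this idea: the helper CountSpaces below is hypothetical and only stands in for whatever per-character work the routine needs to do. It takes the whole string plus a start and end index and reads characters through Chars(i), so no substring is created:

```vbnet
Imports System

Module RangeScanSketch
    ' Hypothetical helper: counts the spaces in strValue between
    ' nStart and nEnd (both inclusive) without calling Substring.
    Function CountSpaces(ByVal strValue As String, ByVal nStart As Integer, ByVal nEnd As Integer) As Integer
        Dim nCount As Integer = 0
        For i As Integer = nStart To nEnd
            ' Chars(i) reads one character in place; nothing is copied.
            If strValue.Chars(i) = " "c Then
                nCount += 1
            End If
        Next
        Return nCount
    End Function

    Sub Main()
        Dim strValue As String = "This is a <--> picture..."
        ' The same string instance is passed every time; only the range changes.
        Console.WriteLine(CountSpaces(strValue, 5, 14))
    End Sub
End Module
```

This only helps for routines you write yourself. A framework method that accepts nothing but a String still needs an actual string.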

--Bob


Sylvain Fortin
3/25/2005 2:31:02 AM
Fred,

I don't want to replace any occurrence of a string with another string. The
best example of what I need to do is a Web browser. In the HTML code you have
strings and images to display on the screen and you need to make everything
fit in the rectangle you have (word wrap) and be able to scroll in your
document.

To accomplish that, there are many string operations to do. Every single
character needs to be looked at because it could be the start of an image
pattern (which can be any string pattern you can imagine). My performance
bottleneck at the moment is extracting a part of the string
(String.Substring(...)) and calling Graphic.MeasureString(...) repeatedly to
see if the string still fits the rectangle.

Something like this:

For nIndex As Integer = 0 To strValue.Length - 1
    ...
    szSize = Graphic.MeasureString(strValue.Substring(0, nIndex), oFont)
    ...
Next

Thanks,

Sylvain Fortin
3/25/2005 2:57:02 AM
Bob,

Just to make sure I understand what you're saying, let's take the following
line of code:

If String.Compare(strValue.Substring(0, 4), "<()>") = 0 Then
    ...
End If

You're telling me to replace it with:

If String.Compare(strValue, 0, "<()>", 0, 4) = 0 Then
    ...
End If

That is exactly what I'm doing when I can. But when I need to measure the
string width, I have no choice but to use String.Substring(...)
when I call Graphic.MeasureString(...), because there's no other way with that
particular method. I'm looking for any technique to call a method in VB.NET
without making a copy of my string.

I gave another example of my application on this thread (reply to Fred
Hirschfeld on 3/25/2005), if you want to take a look at it, you will see a
more detailed explanation comparing my application to a Web browser.

Thanks for your time,

Sylvain Fortin
Fred Hirschfeld
3/25/2005 7:45:07 AM
Ahhh...

Well, that is going to be an issue that you likely want to solve by taking a
guess at what number of characters will fit in the space allocated.

I would recommend you pick a character like O to measure the pixel length of
and then figure out based on your panel width how many O's you could fit in
there to give you the number of characters to measure. This will eliminate
your need to measure so many strings but will give slightly less accurate
results. You would also have to move forward or back on the last character
to get to a space or something so you break on words.

You will need to experiment with it a little, as O may be too large as an
average. This will allow you to step over x characters at a time instead of
checking every one.

ALTERNATIVELY, you could call MeasureString once for every character (up to
255) and cache the results statically. Then, instead of using Substring, you
could just iterate over the characters in the string and accumulate the
widths as appropriate. This may be more of what you are looking for.
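
A possible VB.NET sketch of the cached-width idea. The MeasureChar delegate and FakeMeasure function are hypothetical stand-ins for a real measurement call, and the widths are made up. Summed per-character widths may not match a MeasureString call on the whole string exactly, since the real measurement can include padding or spacing adjustments, so treat the result as an estimate:

```vbnet
Imports System
Imports Microsoft.VisualBasic

Module WidthCacheSketch
    ' Hypothetical stand-in for a real per-character measurement.
    Delegate Function MeasureChar(ByVal c As Char) As Single

    ' Cached widths for character codes 0 to 255.
    Private widths(255) As Single

    ' Measures each character once, up front.
    Sub BuildCache(ByVal measure As MeasureChar)
        For code As Integer = 0 To 255
            widths(code) = measure(ChrW(code))
        Next
    End Sub

    ' Sums cached widths over a range of strValue without creating a substring.
    Function EstimateWidth(ByVal strValue As String, ByVal nStart As Integer, ByVal nLength As Integer, ByVal measure As MeasureChar) As Single
        Dim total As Single = 0
        For i As Integer = nStart To nStart + nLength - 1
            Dim code As Integer = AscW(strValue.Chars(i))
            If code <= 255 Then
                total += widths(code)
            Else
                ' Outside the cached range: measure this character directly.
                total += measure(strValue.Chars(i))
            End If
        Next
        Return total
    End Function

    ' Made-up widths, for demonstration only.
    Function FakeMeasure(ByVal c As Char) As Single
        If c = "i"c OrElse c = "l"c Then Return 3.0F
        Return 7.0F
    End Function

    Sub Main()
        BuildCache(AddressOf FakeMeasure)
        ' Estimated width of the 10 characters starting at index 5.
        Console.WriteLine(EstimateWidth("This is a <--> picture...", 5, 10, AddressOf FakeMeasure))
    End Sub
End Module
```

A separate cache would be needed for each font, since the widths depend on it.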

Let me know if this helps.

Fred


Sylvain Fortin
3/28/2005 11:01:02 AM
Hi Fred,

I did almost exactly what you said in your previous post. I need absolute
accuracy, so I first estimate using a predefined large character width;
once I reach my maximum width, I start using MeasureString() to be
perfectly accurate.

So if we take a 25-character string, I will call MeasureString about 2 times
instead of 25 times. So basically, I now have only 2 to 5 string copies per
line instead of 75 to 100.
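
For readers who want to try this, here is one way the two-phase approach might look in VB.NET. It is a sketch under assumptions, not the code actually used: the MeasureText delegate, the CountFitting function, and the fake 6-units-per-character measurement are all hypothetical.

```vbnet
Imports System

Module HybridFitSketch
    ' Hypothetical stand-in for an exact call such as MeasureString.
    Delegate Function MeasureText(ByVal text As String) As Single

    ' Returns how many leading characters of strValue fit within maxWidth.
    Function CountFitting(ByVal strValue As String, ByVal maxWidth As Single, ByVal worstCharWidth As Single, ByVal measure As MeasureText) As Integer
        ' Phase 1: cheap estimate using a deliberately large character width.
        Dim nFit As Integer = Math.Min(strValue.Length, CInt(Math.Floor(maxWidth / worstCharWidth)))

        ' Safety check: shrink if the estimate turned out to be too generous.
        While nFit > 0 AndAlso measure(strValue.Substring(0, nFit)) > maxWidth
            nFit -= 1
        End While

        ' Phase 2: exact measurement, one character at a time, near the limit.
        While nFit < strValue.Length AndAlso measure(strValue.Substring(0, nFit + 1)) <= maxWidth
            nFit += 1
        End While
        Return nFit
    End Function

    ' Made-up measurement: every character is 6 units wide.
    Function FakeMeasure(ByVal text As String) As Single
        Return text.Length * 6.0F
    End Function

    Sub Main()
        Dim strValue As String = "This is a <--> picture..."
        Console.WriteLine(CountFitting(strValue, 100.0F, 10.0F, AddressOf FakeMeasure))
    End Sub
End Module
```

How many exact calls remain depends on how close the predefined width is to the real average: the closer the estimate, the fewer Substring copies in phase 2.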

I made many other modifications to remove every possible string copy other
than the ones made when calling MeasureString(), and I'm also using a
StringBuilder as my working string. My application is running very smoothly
and the performance is where it needs to be!

Thanks for your time!

Fred Hirschfeld
3/28/2005 4:43:15 PM
Glad to see it helped.

Fred

