dotnet clr : Threading scenario - best approach ?


Andreas Håkansson
6/16/2005 12:00:00 AM
Jon,

How would the timeout be implemented using a Monitor ?

Jon Skeet [C# MVP]
6/16/2005 12:00:00 AM
Using the call to Wait which takes a timeout.

--
Jon Skeet - <skeet@pobox.com>
http://www.pobox.com/~skeet
Jon Skeet [C# MVP]
6/16/2005 12:00:00 AM
I suspect I'm biased because of my history here. Coming from a Java
background, I'm very familiar and happy with Monitor.Wait/Pulse etc,
but not so happy with *ResetEvents. I know of other developers who've
come from a Win32 background and feel exactly the opposite.

Either will work perfectly well, of course :)

--
Jon Skeet - <skeet@pobox.com>
http://www.pobox.com/~skeet
Stefan Simek
6/16/2005 12:00:00 AM
Hi,

I would recommend using your first scenario, as it is simple and
straightforward.
Using the threadpool would introduce additional complications due to the
maximum limit on threadpool thread count, and would not really improve
performance as we're talking about 5-15 second intervals.
The async delegates are essentially only a wrapper around threadpool, so
it's the same as above.

HTH,
Stefan

Netveloper
6/16/2005 12:00:00 AM
Stefan,

Thank you for your thoughts. I'm also leaning towards scenario #1 and have
started writing a small prototype. How would you suggest I wait for the
workers to finish before the Fetch method returns? I could perhaps do as
described below and call Join on all worker threads, or pass an
Auto/ManualResetEvent to each worker, have them signal completion, and call
WaitHandle.WaitAll in the Fetch method.

"Stefan Simek" <nospam@nospam.nospam> wrote in message
news:OovxpulcFHA.1504@TK2MSFTNGP15.phx.gbl...

Stefan Simek
6/16/2005 12:00:00 AM
Hi,

I think you can try both, but I guess the Join method will be OK, no
need to introduce another synchronization mechanism. Calling Join() on a
thread that has already finished will return immediately, so the foreach
.... Join will do exactly what is expected - finish after all the threads
are done.
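
As a rough illustration of the Join approach, here is a minimal C# sketch. The FetchWorker class, its two parameters (source and limit) and the Sleep placeholder are hypothetical stand-ins, not code from this thread; the only point is that Fetch starts one thread per source and then joins each in turn.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical worker: holds its two parameters and stores its own result.
public class FetchWorker
{
    private readonly string source;
    private readonly int limit;

    public object[] Result { get; private set; }  // set when Run succeeds
    public Exception Error { get; private set; }  // set when Run fails

    public FetchWorker(string source, int limit)
    {
        this.source = source;
        this.limit = limit;
    }

    public void Run()
    {
        try
        {
            Thread.Sleep(100); // placeholder for the real 5-15 second collection
            Result = new object[] { source + ":" + limit };
        }
        catch (Exception ex)
        {
            Error = ex; // keep the exception from escaping the thread entry point
        }
    }
}

public static class JoinDemo
{
    public static object[] Fetch(string[] sources)
    {
        var workers = new FetchWorker[sources.Length];
        var threads = new Thread[sources.Length];

        for (int i = 0; i < sources.Length; i++)
        {
            workers[i] = new FetchWorker(sources[i], 10);
            threads[i] = new Thread(workers[i].Run);
            threads[i].Start();
        }

        // Join returns immediately for threads that have already finished,
        // so this loop ends once the slowest worker is done.
        foreach (Thread t in threads) t.Join();

        var combined = new List<object>();
        foreach (FetchWorker w in workers)
            if (w.Result != null) combined.AddRange(w.Result);
        return combined.ToArray();
    }

    public static void Main()
    {
        object[] data = Fetch(new[] { "a", "b", "c" });
        Console.WriteLine(data.Length);
    }
}
```

Each worker writes only to its own object, and the results are read only after Join has returned, so no extra locking is needed in this sketch.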

But I'm not trying to push you into anything - use the approach you are
most comfortable with.

Stefan

Netveloper
6/16/2005 12:00:00 AM
Hi,

No pressure felt :) I've gotten both to work equally well, and I would just
like to understand the difference in approach. I guess there are advantages
and disadvantages to either one. I don't really like to use code without
understanding exactly what it is doing ;)

"Stefan Simek" <nospam@nospam.nospam> wrote in message
news:uMWeXXmcFHA.2420@TK2MSFTNGP15.phx.gbl...
Netveloper
6/16/2005 10:30:29 AM
Hi,

In one of my classes I have a method, let's call it Fetch, which collects
data from various sources and returns the combined result. Each source can
take 5-15 seconds to collect, so I would like to improve performance by
collecting the data on multiple threads. The Fetch method should spawn the
workers and block until all of them have finished (or failed).

I have done some light reading and would like some feedback on what the best
approach would be to implement this scenario.

SCENARIO #1 - Using Thread

I thought about creating a worker thread which collects information from a
source and returns the result. This worker would have to be able to take
two parameters (used to determine what data to get) and return an array of
objects.

The Fetch method could create a new worker object for each data source, pass
it the correct (two) parameters, tell it which callback to use to signal
its completion and pass back the return data, and then start it in a new
thread.

Once all of the worker threads are running, the Fetch method would enter
something like this (VB.NET as an example; it could just as well be C#,
since I code in both):

For Each WorkerThread In Workers
WorkerThread.Join()
Next

Return CombinedResult

Thoughts and/or suggestions? Advantages/Disadvantages?


SCENARIO #2 - ThreadPool

Just like SCENARIO #1, but I would use the ThreadPool instead. How would I
wait for all the threads to finish before returning, i.e. block the Fetch
method until all workers have finished (or failed)?

Thoughts and/or suggestions? Advantages/Disadvantages?


SCENARIO #3 - Async Delegates

I create a delegate which takes my worker process as a parameter. The
delegate is then called asynchronously with BeginInvoke, and an
AsyncCallback gathers and combines the worker results. I would probably
build this using the technique posted by Mike Woodring, of DevelopMentor,
on the Advanced-Dotnet mailing list

(watch for line-wrapping)
http://discuss.develop.com/archives/wa.exe?A2=ind0302B&L=ADVANCED-DOTNET&D=0&I=-3&P=2534

to ensure EndInvoke is called, thus avoiding a possible memory leak. If I
went down this road, how would I make the Fetch method block until all of
the async operations had finished (or failed) without having to resort to
a busy-wait?


All thoughts and suggestions on this subject will be appreciated.
Thanks!


john conwell
6/16/2005 3:39:02 PM
First, are the Fetch methods getting data from a different server, or from
the server the app is running on? Does the server that the threads run on
have multiple processors or just one? If it's just one, then you are more
likely to slow your app down than speed it up: the same amount of processing
has to get done, but now you are adding thread management and context
switching to the mix. Only do this if your app is distributed or the server
is multi-processor.

As far as which way to go, I'm going to have to disagree. I'd go with
solution 3, async delegates. First, async delegates have a simple way to
wait for all threads to finish: collect the AsyncWaitHandle of each returned
IAsyncResult into an array and call WaitHandle.WaitAll, passing in the
array. This will pause the main thread until all delegates have finished
running. It doesn't get much easier.

Also, as far as performance goes, the cost of initializing 5-10 new manual
threads is much higher than that of using the threads already initialized
in the thread pool. The thread pool's maximum count shouldn't be an issue
either. If you call ThreadPool.GetMaxThreads you'll see how many threads can
be created in the pool. On my system it's 100 (not sure whether this differs
per OS version). And if you plan on running more than 100 async tasks you
should rethink this anyway, as that would probably bog down the CPU with all
the processing and context switching. The thread pool can manage multiple
threads quite well, and by the time you are ready to kick off your last
thread, the first thread might be finished. In that case the thread pool
will just reuse an existing thread instead of creating another.
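
To check the pool limits on a given machine, a short C# sketch along these lines should do. The PoolInfo class name is made up for illustration, and the numbers printed depend on the runtime version, the host and its configuration, so they will not necessarily match the figures quoted in this thread.

```csharp
using System;
using System.Threading;

public static class PoolInfo
{
    // Returns the maximum number of worker threads the pool may create.
    public static int MaxWorkers()
    {
        int workerThreads, completionPortThreads;
        ThreadPool.GetMaxThreads(out workerThreads, out completionPortThreads);
        return workerThreads;
    }

    public static void Main()
    {
        int workerThreads, completionPortThreads;
        ThreadPool.GetMaxThreads(out workerThreads, out completionPortThreads);
        Console.WriteLine("Max worker threads: " + workerThreads);
        Console.WriteLine("Max I/O completion threads: " + completionPortThreads);
    }
}
```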

Remember creating threads is a fairly significant performance hit.

john conwell
6/16/2005 4:11:06 PM
If you go with manually creating your own threads, I'm a bigger fan of using
an AutoResetEvent with a WaitAll() call, rather than Join(). It just seems
more elegant for managing a large collection of threads.
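
A minimal C# sketch of the event-per-worker idea follows. It uses ManualResetEvent rather than AutoResetEvent (either works when each event is signalled once), and the EventDemo class, the RunAll method and the Sleep placeholder are assumptions made for illustration. Note that WaitHandle.WaitAll accepts at most 64 handles and does not support multiple handles on an STA thread.

```csharp
using System;
using System.Threading;

public static class EventDemo
{
    // Starts one thread per entry in workMillis and waits for all of them.
    // Returns true if every worker signalled before the timeout elapsed.
    public static bool RunAll(int[] workMillis, int timeoutMillis)
    {
        var done = new ManualResetEvent[workMillis.Length];

        for (int i = 0; i < workMillis.Length; i++)
        {
            var signal = new ManualResetEvent(false);
            int delay = workMillis[i];          // local copy for the closure
            done[i] = signal;

            var t = new Thread(() =>
            {
                try { Thread.Sleep(delay); }    // placeholder for real work
                finally { signal.Set(); }       // signal even if the work throws
            });
            t.IsBackground = true;              // do not keep the process alive
            t.Start();
        }

        // One timeout covers the whole batch. The events are deliberately
        // not disposed here: a late worker may still call Set after a timeout.
        return WaitHandle.WaitAll(done, timeoutMillis);
    }

    public static void Main()
    {
        Console.WriteLine(RunAll(new[] { 50, 100 }, 2000));
    }
}
```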

Jon Skeet [C# MVP]
6/16/2005 5:51:19 PM
Personally I'd use Join - no need to create any events you don't need,
and it does exactly what it says on the tin.

If you want to use a custom threadpool for this, by the way, you could
use the one I've written:
http://www.pobox.com/~skeet/csharp/miscutil

You could subscribe to the event which is fired after a thread job has
finished to synchronize the main thread. (Of course, you wouldn't be
able to use Thread.Join in that scenario.)

--
Jon Skeet - <skeet@pobox.com>
http://www.pobox.com/~skeet
Jon Skeet [C# MVP]
6/16/2005 9:46:59 PM
True. Note, however, that there is an alternative to using
Auto/ManualResetEvents - you can use Monitor.Wait and Monitor.Pulse.
Personally, I prefer these - they feel more idiomatic .NET somehow,
rather than being Win32 shims. (They also perform very slightly better
if I remember rightly, but the difference isn't significant.)

You could make each worker thread decrement a counter (which is set by
the main thread) and when the last worker thread decrements it to 0, it
could notify the monitor.
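
The counter idea could be sketched in C# roughly as follows. In .NET the notifying calls are Monitor.Pulse and Monitor.PulseAll; the Countdown and MonitorDemo classes are hypothetical names, and the timeout handling uses the Monitor.Wait overload that takes a timeout in milliseconds.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

public class Countdown
{
    private readonly object gate = new object();
    private int remaining;

    public Countdown(int count) { remaining = count; }

    // Called by each worker when it finishes.
    public void Signal()
    {
        lock (gate)
        {
            if (--remaining == 0) Monitor.PulseAll(gate);
        }
    }

    // Blocks until the count reaches zero or the timeout elapses.
    // Returns true if all workers signalled in time.
    public bool Wait(int timeoutMillis)
    {
        Stopwatch clock = Stopwatch.StartNew();
        lock (gate)
        {
            while (remaining > 0)
            {
                long left = timeoutMillis - clock.ElapsedMilliseconds;
                // Monitor.Wait returns false when it timed out.
                if (left <= 0 || !Monitor.Wait(gate, (int)left))
                    return remaining == 0;
            }
            return true;
        }
    }
}

public static class MonitorDemo
{
    public static bool RunAll(int workers, int workMillis, int timeoutMillis)
    {
        var countdown = new Countdown(workers);
        for (int i = 0; i < workers; i++)
        {
            var t = new Thread(() =>
            {
                try { Thread.Sleep(workMillis); }   // placeholder for real work
                finally { countdown.Signal(); }
            });
            t.IsBackground = true;
            t.Start();
        }
        return countdown.Wait(timeoutMillis);
    }

    public static void Main()
    {
        Console.WriteLine(RunAll(3, 50, 2000));
    }
}
```

The wait sits in a loop that rechecks the counter, so a wake-up that arrives before the count has reached zero does no harm.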

See http://www.pobox.com/~skeet/csharp/threads/shutdown.shtml for
general guidance about stopping tasks in a controlled way.

--
Jon Skeet - <skeet@pobox.com>
http://www.pobox.com/~skeet
Andreas Håkansson
6/16/2005 10:34:50 PM
Jon,

Thanks for your feedback. I've been thinking about using a timeout so that
the collecting of data won't block indefinitely. I saw that both the Join
and WaitAll methods accept an optional timeout parameter.

However, the two aren't interchangeable. With a timeout of x, calling Join
on each thread in turn gives the first thread at most x to run, the next
up to 2*x, the next up to 3*x and so on, because each Join starts its own
timeout. With WaitAll, all threads get the same chance to execute before
the method stops blocking the main thread.
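
One way to get WaitAll-like behaviour out of Join is to measure against a single shared deadline, so that each Join only receives whatever time is left. The C# sketch below illustrates this; the DeadlineDemo class and its StartSleeper helper are hypothetical and only simulate workers.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

public static class DeadlineDemo
{
    // Joins each thread in turn, but against ONE shared deadline.
    // Returns true only if every thread finished before the deadline.
    public static bool JoinAll(Thread[] threads, int timeoutMillis)
    {
        Stopwatch clock = Stopwatch.StartNew();
        foreach (Thread t in threads)
        {
            long left = timeoutMillis - clock.ElapsedMilliseconds;
            if (left < 0) left = 0;             // Join(0) checks without blocking
            if (!t.Join((int)left)) return false;
        }
        return true;
    }

    // Hypothetical stand-in for a worker: sleeps for the given time.
    public static Thread StartSleeper(int millis)
    {
        var t = new Thread(() => Thread.Sleep(millis));
        t.IsBackground = true;  // do not keep the process alive after a timeout
        t.Start();
        return t;
    }

    public static void Main()
    {
        Thread[] fast = { StartSleeper(50), StartSleeper(100) };
        Console.WriteLine(JoinAll(fast, 2000));

        Thread[] slow = { StartSleeper(50), StartSleeper(5000) };
        Console.WriteLine(JoinAll(slow, 500));
    }
}
```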

The timeout, however, makes me wonder about the leftover worker threads.
They will continue executing in the background until they are finished. Do
I have to clean them up myself, and if so, how? What if, for example, one of
the worker threads calls a web service and for some reason is unable to
establish a connection, leaving it waiting for its own timeout, which could
have been increased beyond the default? This would leave the worker threads
hanging around for a long time even though the main thread timed out and
continued executing.. =/
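
One common pattern for leftover workers is a cooperative stop flag that the worker checks between units of work. The C# sketch below is only an illustration under that assumption; StoppableWorker and StopDemo are hypothetical names. A flag like this cannot interrupt a call that is already blocked (such as a web service request waiting on its own timeout), so the worker only notices the request once that call returns.

```csharp
using System;
using System.Threading;

public class StoppableWorker
{
    private volatile bool stopRequested;  // written by the caller, read by the worker
    private int steps;

    public int StepsCompleted { get { return steps; } }

    public void RequestStop() { stopRequested = true; }

    public void Run()
    {
        // Check the flag between units of work and exit early when asked to.
        while (!stopRequested && steps < 1000)
        {
            Thread.Sleep(10);  // placeholder for one unit of real work
            steps++;
        }
    }
}

public static class StopDemo
{
    // Runs a worker, asks it to stop if it exceeds the timeout, and
    // returns how many steps it completed.
    public static int RunWithTimeout(int timeoutMillis)
    {
        var worker = new StoppableWorker();
        var thread = new Thread(worker.Run);
        thread.IsBackground = true;  // a stuck worker will not block process exit
        thread.Start();

        if (!thread.Join(timeoutMillis))
        {
            worker.RequestStop();
            thread.Join();           // returns once the current step has finished
        }
        return worker.StepsCompleted;
    }

    public static void Main()
    {
        Console.WriteLine(RunWithTimeout(100));
    }
}
```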



Netveloper
6/17/2005 9:01:14 AM
John,

Thanks for your feedback. Well, let's see. The system is a multi-CPU setup
with an ample amount of memory and a disk system with good throughput. The
data sources are not located on the same machine; all are on remote web
services and RDBMSs. When it comes to using async delegates, I really
wouldn't base my decision on your arguments (this is not to say async
delegates wouldn't be a good solution).

The reason is that collecting the wait handles and doing a WaitAll on them
is no different from doing the same when manually spawning your own threads
(with the help of Auto/ManualResetEvent objects), calling Join on each
thread, or, like Jon suggested, using a Monitor.

Also, if you consider my brief description of the data collection, it will
take between 5 and 15 seconds (it could take longer), averaging around 10
seconds. With this time frame in mind, the cost of spawning a new thread and
of any context switching that might take place every now and then is fairly
cheap. If you don't consider the context, then sure, thread creation and
context switching are expensive operations.

The default size of the thread pool is 25, and it's defined in the
processModel node of machine.config. The pool itself is the most interesting
point in favour of either scenario 2 or 3. There is no denying that using
the pool to recycle threads will boost performance; how much is hard to
tell, since we're speaking in terms relative to the actual collecting of
data. If I need to create x threads for each call to Fetch and there are y
calls to Fetch each second/minute, then I might as well funnel them through
the pool.

But the thread pool wouldn't be exclusive to my Fetch method. It would be
shared by my whole application (which, by the way, is a web application),
and any async operations elsewhere will eat away at the pool, leaving the
possibility that the worker threads of the Fetch method queue up and wait,
resulting in a decrease in performance. Increasing the size of the thread
pool could solve this.

Sorry if I'm not very coherent here, but I only got a couple of hours of
sleep last night and I admit that I'm just ranting whatever thoughts spring
into my head while replying to your post :-)

"john conwell" <johnconwell@discussions.microsoft.com> wrote in message
news:D0EC33A8-5332-4705-9D99-6BDD1DC95D8D@microsoft.com...
john conwell
6/17/2005 1:06:06 PM
Oh, it's a web app... That really makes a difference. From my experience
you definitely don't want to use the thread pool then, because you would be
stealing threads from your site's request handler, since it also uses the
thread pool to service new HTTP requests.

I've played around with this a lot and it's hard to find a good mix when
each request could kick off multiple threads. Definitely use a web site load
test tool (such as ACP) to prove whether you actually sped things up or
slowed them down.

I had a site that for a specific request needed to get 7 result sets of data
(from a web service). I tried many combinations of threading: one thread
per result set, 2 result sets per thread, thread pool, manual threads. In
the end, with the site under moderate load, the fastest method was to do it
synchronously. Those web service calls were pretty short, so in your
situation you would get better results, since each call takes 10-15 seconds.

Another thing to consider is to create a custom IHttpHandler to intercept
all calls to this page and kick off the threads in the ProcessRequest()
method. Then forward the request on to the desired page to be processed,
and in that page sync back up with the threads using Join(). This way the
threads can get some extra processing time in before they have to sync back
up.
