dotnet clr:
I'm trying to run multiple independent processes in separate AppDomains.
Ideally, these processes should be restartable after failure (which
hopefully occurs very rarely). I had assumed that, since AppDomains are
supposed to provide isolation, an unhandled exception in an AppDomain would
cause an AppDomain unload and nothing more. Unfortunately, in the default
host, the entire process is terminated instead.
Hosting the CLR gives you the ability to effectively turn off process
termination on unhandled exceptions, using
ICLRPolicyManager::SetUnhandledExceptionPolicy, and even escalate AppDomain
unloads that fail. Unfortunately (this will become a recurring theme, I'm
afraid) there is not actually a way for the process hosting the CLR to
register for a notification that an unhandled exception has occurred.
You can, however, register a managed AppDomainManager type with
ICLRControl::SetAppDomainManagerType. An instance of this type will be
created for every new AppDomain, and it can greatly customize initialization
of the AppDomain. In particular, there will be an AppDomainManager for the
default AppDomain, and you can use this to register an unhandled exception
event handler. Because unhandled exceptions in other AppDomains are also
reported in the default AppDomain, it seems this is the perfect way to
detect unhandled exceptions. Unfortunately, poorly written code in other
AppDomains can prevent this from ever happening.
I've seen the following scenario. When an unhandled exception occurs in an
AppDomain, the (managed) thread that caused the exception actually raises
the UnhandledException event on the AppDomain containing the code that
raised the exception. Only when the event handler in that AppDomain returns
is the handler in the default AppDomain called, by the same thread,
migrating between domains.
In other words: if a handler for the UnhandledException event ever blocks,
the application keeps running and the unhandled exception is effectively
swallowed. Obviously, it's not acceptable for a reliable host to be foiled
this way. Nor is it acceptable to make it impossible to register
UnhandledException event handlers in the failing domain (which could be
achieved by denying the ControlAppDomain security permission), because the
code should still have the ability to try to clean up what it can. It just
shouldn't be allowed to tie up unhandled exception signaling to the default
AppDomain indefinitely, because this effectively prevents the host from ever
unloading the failing AppDomain as well.
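To make the ordering problem concrete, here is a small model of it in Python. This is not CLR code and uses no hosting API. It only mimics the behaviour I described: one faulting thread runs the failing domain's handlers first and reaches the host's handler only if they all return. All names (report_unhandled, run_scenario, blocking_handler and so on) are made up for the illustration.

```python
import threading

def report_unhandled(domain_handlers, host_handler, exc):
    """Model of the observed ordering: the faulting thread runs the failing
    domain's handlers first and reaches the host's handler only afterwards."""
    for handler in domain_handlers:
        handler(exc)       # a handler that never returns stalls everything below
    host_handler(exc)      # stands in for the default AppDomain's handler

def run_scenario(domain_handlers, wait_seconds=0.5):
    """Return True if the host handler was reached within wait_seconds."""
    host_notified = threading.Event()

    def faulting_thread():
        try:
            raise RuntimeError("simulated unhandled exception")
        except RuntimeError as exc:
            report_unhandled(domain_handlers,
                             lambda e: host_notified.set(),
                             exc)

    # Daemon thread so a permanently blocked handler cannot keep the
    # interpreter alive once the main thread finishes.
    threading.Thread(target=faulting_thread, daemon=True).start()
    return host_notified.wait(wait_seconds)

def well_behaved_handler(exc):
    pass  # cleans up and returns promptly

def blocking_handler(exc):
    threading.Event().wait()  # never signalled, so this never returns

if __name__ == "__main__":
    print("well-behaved handler, host notified:",
          run_scenario([well_behaved_handler]))
    print("blocking handler, host notified:",
          run_scenario([blocking_handler]))
```

In this model the host hears about the failure only when every handler in the failing domain returns. A single blocking handler is enough to keep the host waiting forever, which is the situation I need to get out of.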
Anyone have some ideas as to how to fix this, or suggestions for a better
approach? (A colleague suggested abandoning AppDomains altogether and just
spawning new processes deftly controlled and monitored by IPC, but I don't
want to go that way unless I absolutely have to, since resource use of all
these processes is a major problem right now.)