all groups > dotnet clr > august 2006 >
You're in the

dotnet clr

group:

Any help understanding windbg/sos dump of hung app


Any help understanding windbg/sos dump of hung app John Aldrin
8/10/2006 2:41:02 PM
dotnet clr:
Hi,

Wondering if anyone can make better sense of this app dump than I.

Working on app that perioldiclly hangs, with maxed out cpu. The app is
reading data from hearing aids ( 1 or 2) connected via a serial port to an
external device that interfaces w/hearing aid. App is a win forms app
written in C#, Dot Net 1.1.

Using Windbg I determined that the apps main ui thread is using all the cpu.
used adplus to capture app state.

Best I can tell is that our main ui thread is in the garbage collecting
state and it is acting like it is in a infinite loop.

** Thread Info **

ThreadCount: 6
UnstartedThread: 0
BackgroundThread: 2
PendingThread: 0
DeadThread: 2
PreEmptive GC Alloc
Lock
ID ThreadOBJ State GC Context Domain
Count APT Exception
0 0x1194 0x0015edd0 0x6020 Disabled 0x00000000:0x00000000 0x00159dd0
1 STA (GC)
2 0x1388 0x001676f0 0xb220 Enabled 0x00000000:0x00000000 0x00159dd0
0 MTA (Finalizer)
XXX 0 0x00223918 0x5820 Enabled 0x00000000:0x00000000 0x00159dd0
0 STA
XXX 0 0x04e00f80 0x1820 Enabled 0x00000000:0x00000000 0x00159dd0
0 Ukn
11 0x1400 0x04e02c80 0x7022 Enabled 0x00000000:0x00000000 0x00159dd0
0 STA
12 0x1060 0x04e02e38 0x2001220 Enabled 0x00000000:0x00000000 0x00159dd0
0 Ukn


0 - is main ui thread
11 - is background thread to do i/o w/external device
12- is background thread that toggles the priority of the i/o background
thread(11).


A little confused about thread state values, seems to be more data than
ThreadState enum supports,
My quess is
0 is WaitSleepJoin
11 is WaitSleepJoin and SuspendRequested
12 is WaitSleepJoin

States for 11 and 12 seem funny to me because their call stack seems to show
they waiting for gargage collection 2 finish. ( following call
WaitUntilGCComplete in call stack)

Looking at call stack for main ui thread it appears that it had to allocate
more memory and
a garbage collection was started and it never finished.


dump stacks for threads follow

Thread 0 - sorry quite big

0:000> !dumpstack
Thread 0
Current frame: ntdll!KiFastSystemCallRet
ChildEBP RetAddr Caller,Callee
0012e5a8 7c90e9ab ntdll!ZwWaitForMultipleObjects+0xc
0012e5ac 7c8094f2 kernel32!WaitForMultipleObjectsEx+0x12c, calling
ntdll!ZwWaitForMultipleObjects
0012e638 7c8399f3 kernel32!_except_handler3
0012e648 7c809c86 kernel32!WaitForMultipleObjects+0x18, calling
kernel32!WaitForMultipleObjectsEx
0012e664 79247d67 mscorwks!Thread::SysSuspendForGC+0x248, calling
kernel32!WaitForMultipleObjects
0012e6bc 791efdaf mscorwks!Int32ToDecStr+0xf4, calling
mscorwks!__security_check_cookie
0012e6dc 791b452d mscorwks!gc_heap::allocate_more_space+0x3f0, calling
mscorwks!gc_heap::adjust_limit_clr
0012e71c 791b452d mscorwks!gc_heap::allocate_more_space+0x3f0, calling
mscorwks!gc_heap::adjust_limit_clr
0012e758 792d176b mscorwks!GCHeap::Alloc+0x77, calling
mscorwks!gc_heap::allocate_more_space
0012e764 792d17bd mscorwks!GCHeap::Alloc+0xc9, calling
mscorwks!__security_check_cookie
0012e774 791db30c mscorwks!FastAllocatePrimitiveArray+0x45, calling
mscorwks!Alloc
0012e794 791db8a5 mscorwks!JIT_NewArr1+0xbb, calling
mscorwks!FastAllocatePrimitiveArray
0012e79c 791db8b2 mscorwks!JIT_NewArr1+0xc8, calling mscorwks!Frame::Pop
0012e7a0 791db8ba mscorwks!JIT_NewArr1+0xd0, calling
mscorwks!HelperMethodFrame::RestoreState
0012e7b4 792d17bd mscorwks!GCHeap::Alloc+0xc9, calling
mscorwks!__security_check_cookie
0012e7c4 793c22d0 mscorwks!COMNlsInfo::GetCaseInsHash+0xcc, calling
mscorwks!HashiStringKnownLower80
0012e7c8 791f175b mscorwks!JITutil_IsInstanceOfBizarre+0x15d, calling 00971e90
0012e7dc 791db847 mscorwks!JIT_NewArr1+0x15, calling
mscorwks!LazyMachState::captureState
0012e7f0 7923135c mscorwks!COMString::EqualsObject+0x45, calling
mscorwks!WcharCompareHelper
0012e7f8 79231365 mscorwks!COMString::EqualsObject+0x4e, calling 00971e90
0012e804 799a44fd (MethodDesc 0x79ba9b70 +0x35
System.Collections.Hashtable.KeyEquals)
0012e80c 7999c0e8 (MethodDesc 0x79ba9ad0 +0xc8
System.Collections.Hashtable.get_Item)
0012e840 791f11dc mscorwks!COMString::EqualsString+0x3f, calling
mscorwks!WcharCompareHelper
0012e854 791b452d mscorwks!gc_heap::allocate_more_space+0x3f0, calling
mscorwks!gc_heap::adjust_limit_clr
0012e870 791b3271 mscorwks!BaseCrst::Enter+0x3e, calling
ntdll!RtlTryEnterCriticalSection
0012e880 791b3284 mscorwks!BaseCrst::IncThreadLockCount+0x6, calling 00971e90
0012e89c 792116aa mscorwks!GCHeap::SuspendEE+0xcf, calling
mscorwks!Thread::SysSuspendForGC
0012e8b4 79213c1f mscorwks!GCHeap::GarbageCollectGeneration+0x103, calling
mscorwks!GCHeap::SuspendEE
0012e8d0 79214e83 mscorwks!gc_heap::allocate_more_space+0x13a, calling
mscorwks!GCHeap::GarbageCollectGeneration
0012e900 792d176b mscorwks!GCHeap::Alloc+0x77, calling
mscorwks!gc_heap::allocate_more_space
0012e91c 792d17bd mscorwks!GCHeap::Alloc+0xc9, calling
mscorwks!__security_check_cookie
0012e93c 792d17bd mscorwks!GCHeap::Alloc+0xc9, calling
mscorwks!__security_check_cookie
0012e960 0f97366c (MethodDesc 0xf7f8600 +0xcc4
System.Data.XmlDataLoader.InferSchema), calling mscorwks!JIT_IsInstanceOf
0012e97c 791b3af0 mscorwks!Alloc+0x3a, calling mscorwks!GCHeap::Alloc
0012e98c 791db30c mscorwks!FastAllocatePrimitiveArray+0x45, calling
mscorwks!Alloc
0012e9ac 791db8a5 mscorwks!JIT_NewArr1+0xbb, calling
mscorwks!FastAllocatePrimitiveArray
0012e9b4 791db8b2 mscorwks!JIT_NewArr1+0xc8, calling mscorwks!Frame::Pop
0012e9b8 791db8ba mscorwks!JIT_NewArr1+0xd0, calling
mscorwks!HelperMethodFrame::RestoreState
0012e9e0 791f175b mscorwks!JITutil_IsInstanceOfBizarre+0x15d, calling 00971e90
0012e9f4 791db847 mscorwks!JIT_NewArr1+0x15, calling
mscorwks!LazyMachState::captureState
0012ea08 7923135c mscorwks!COMString::EqualsObject+0x45, calling
mscorwks!WcharCompareHelper
0012ea10 79231365 mscorwks!COMString::EqualsObject+0x4e, calling 00971e90
0012ea1c 799a44fd (MethodDesc 0x79ba9b70 +0x35
System.Collections.Hashtable.KeyEquals)
0012ea24 7999c0e8 (MethodDesc 0x79ba9ad0 +0xc8
System.Collections.Hashtable.get_Item)
0012ea58 791f11dc mscorwks!COMString::EqualsString+0x3f, calling
mscorwks!WcharCompareHelper
0012ea6c 0f97366c (MethodDesc 0xf7f8600 +0xcc4
System.Data.XmlDataLoader.InferSchema), calling mscorwks!JIT_IsInstanceOf
0012ea88 791b3af0 mscorwks!Alloc+0x3a, calling mscorwks!GCHeap::Alloc
0012ea98 791db30c mscorwks!FastAllocatePrimitiveArray+0x45, calling
mscorwks!Alloc
0012eab4 791b3af0 mscorwks!Alloc+0x3a, calling mscorwks!GCHeap::Alloc
0012eac4 791bdfaa mscorwks!AllocateArrayEx+0x161, calling mscorwks!Alloc
0012ead4 791b3af0 mscorwks!Alloc+0x3a, calling mscorwks!GCHeap::Alloc
0012eae4 791bdfaa mscorwks!AllocateArrayEx+0x161, calling mscorwks!Alloc
RE: Any help understanding windbg/sos dump of hung app John Aldrin
8/11/2006 8:35:03 AM


[quoted text, click to view]

I verified that the main ui thread was maxing out the cpu using windbg
!runaway 3 cmd.

Following is output showing that thread 0 (main ui thread) is only thread
getting cpu time

0:014> !runaway 3
User Mode Time
Thread Time
0:804 0 days 9:51:35.968
2:d8c 0 days 0:00:00.921
12:9dc 0 days 0:00:00.671
7:bc0 0 days 0:00:00.046
9:990 0 days 0:00:00.031
11:704 0 days 0:00:00.015
10:f70 0 days 0:00:00.015
8:c08 0 days 0:00:00.015
14:e8c 0 days 0:00:00.000
13:308 0 days 0:00:00.000
6:878 0 days 0:00:00.000
5:1f8 0 days 0:00:00.000
4:b28 0 days 0:00:00.000
3:fa0 0 days 0:00:00.000
1:ffc 0 days 0:00:00.000
Kernel Mode Time
Thread Time
0:804 0 days 6:31:42.500
12:9dc 0 days 0:00:06.140
2:d8c 0 days 0:00:00.296
7:bc0 0 days 0:00:00.046
5:1f8 0 days 0:00:00.046
9:990 0 days 0:00:00.031
14:e8c 0 days 0:00:00.000
13:308 0 days 0:00:00.000
11:704 0 days 0:00:00.000
10:f70 0 days 0:00:00.000
8:c08 0 days 0:00:00.000
6:878 0 days 0:00:00.000
4:b28 0 days 0:00:00.000
3:fa0 0 days 0:00:00.000
1:ffc 0 days 0:00:00.000
0:014> g
(2dc.dbc): Break instruction exception - code 80000003 (first chance)
eax=7ffd9000 ebx=00000001 ecx=00000002 edx=00000003 esi=00000004 edi=00000005
eip=7c901230 esp=02b4ffcc ebp=02b4fff4 iopl=0 nv up ei pl zr na pe nc
cs=001b ss=0023 ds=0023 es=0023 fs=0038 gs=0000 efl=00000246
ntdll!DbgBreakPoint:
7c901230 cc int 3
0:014> !runaway 3
User Mode Time
Thread Time
0:804 0 days 9:51:43.000
2:d8c 0 days 0:00:00.921
12:9dc 0 days 0:00:00.671
7:bc0 0 days 0:00:00.046
9:990 0 days 0:00:00.031
11:704 0 days 0:00:00.015
10:f70 0 days 0:00:00.015
8:c08 0 days 0:00:00.015
14:dbc 0 days 0:00:00.000
13:308 0 days 0:00:00.000
6:878 0 days 0:00:00.000
5:1f8 0 days 0:00:00.000
4:b28 0 days 0:00:00.000
3:fa0 0 days 0:00:00.000
1:ffc 0 days 0:00:00.000
Kernel Mode Time
Thread Time
0:804 0 days 6:31:46.781
12:9dc 0 days 0:00:06.140
2:d8c 0 days 0:00:00.296
7:bc0 0 days 0:00:00.046
5:1f8 0 days 0:00:00.046
9:990 0 days 0:00:00.031
14:dbc 0 days 0:00:00.000
13:308 0 days 0:00:00.000
11:704 0 days 0:00:00.000
10:f70 0 days 0:00:00.000
8:c08 0 days 0:00:00.000
6:878 0 days 0:00:00.000
4:b28 0 days 0:00:00.000
3:fa0 0 days 0:00:00.000
1:ffc 0 days 0:00:00.000





[quoted text, click to view]

I'm able to recreate the problem - each time I see the same thing. The main
ui thread is not frozen and I can view disassmbly and step thru code.
Following is some of the method names I see.

KERNEL32!_SEH_prolog:
ntdll!RtlActivateActivationContextUnsafeFast:
KERNEL32!BaseFormatTimeOut:
ntdll!ZwWaitForMultipleObjects:
KERNEL32!BaseSetLastNTError:
ntdll!RtlNtStatusToDosError:

sequence of method calls seem to repeat as I continue to step w/debugger.

Method call sequence leads me to believe that some runtime error is occuring
( _SEH and LastNTError method names ). Anyone know of a register/method
combo I can look at to see what error is occurring.

Also, confused that debugger is not breaking if an error is occuring. Could
there some event filter I need to change?

[quoted text, click to view]
Re: Any help understanding windbg/sos dump of hung app John Aldrin
8/11/2006 8:45:02 AM


[quoted text, click to view]

U are correct. I've written a test app to try to recreate the problem and it
is repeatedly reading data.

[quoted text, click to view]

I can recreate the problem. I used windbg !runaway 3 to verify that main ui
thread is only thread getting cpu.

I can view disassembly and step thru code, following is some of the methods
I see getting called.

KERNEL32!_SEH_prolog:
ntdll!RtlActivateActivationContextUnsafeFast:
KERNEL32!BaseFormatTimeOut:
ntdll!ZwWaitForMultipleObjects:
KERNEL32!BaseSetLastNTError:
ntdll!RtlNtStatusToDosError:

Above is not a comprehensive list, but as I keep steping the method names
seem to repeat.

Method call sequence leads me to believe that some runtime error is occuring
( _SEH and LastNTError method names ). Anyone know of a register/method
combo I can look at to see what error is occurring.

Also, confused that debugger is not breaking if an error is occuring. Could
there some event filter I need to change?

[quoted text, click to view]

I will try to verify.

[quoted text, click to view]
RE: Any help understanding windbg/sos dump of hung app jetan NO[at]SPAM online.microsoft.com (
8/11/2006 9:44:52 AM
Hi Jrax,

Based on your post, it seems that your application is experiencing a high
CPU hang. Can you tell me how do you identify that your main GUI thread is
causing the high CPU? Based on the call stack information, it seems that
the main GUI thread is hanging in kernel32!WaitForMultipleObjectsEx, which
should not cost any CPU time. Can you give it recheck?

The simplest way to identify if a thread is costing high CPU time is using
Process Explorer's "Threads" tab in your application's property dialog. You
may monitor "CPU" column in this tabpage, it will tell you if the main GUI
thread is costing CPU now.

Also, it is a good idea to obtain more than 1 dumps from your application
and view the main thread call stack, then you can identify if the main GUI
thread is really hanging in the mscorwks!Thread::SysSuspendForGC for a long
period.

Finally, if you require urgent help of analyzing the .Net dump file, I
recommend you try to contact Microsoft PSS, they will help you to look into
the dump file and perform code review.

Thanks.

Best regards,
Jeffrey Tan
Microsoft Online Community Support
==================================================
Get notification to my posts through email? Please refer to
http://msdn.microsoft.com/subscriptions/managednewsgroups/default.aspx#notif
ications.

Note: The MSDN Managed Newsgroup support offering is for non-urgent issues
where an initial response from the community or a Microsoft Support
Engineer within 1 business day is acceptable. Please note that each follow
up response may take approximately 2 business days as the support
professional working with you may need further investigation to reach the
most efficient resolution. The offering is not appropriate for situations
that require urgent, real-time or phone-based interactions or complex
project analysis and dump analysis issues. Issues of this nature are best
handled working with a dedicated Microsoft Support Engineer by contacting
Microsoft Customer Support Services (CSS) at
http://msdn.microsoft.com/subscriptions/support/default.aspx.
==================================================
This posting is provided "AS IS" with no warranties, and confers no rights.
Re: Any help understanding windbg/sos dump of hung app Barry Kelly
8/11/2006 10:52:40 AM
[quoted text, click to view]

Yes, it looks like you caught the code doing a GC. Thing is, if your own
code has a busy loop hang and its allocating loads of objects in that
busy loop, chances are that when broken into, it'll be performing a GC.

[quoted text, click to view]

Are you sure about that - and not that it's just that each time you
break into it, it happens to be performing a GC?

[quoted text, click to view]

Are you certain that Starkey.Core.DataServer.Internal.Request() (i.e.
the constructor) calls DataSet.ReadXml and never returns?

-- Barry

--
Re: Any help understanding windbg/sos dump of hung app Loy
8/13/2006 12:04:07 AM
A shot in the dark:

We had such experience (infinite loop in the message-loop) and we found
out that we were calling UI calls out of the GUI thread. - since the
app was defined as STA the state was unpredictable

If you have large "% time in GC" (perfmon counter)
--> Check the "# of pinned objects" as it might point to fragmanted
heap

(againg - this is a shot in the dark)

Loy


[quoted text, click to view]
RE: Any help understanding windbg/sos dump of hung app jetan NO[at]SPAM online.microsoft.com (
8/14/2006 9:45:24 AM
Hi Jrax,

Thanks for your feedback!

Yes, by using !runaway and based on the result, I agree with you that 0:804
thread is causing the high CPU hang and I assume 0:804 is the main GUI
thread.

Since you confirmed "each time I see the same thing", I assume you mean
that all the call stacks you obtained are hanging in
Thread::SysSuspendForGC and kernel32!WaitForMultipleObjectsEx.

This looks very strange to me. Since high CPU hang is normally caused by
CPU running at a full speed, which means that the high CPU thread is
executing the code as much as possible. Normally, this is caused when the
thread is trapped in a poorly written loop or the thread has a lot of code
to execute. Since the ntdll!ZwWaitForMultipleObjects will interrupt to
kernel mode, I do not think it will use any user-mode time. Based on my
knowledge, the kernel mode code for ZwWaitForMultipleObjects will just
implement the kernel-mode synchronization primitive also will not cost much
CPU time. So it is really strange that this thread is always costing CPU
time.

I highly recommend you to try to use Process Explorer to view this main GUI
thread in "Threads" tabpage. Once you selected your thread in the ListBox,
the below will display all kinds of thread information dynamically, such as
"Kernel Time", "User Time" the same as !runaway. However, since it will
change these 2 values dynamically, it will be easy to see if your thread is
costing any CPU time lively.

I am not sure the relation of your new posted function list with the
original call stack, I can not find them in your original stack list. Based
on my review, these methods are setting Win32 error in TEB for certain
failing Win32 API call. Win32 API error setting will not throw exception
during execution, so the debugger will not break. It is our code's
responsibility to check the Win32 API return value and error code for
failure.

Finally, since you have a little test application to reproduce this issue,
is it possible for you to provide this little sample project to me? I will
take a look on it. Thanks.

Best regards,
Jeffrey Tan
Microsoft Online Community Support
==================================================
Get notification to my posts through email? Please refer to
http://msdn.microsoft.com/subscriptions/managednewsgroups/default.aspx#notif
ications.

Note: The MSDN Managed Newsgroup support offering is for non-urgent issues
where an initial response from the community or a Microsoft Support
Engineer within 1 business day is acceptable. Please note that each follow
up response may take approximately 2 business days as the support
professional working with you may need further investigation to reach the
most efficient resolution. The offering is not appropriate for situations
that require urgent, real-time or phone-based interactions or complex
project analysis and dump analysis issues. Issues of this nature are best
handled working with a dedicated Microsoft Support Engineer by contacting
Microsoft Customer Support Services (CSS) at
http://msdn.microsoft.com/subscriptions/support/default.aspx.
==================================================
This posting is provided "AS IS" with no warranties, and confers no rights.
RE: Any help understanding windbg/sos dump of hung app jetan NO[at]SPAM online.microsoft.com (
8/16/2006 12:00:00 AM
Hi Jrax,

Have you reviewed my last reply? Does it make sense to you? Please feel
free to feedback, thanks.

Best regards,
Jeffrey Tan
Microsoft Online Community Support
==================================================
Get notification to my posts through email? Please refer to
http://msdn.microsoft.com/subscriptions/managednewsgroups/default.aspx#notif
ications.

Note: The MSDN Managed Newsgroup support offering is for non-urgent issues
where an initial response from the community or a Microsoft Support
Engineer within 1 business day is acceptable. Please note that each follow
up response may take approximately 2 business days as the support
professional working with you may need further investigation to reach the
most efficient resolution. The offering is not appropriate for situations
that require urgent, real-time or phone-based interactions or complex
project analysis and dump analysis issues. Issues of this nature are best
handled working with a dedicated Microsoft Support Engineer by contacting
Microsoft Customer Support Services (CSS) at
http://msdn.microsoft.com/subscriptions/support/default.aspx.
==================================================
This posting is provided "AS IS" with no warranties, and confers no rights.
AddThis Social Bookmark Button