sql server clustering : TCP connections suddenly failing.


ken.sumner NO[at]SPAM gmail.com
4/16/2007 7:25:00 PM
I am running a large mission critical application on a cluster. This
is a 2003 server latest SP with SQL 2000 Enterprise latest SP also.

This app has run for years with little change. Today at around 3:00
all connections stopped coming in.

After some debugging, we found that we could connect with named
pipes, but not TCP. TCP has been the default protocol for years.

We have the server monitored and no changes occurred recently. The
server has not been rebooted in months, and the last time the instance
was restarted was two months ago.

What can cause a server that has run reliably using TCP for years to
suddenly stop accepting TCP connections?

The errors I am getting are a handshake error when connecting locally
over TCP, and a connection failed error when connecting remotely.

Connections are instant and lovely using named pipes.

Thanks for any help.
Kenneth Sumner
4/16/2007 8:12:24 PM
Just an FYI: I can telnet to both the UDP port and the TCP port.
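
A successful telnet only shows that the listener accepts the socket; it
does not exercise the SQL Server login handshake, which is consistent
with the handshake error above. A minimal sketch of the same port-level
check in Python follows. The host name in the usage comment is
hypothetical, and 1433 is only the default port for a default instance;
a clustered or named instance may listen elsewhere.

```python
import socket


def tcp_port_accepts(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    This checks only that the three-way TCP handshake completes. It says
    nothing about whether SQL Server will complete its own login handshake
    on that connection.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


# Example usage (hypothetical virtual server name):
#   tcp_port_accepts("sqlvirtual.example.com", 1433)
```

If this returns True while client logins over TCP still fail, the fault
is above the socket layer, which matches what was seen here.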


Gabe Matteson
4/17/2007 5:51:44 AM
Can you check
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\MSSQLServer\Cluster
and make sure the clusteripaddr is set to the virtual server ip?
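
The check above could be scripted roughly as follows. This is a sketch
under assumptions: it must run on the Windows node itself, it assumes the
value is stored as a single IP address string, and the expected address
passed in is whatever the virtual server IP is known to be.

```python
import ipaddress

# Key and value name as given in the post above.
KEY_PATH = r"SOFTWARE\Microsoft\MSSQLServer\Cluster"
VALUE_NAME = "ClusterIpAddr"


def read_cluster_ip() -> str:
    """Read the configured cluster IP from the registry (Windows only)."""
    import winreg  # imported lazily so the module still loads elsewhere

    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, KEY_PATH) as key:
        value, _value_type = winreg.QueryValueEx(key, VALUE_NAME)
    # Assumption: the value holds one address as text.
    return str(value)


def same_address(configured: str, expected: str) -> bool:
    """Compare two IP address strings, ignoring surrounding whitespace."""
    return ipaddress.ip_address(configured.strip()) == ipaddress.ip_address(
        expected.strip()
    )


# Example usage on a cluster node (the address is a placeholder):
#   same_address(read_cluster_ip(), "10.0.0.5")
```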


ken.sumner NO[at]SPAM gmail.com
4/17/2007 6:35:13 AM
The clusteripaddr was fine on both instances on all servers within
the cluster.

I want a root cause, but we took the cluster group offline and brought
it back online and that resolved the TCP connect issue.




ken.sumner NO[at]SPAM gmail.com
4/17/2007 7:52:21 AM

Another issue that may have had an impact is that just before the
problem started, we had a 2019 Srv error: Server was unable to
allocate from the system nonpaged pool because pool was empty.

Can this have an impact on SQL Server TCP connections only, but allow
named pipes to connect?
ken.sumner NO[at]SPAM gmail.com
4/17/2007 10:13:56 AM
We have a medium to high number of .NET connections. Please explain
further... I am interested.


Geoff N. Hiten
4/17/2007 11:17:22 AM
Yes. IIRC, that is where system network buffers come from. By chance, do
you have a lot of .NET client application connections?

--
Geoff N. Hiten
Senior Database Administrator
Microsoft SQL Server MVP




Geoff N. Hiten
4/17/2007 1:50:54 PM
.NET uses an 8k packet size by default. This is larger than the SQL 2000
default packet size, so requests for these allocations must be serviced
from the system nonpaged memory pool rather than the internal network
packet pool. Heavy use of this pool can cause severe fragmentation and
memory starvation. You may want to lower the amount of physical memory
allocated to SQL Server in this situation.

--
Geoff N. Hiten
Senior Database Administrator
Microsoft SQL Server MVP
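
A rough way to size the effect described above is to multiply the number
of client connections by the packet size. The sketch below does only that
arithmetic. The connection count is hypothetical, the one-buffer-per-
connection default is an assumption rather than a statement about how SQL
Server actually allocates, and 4096 bytes is the SQL Server 2000 default
for the "network packet size" option.

```python
def packet_buffer_bytes(
    connections: int, packet_size: int, buffers_per_connection: int = 1
) -> int:
    """Estimate total bytes held in network packet buffers.

    buffers_per_connection is an assumed multiplier, not a measured value.
    """
    return connections * packet_size * buffers_per_connection


# Hypothetical load: 2000 concurrent connections.
dotnet_default = packet_buffer_bytes(2000, 8192)  # 16,384,000 bytes
sql2000_default = packet_buffer_bytes(2000, 4096)  # 8,192,000 bytes

print(f"8k packets: {dotnet_default / 2**20:.1f} MiB")
print(f"4k packets: {sql2000_default / 2**20:.1f} MiB")
```

The absolute numbers matter less than the doubling: at the same
connection count, 8k packets need twice the buffer space of 4k packets.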




ken.sumner NO[at]SPAM gmail.com
4/17/2007 4:05:41 PM
Thanks for the info. Doing the math, this could be our issue. But
how could that cause only TCP to stop working while named pipes
keep working without issue?

ken.sumner NO[at]SPAM gmail.com
4/17/2007 4:21:53 PM
I wasn't clear in the last post. The memory gets fragmented and the
OS is starved. There are issues and the noted error messages.
Reducing SQL's memory footprint, and allowing the OS more room, should
resolve the problem.

Now the question is why it quits accepting TCP connections. Are
you saying that the non-paged pool doesn't recover when the load
drops, and that you must reinitialize this pool after the issue happens?


Kenneth Sumner
4/17/2007 10:53:06 PM
That or something similar appears to be the case.

This error with non-paged memory happened at 3:00 yesterday afternoon.
There was heavy traffic, but there has been worse without error.

After this error, no TCP connections could be made to the SQL server.
Pings to the server were fine. I could telnet to the SQL instance
fine on both the UDP port and the TCP port.

Connections were great with named pipes. The connection error
everyone got when they tried to connect to the server was either an
invalid connect string error or a handshake error, as if TCP were
suddenly Greek to the SQL instance.

We took the cluster group offline and brought it back online. TCP then
worked and the problem was resolved. The group was never moved, nor did
it fail over.



Geoff N. Hiten
4/18/2007 12:00:00 AM
TCP packet allocation is different from named pipes, which are not as
sensitive to nonpaged pool fragmentation. Why something breaks today and
not yesterday or tomorrow on an apparently stable system is a question
that keeps many DBAs gainfully employed. I wish I had the easy answer to
that one.

--
Geoff N. Hiten
Senior Database Administrator
Microsoft SQL Server MVP



