Cluster does not restart SQL group after network failure


Cluster does not restart SQL group after network failure Dan
11/22/2004 3:57:04 PM
All,

We have just rebuilt a SQL 7.0/NT cluster with Windows 2003/SQL2000 in an
active/passive configuration using 2 nodes. During the course of testing it
we had a general network failure in which the network was unavailable. The
virtual SQL and Windows IP address resources went down and did not come up
automatically once the network was available again. The nodes are configured
for automatic failback.
I can't imagine that in the 2 1/2 years the original cluster was running
that we never once had the network go down, but I do know that during that
time I never had an outage where I had to manually move the cluster group
(which causes the cluster to re-initialize both resources and brings
everything back to normal).

I'm thinking that maybe I'm missing a dependency somewhere or something's
changed between NT and 2003 that I'm not accounting for. Anyone seen this or
have any tips? Thanks in advance!

Re: Cluster does not restart SQL group after network failure Geoff N. Hiten
11/22/2004 9:40:13 PM
Nope, that is pretty much expected behavior. The cluster manager will try
to restart the resources on each possible node until the retry count is
exhausted. Unfortunately, until the network is restored, no node can run
the SQL group. With the physical network port offline, the IP address(es)
will not come online, and nothing dependent on them will come online
either, including the Network Name and the SQL Server. If the network
comes back before the retry timeout and count are exhausted, the cluster
will bring the system online. Otherwise it stays down.
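
To make the dependency chain and the retry limit concrete, here is a toy Python model. It is only a sketch of the behavior described above, not the cluster service's actual algorithm: the Resource class, the simulate function, and the idea of counting the outage in restart attempts are all made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    # Hypothetical stand-in for a cluster resource; not a real cluster API.
    name: str
    depends_on: list = field(default_factory=list)
    online: bool = False


def try_bring_online(chain, network_up):
    """One restart attempt, walking the chain in dependency order."""
    for res in chain:
        if not res.depends_on:
            # The IP address can only come online if the network is up.
            res.online = network_up
        else:
            # Everything else needs all of its dependencies online first.
            res.online = all(dep.online for dep in res.depends_on)
    return all(res.online for res in chain)


def simulate(outage_attempts, retry_limit):
    """Assume the network is down for the first `outage_attempts` attempts."""
    ip = Resource("IP Address")
    net_name = Resource("Network Name", [ip])
    sql = Resource("SQL Server", [net_name])
    chain = [ip, net_name, sql]

    for attempt in range(1, retry_limit + 1):
        network_up = attempt > outage_attempts
        if try_bring_online(chain, network_up):
            return f"online after attempt {attempt}"
    return "failed: retries exhausted, manual intervention needed"


if __name__ == "__main__":
    print(simulate(outage_attempts=2, retry_limit=3))
    print(simulate(outage_attempts=5, retry_limit=3))
```

In the first call the network returns before the retries run out, so the whole chain comes up. In the second the outage outlasts the retries and the group stays down until someone brings it online by hand.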

--
Geoff N. Hiten
Microsoft SQL Server MVP
Senior Database Administrator
Careerbuilder.com

I support the Professional Association for SQL Server
www.sqlpass.org


Re: Cluster does not restart SQL group after network failure Dan
11/23/2004 7:41:14 AM
Geoff,

Thanks for the post! I guess I'll just have to make sure the retry &
timeout are set high.

Re: Cluster does not restart SQL group after network failure Geoff N. Hiten
11/23/2004 10:51:47 AM
Be careful adjusting those numbers. Too high can cause just as many
problems as too low. Given the frequency of the network outage and the fact
that something like that will NEVER go unnoticed, I would not change
anything. The cluster failover is designed to reduce the typical 30-45
minute human response time for a down server. You shouldn't expect the
clustering software to deal with anything beyond that scope. Adjusting the
parameters to try to expand that coverage will only expose a gap somewhere
else. Just document a cluster check as part of your network failure
recovery procedure and you will be fine.
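
As a rough idea of what that documented check could look like if you script it, here is a small Python sketch. It assumes you have already collected each resource's state by whatever means you normally use; the function names, the resource names, and the state strings are placeholders for illustration, not output from any real tool.

```python
def resources_needing_attention(states):
    """Return the names of resources that are not online, in input order."""
    return [name for name, state in states.items() if state.lower() != "online"]


def recovery_report(states):
    """Build a one-line summary for the network recovery checklist."""
    down = resources_needing_attention(states)
    if not down:
        return "All cluster resources online; no action needed."
    return "Bring online manually: " + ", ".join(down)


if __name__ == "__main__":
    # Hypothetical states noted after the network came back.
    observed = {
        "IP Address": "Online",
        "Network Name": "Failed",
        "SQL Server": "Offline",
    }
    print(recovery_report(observed))
```

The point is only that the check is a short, explicit step in the recovery procedure rather than something the cluster is expected to handle on its own.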

--
Geoff N. Hiten
