
sql server clustering : SQL 2000 Cluster recovery after ethernet switch outage


Graham Morris
4/24/2006 12:00:00 AM
I have a two-node SQL Server 2000 cluster. Each node has two network
interfaces: one is connected to its opposite number on the other node by a
crossover cable, and the other is connected to a gigabit Ethernet
switch, to talk to the outside world. All standard stuff. (Both nodes are
connected to the same switch.)

The problem we have is this: if the gigabit switch is powered down, the
cluster does not recover when it is powered up again. The crossover cable
is never disconnected and the power to each node is not interrupted. The
only way to get the cluster up and running again is to reboot the nodes.

I'm hoping that I can get the cluster to recover on its own through the right
configuration in Cluster Administrator, but how? Any help greatly appreciated.

Geoff N. Hiten
4/24/2006 8:52:47 AM
You will need to start the clustered resources manually.

What happens is this: when the switch is off, the NIC shows as unplugged to the
OS, which takes the associated IP addresses offline and causes a resource
failure. If you do not power up the switch within a certain time period,
the retry count for the virtual server is exhausted and the cluster "gives
up" on restarting SQL Server. Once the switch is back, you have
to restart the cluster group and the SQL Server group yourself. You should be
able to connect the Cluster Administrator tool to the local node name (not the
cluster name, which is still offline) and restart the cluster service and the
SQL Server service without rebooting the nodes.
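
A rough way to picture the retry exhaustion described above is the Python sketch below. It is a simplified model, not the cluster service's actual algorithm: it assumes restart attempts are spaced evenly, and the names `outage_s`, `retry_interval_s` and `max_restarts` are hypothetical stand-ins for whatever restart settings the resource really has.

```python
def recovers_without_help(outage_s: float, retry_interval_s: float, max_restarts: int) -> bool:
    """Return True if some restart attempt happens after the switch is back.

    Simplified assumption: the resource fails at t=0 and the cluster retries
    at retry_interval_s, 2*retry_interval_s, ... up to max_restarts times.
    """
    for attempt in range(1, max_restarts + 1):
        attempt_time_s = attempt * retry_interval_s  # seconds since the failure
        if attempt_time_s >= outage_s:
            return True  # the network is back, so this restart can succeed
    return False  # every attempt was used up while the switch was still down


if __name__ == "__main__":
    # Short outage: the second retry (at 40 s) lands after the switch returns.
    print(recovers_without_help(outage_s=30, retry_interval_s=20, max_restarts=3))
    # Long outage: all three retries happen while the switch is still off.
    print(recovers_without_help(outage_s=300, retry_interval_s=20, max_restarts=3))
```

In the second case the model ends up in the "gives up" state, which is where the groups have to be restarted by hand.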

--
Geoff N. Hiten
Senior Database Administrator
Microsoft SQL Server MVP




Graham Morris
4/24/2006 2:24:22 PM
So it seems that if the switch is out for less than the restart period of
the IP address resources, the next restart will work and we'll be running. On
the other hand, using a long timeout will delay a cluster changeover if the
network interface on one node (rather than the switch) goes down.
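
To put rough numbers on that trade-off, here is a small Python sketch. It assumes a deliberately simple model in which the cluster keeps retrying for `retry_interval_s * max_restarts` seconds before it gives up on the local node; both parameter names and the sample values are hypothetical, not real cluster property names or defaults.

```python
from dataclasses import dataclass


@dataclass
class RetrySetting:
    """Hypothetical restart settings for the IP address resources."""
    retry_interval_s: float
    max_restarts: int

    def retry_window_s(self) -> float:
        # Total time spent retrying before the cluster stops trying locally.
        return self.retry_interval_s * self.max_restarts


def compare(settings):
    """Return (interval, restarts, longest switch outage survived,
    worst-case delay before changeover on a single-NIC failure) per setting."""
    rows = []
    for s in settings:
        window = s.retry_window_s()
        # In this model both quantities equal the retry window: the same
        # patience that rides out a switch outage also postpones changeover.
        rows.append((s.retry_interval_s, s.max_restarts, window, window))
    return rows


if __name__ == "__main__":
    for row in compare([RetrySetting(20, 3), RetrySetting(120, 5)]):
        print(row)
```

Under these assumptions the two figures always move together, so a setting that tolerates a ten-minute switch outage also waits up to ten minutes before moving off a node with a dead NIC.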

Many thanks for the info.

---
Graham

