sql server clustering : Passive node starting the SQL Server services - services set to ma


Christopher
3/26/2006 2:51:31 PM
Hello all,

I have a very bizarre situation with a single instance/two node/W2K3
cluster. The passive node is starting the SQL Server services even though
that node does not own the resources. The restarts are occurring every five
minutes. The system log shows the SYSTEM account starting MSSQLSERVER. Then,
3 seconds later, I get a message stating the service stopped. About 5-8
seconds later, I get a message that SQLSERVERAGENT could not be started
because sqlservr.exe is not running. The cluster log is not showing anything
being started/stopped. The application log is complaining about the errorlog
being unavailable. However, that is expected because the resources are not
on this node.

The SQL Server services are set to manual on both nodes. Both nodes have
been rebooted in the past 24 hours due to EMC PowerPath upgrades. The only
other thing that is a little bizarre is that the SQL Server resource group
and the SQL Server resource have the preferred owners/possible owners in
reverse order, i.e. the SQL Server resource group has NODE1/NODE2 and the
SQL Server resource has NODE2/NODE1. Right now all resources are on NODE1.

Any ideas would be greatly appreciated.

Thanks in advance
Geoff N. Hiten
3/27/2006 1:34:20 PM
Manual start for services is correct for a cluster. The cluster should
control the services on each node. Look at the application logs on both
nodes. Check the failover/failback settings. You may be seeing a failback
condition due to different preferred node settings.

--
Geoff N. Hiten
Senior Database Administrator
Microsoft SQL Server MVP





Christopher
3/27/2006 1:51:02 PM
Geoff,

I'm not sure what you mean with regard to the failover/failback options for
each node. Going to the properties of each node yields only one tab. As
mentioned earlier, the preferred owners for the resource group and possible
owners for the resource were inverted:
Preferred owner(s)
NODE1
NODE2

Possible owner(s)
NODE2
NODE1
(Someone has since changed the above. Now both settings are the same.)
Could you point me to the area you are referring to, the preferred node
settings?

The resource group has the "prevent failback" radio button selected.

The application logs on both machines have the same information: the data
drive not being available - the server name is the SQL Server Network Name.

It turns out that there was a problem with the backplane of this server,
which controls internet connectivity. After this hardware was replaced and
the server rebooted, the problem ceased. I'm not sure if that has anything
to do with the resolution.

Thanks in advance.
-C


Christopher
3/27/2006 2:21:02 PM
So, if the node is the passive node and has connectivity issues, would that
cause this problem? Remember, the passive node was having issues.

Christopher
3/27/2006 2:39:16 PM
You are very good, my friend. Yes, this is an HP BL40p blade cluster. It
causes me great pain. It is an inherited environment. I take it blade
clusters have issues?

Geoff N. Hiten
3/27/2006 5:12:43 PM
Failover and failback settings are for the instance, not the node, as you
discovered.

The possible owners list is not sorted, i.e. the order does not matter.

Preferred owners list IS sorted by most preferred owner first.
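
To make the difference concrete, here is a minimal sketch in Python. It is a toy model, not the cluster service's API: the function names `same_possible_owners` and `pick_preferred_target` are hypothetical, and the selection rule is simplified to "first preferred node that is allowed and online".

```python
def same_possible_owners(a, b):
    """Possible owners behave like a set: only membership matters."""
    return set(a) == set(b)


def pick_preferred_target(preferred_owners, possible_owners, online_nodes):
    """Walk the preferred list in order and return the first usable node.

    Simplified assumption: a node is usable if it is a possible owner
    and is currently online. Returns None if no preferred node qualifies.
    """
    for node in preferred_owners:
        if node in possible_owners and node in online_nodes:
            return node
    return None


# NODE2/NODE1 and NODE1/NODE2 are the same possible-owners list.
print(same_possible_owners(["NODE2", "NODE1"], ["NODE1", "NODE2"]))

# The order of the preferred list decides which node wins when both are up.
print(pick_preferred_target(["NODE1", "NODE2"], {"NODE1", "NODE2"}, {"NODE1", "NODE2"}))
print(pick_preferred_target(["NODE2", "NODE1"], {"NODE1", "NODE2"}, {"NODE1", "NODE2"}))
```

In this model, reversing the possible owners changes nothing, while reversing the preferred owners changes the node that is chosen.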

If you were having problems with internet connectivity, the system may have
lost the IP ADDRESS resource long enough to start a failover or restart
process. Bad hardware can cause some weird cluster instabilities.




Geoff N. Hiten
3/27/2006 5:30:10 PM
Yes. Node 1 is running the SQL Virtual Instance. Node 2 then becomes the
"watchdog". Node 2 suddenly gets no response from Node 1. Node 2 then
tries to start the SQL virtual server but Node 1 will not release the disk
resources. Node 2 then sees Node 1 again with the Virtual Instance up and
running so it stops trying to take over. Node 2 does not know why it cannot
talk to Node 1, only that the other node is non-responsive. My guess is
that you lost both the crossover connection and the public connection at the
same time. This isn't a blade server based cluster, is it?
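
The sequence above can be sketched as a toy Python model. This is only an illustration of the hypothesis, not how the cluster service is actually implemented; `passive_node_cycle` and its two parameters are hypothetical names, and the event strings simply echo the messages described in the original post.

```python
def passive_node_cycle(heartbeat_ok, active_holds_disks):
    """Return the events the passive node would log during one check.

    heartbeat_ok: whether the passive node can currently see the active node.
    active_holds_disks: whether the active node still holds the disk resources.
    """
    events = []
    if heartbeat_ok:
        # Partner is visible, so the passive node does nothing.
        return events

    # Partner looks dead: the passive node tries to bring SQL Server up.
    events.append("MSSQLSERVER start requested")
    if active_holds_disks:
        # Without the shared disks the service cannot stay up.
        events.append("MSSQLSERVER stopped")
        events.append("SQLSERVERAGENT could not start: sqlservr.exe is not running")
    else:
        events.append("MSSQLSERVER running")
    return events


# Lost connectivity while the active node is still healthy and holding the disks.
for event in passive_node_cycle(heartbeat_ok=False, active_holds_disks=True):
    print(event)
```

Under these assumptions, each loss of connectivity produces the same start, stop, agent-failure pattern seen in the system log, while the instance itself never moves.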






Geoff N. Hiten
3/27/2006 5:44:24 PM
The point of a cluster is to have redundant hardware as a hot spare.

The point of a blade is to have shared hardware to lower cost and space
usage.

These are diametrically opposed goals. A blade server has too many common
points of failure for me to consider using one as a clustered host. You
don't get the increased availability you are expecting.






