
sql server clustering : Clustering Service not starting right away on One Node


Thomas
1/25/2005 7:53:05 AM
I just set up a cluster attached to a SAN. The cluster service on one of the
nodes doesn't start right away. I have checked the service and it is set to
Automatic. Both nodes are current with the latest patches and security
updates. I'm a little clueless as to why this is happening. Here is the
really weird part: after 1 minute, the cluster service starts on the node
that is giving me trouble.
Mike Epprecht (SQL MVP)
1/25/2005 8:13:03 AM
Hi

If you do a fail-over using cluster admin, are there any resources (that SQL
Server depends on) that take a long time to come online?
I have seen similar issues when the devices take long to come online due to
high SAN activity.

Does the SQL Server resource come online, but take a long time until it has
done its recovery steps?
This can occur when the other node was de-porting its devices and still had
IO pending. This results in not all pages being flushed to the SAN, so SQL
Server has to do more recovery on database start-up.

The best gauge of how quickly a resource comes online is to watch cluster
admin during the failover.

Regards
Mike

Thomas
1/25/2005 8:27:03 AM
We haven't installed SQL Server yet. I should've posted that first. But I
know from past installs that SQL Server does take some time to come online.
The SAN doesn't have much activity on it right now.

Mike Epprecht (SQL MVP)
1/25/2005 8:49:10 AM
Hi

Have a look in your event logs and check the time differences between when
Node A shuts down and Node B notices it and starts up. There will be at least
15 event messages during this process. Post the information here so that I
can compare it to our big clusters.
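To make the time differences easy to read off, something like the following Python sketch could help. It assumes you copy the timestamp, node and message of each event out of Event Viewer by hand; the rows in EVENTS are made-up placeholders, not real log output, and gaps() is just a name chosen for this example.

```python
from datetime import datetime

# Hypothetical rows copied by hand from the System event logs of both nodes.
# The timestamps and messages are illustrative placeholders only.
EVENTS = [
    ("1/25/2005 9:30:00 AM", "NODE1", "shutdown begins"),
    ("1/25/2005 9:30:20 AM", "NODE2", "notices NODE1 is gone"),
    ("1/25/2005 9:31:25 AM", "NODE1", "cluster service started"),
]

# Same month/day/year 12-hour layout as the timestamps in this thread.
FMT = "%m/%d/%Y %I:%M:%S %p"


def gaps(events):
    """Return (seconds_since_previous_event, node, message) for each event after the first."""
    parsed = sorted((datetime.strptime(ts, FMT), node, msg) for ts, node, msg in events)
    result = []
    for (t0, _, _), (t1, node, msg) in zip(parsed, parsed[1:]):
        result.append(((t1 - t0).total_seconds(), node, msg))
    return result


if __name__ == "__main__":
    for seconds, node, msg in gaps(EVENTS):
        print(f"+{seconds:.0f}s  {node}  {msg}")
```

A large gap in front of one particular event points at the step that is waiting.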

Regards
Mike

Thomas
1/25/2005 9:41:04 AM
I haven't seen the problem with Node 2, which has ownership of the cluster.
The problem shows up when I reboot Node 1: it takes 1 minute to start the
cluster service.

Geoff N. Hiten
1/25/2005 11:49:27 AM
That may be somewhat normal on a simultaneous startup. The first node grabs
the quorum device and owns the cluster but isn't talking on the network yet.
The second node tries to get the device but times out. Eventually the
service comes online and talks to the other node and agrees on who is in
charge. This is especially prevalent on SCSI-based clusters.

Check the System and Application event logs on both systems to see if there
are any unusual startup errors. Also, check what happens when the second
node is rebooted. If the cluster service does come online quickly, it is
just a device contention issue. I try to avoid powering up more than one
cluster node at a time.


--
Geoff N. Hiten
Microsoft SQL Server MVP
Senior Database Administrator
Careerbuilder.com

I support the Professional Association for SQL Server
www.sqlpass.org


Geoff N. Hiten
1/25/2005 12:51:43 PM
Node order is arbitrary in a cluster. We could use Node X and Node Y
instead of Node 1 and Node 2.

Try manually stopping and starting the cluster service on Node 1. If it
restarts quickly, then the problem likely is one of the services that the
cluster service depends on. Time service is a usual suspect for that, but
you will have to check the entire list. Again, the Application and System
event logs are your friends here.
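For checking the entire list, a rough Python sketch like the one below could pull the dependency names out of the text that "sc qc" prints for a service. It assumes the output is laid out like SAMPLE, with a DEPENDENCIES line followed by continuation lines that start with a colon. SAMPLE itself is a hand-written illustration, not output captured from a real node, and parse_dependencies() is a name made up for this example.

```python
# Hypothetical, hand-written text in the general layout of "sc qc <service>" output.
# The service names listed here are illustrative, not taken from a real cluster node.
SAMPLE = """
SERVICE_NAME: ClusSvc
        TYPE               : 10  WIN32_OWN_PROCESS
        START_TYPE         : 2   AUTO_START
        DEPENDENCIES       : ClusNet
                           : RpcSs
                           : W32Time
        SERVICE_START_NAME : LocalSystem
"""


def parse_dependencies(sc_qc_text):
    """Collect the service names listed under DEPENDENCIES."""
    deps = []
    in_deps = False
    for raw in sc_qc_text.splitlines():
        line = raw.strip()
        if line.startswith("DEPENDENCIES"):
            # First dependency sits on the same line, after the colon.
            in_deps = True
            value = line.partition(":")[2].strip()
        elif in_deps and line.startswith(":"):
            # Continuation lines carry one more dependency each.
            value = line[1:].strip()
        else:
            in_deps = False
            continue
        if value:
            deps.append(value)
    return deps


if __name__ == "__main__":
    for name in parse_dependencies(SAMPLE):
        print(name)
```

Each name it returns is a service to look up in the event logs around the time of the slow start.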

Now is the time to deal with this issue, not after you load SQL and get this
baby into production.

--
Geoff N. Hiten
