Groups | Blog | Home
all groups > sql server clustering > december 2005 >

sql server clustering : Cluster synchronize


mukesh.thiru NO[at]SPAM gmail.com
12/28/2005 2:01:40 AM
I have an architecture which has 4 DB servers on the whole; two will
participate in load balancing and other two in fail over clustering.

My question is how will the two DB servers that is used for load
balancing synchronize?
Mike Epprecht (SQL MVP)
12/28/2005 1:00:00 PM
Hi

To synchronize the data between the 2 clusters, you need to use something
like replication. Clustering is not a load-balancing technology.

Regards
--------------------------------
Mike Epprecht, Microsoft SQL Server MVP
Zurich, Switzerland

IM: mike@epprecht.net

MVP Program: http://www.microsoft.com/mvp

Blog: http://www.msmvps.com/epprecht/

[quoted text, click to view]

Michael Hotek
12/29/2005 6:23:26 PM
Clustering is also not compatible with Windows Load Balancing, so you would
need to use a hardware based load balancer.

--
Mike
Mentor
Solid Quality Learning
http://www.solidqualitylearning.com


[quoted text, click to view]

tmeshorer
12/30/2005 1:42:00 AM

[quoted text, click to view]

If you are using SQL server 2005, you might want to consider using read
only shared databases for your reporting . Using a shared file system,
all the nodes can see the same database files which are read only, and
mount the database as read only. You could run reporting quaries
against up to 16 nodes archiving close to 16X speedup.

If your update cycle and reporting cycle do not overlap, switching
between the read write to read only (and back) takes about 20 seconds.
Note that no replication is needed and no storage space is wasted.


Tomer Meshorer
Database Engineer
Polyserve Inc (http://www.polyserve.com)
Email:tmeshorer@polyserve.com
Michael Hotek
12/30/2005 6:56:21 AM
I'd very strongly disagree with that assessment. It isn't a shared file
system, it requires a SAN.

Additionally, just because you point 2 SQL Server instances at the same
database does not mean you get 2X performance improvement, just like
pointing 16 instances at a single database will not give you 16X
performance. You could in fact see your performance degrade. The only way
that your performance would improve is if you have queries running on each
instance that are not competing with each other for disk throughput. If
there is disk contention, your performance will degrade since a single set
of disks are now servicing requests from multiple instances at the same
time.

Read only, shared databases is a nice new feature, but any performance
benefit you may get is very highly dependent upon your query patterns, the
volume of data in your active working set, and the amount of physical data
you can cache in memory on each instance.

--
Mike
Mentor
Solid Quality Learning
http://www.solidqualitylearning.com


[quoted text, click to view]

tmeshorer
12/30/2005 10:55:25 PM
Your scenario is correct on a specific context. Vis you have an I/O
bottleneck
However, this is customer specific.
In cases where you have a CPU bottleneck (usually occurs in DW type of
quaries,which tends to do sequential I/O) the suggested solution will
scale close to linear(at least as observed in internal benchmarks where
I/O was not a bottleneck)

What I am suggesting, and specifically in the context of reporting, is
an alternative way for scaling SQL server.
Currently the only way to scale is up, which become very costly when
you go to the 8,16,32 CPU boxes.
In addition, you cannot scale economically since you move in increments
of 2^n each time you want to decrease your reporting time (and move
exponentially in price)
Note that in both scenarios (scale up to big SMP, or scale out on a
shared file system cluster) if you have an I/O bottleneck you reach the
state as described in your email.

However scaling out is more economically sound (assuming that you do
not have I/O bottle neck) since:

1. It is much cheaper
2. Pay as you go
3. No storage duplication
4. No changes to the database configuration(I.e. no need for
replication).

Best,

Tomer Meshorer
Database Engineer
Polyserve Inc
Michael Hotek
12/31/2005 1:26:45 PM
My only point was that you stated that you would get a 16X performance
improvement by using this when you deploy on 16 machines. That statement is
100% false and we both know it. You get ZERO performance improvement. Not
a single, solitary second of improved performance. A query running on the
operational system and taking 10 seconds is going to take 10 seconds to
complete if you are using a scalable shared database model.

Do not mix improved performance with the ability to execute more read only
queries per unit time. It is NOT the same thing and is an incredibly
misleading statement. It is misleading statements like that which lead to
people implementing things and then saying that "SQL Server can't scale" or
"SQL Server can't perform", because they are trying to make a feature do
something that it was not designed to do and never had the possibility of
doing in the first place.

You CAN be increased throughput and you CAN get increased capacity by being
able to run many more of these 10 second queries simultaneously when they
are executed against multiple machines. The ONLY time that you get
increased throughput (more queries per second) or increased capacity (more
queries per second) is if you hae a VERY specific situation where memory or
CPU is the ONLY bottleneck. In those situations, you can get increased
throughput or capacity, BUT, you do NOT and never will get a linear
increase in query capacity. The amount of increased capacity is VERY highly
dependent upon your database design, the volume of data you are working
with, and query patterns.

You cited DW type of queries. There is no such thing. You either have
queries that perform sequential I/O or queries performing random I/O.
Giving it a fancy name doesn't change it, because you run both types of
queries in every operational system. So, let's look at a CPU contrained
query. We have system where we have 4 processors and 4 GB of RAM. Queries
executing against the 400GB of data within the database normally take 15
minutes to execute and pull as much as 3 GB of data. We have isolated the
issue to CPU bottlenecks and moved the system to an 8 processor machine,
thereby doubling the CPU capacity. We observed that the execution time of
this query drops to 11 minutes whereby, we found that memory became the
bottleneck in the system. We doubled the RAM to 8GB and found that the
query execution only dropped to 10 minutes. Both memory and CPU became
bottlenecks at that point, because we were all of a sudden executing more
queries per unit time and 3 simultaneously executing queries would saturate
our memory. This would cause pages to be ejected from the cache (incurring
additional processor overhead). More pages would have to be read from disk
(more processor overhead). We quickly found out that it if we increased the
processor or memory capacity, we would then shift the bottleneck to the I/O
subsystem.

Sure, you can craft testing matrices and benchmarks that will say that
scalable, shared databases will get you a nearly linear scaling. (VERY
important to note that the word is SCALING, NOT performance improvement.)
For every scenario that you come up which exhibits this behavior, I can
change it a hundred different ways to cause the performance to degrade
either by changing the query pattern, introducing a new query, changing the
indexing, increasing the data volume, changing query parameters, etc.

It is a nice technology that has a VERY specific use in an environment
experiencing VERY specific scalability issues. Unless you have done a full
analysis on the system to determine whether it is appropriate, implementing
this technology would be a very big mistake. It most definitely isn't going
to produce a linear scaling and there is a reason that Microsoft is not
giving any specific numbers. (The instant they publish numbers like you
have, there are going to be several thousand people handing evidence that
the findings are not correct.)

This very specific issue is what makes all of the Oracle marketing and sales
surrounding RAC a complete joke. Oracle RAC won't produce a linear scaling
for exactly the same reasons that Scalable Shared Databases won't produce a
linear scaling. If your query pattern, active data volumes, and bottlenecks
meet a VERY specific profile then these types of technologies will get you
increased SCALABILITY. (That does NOT mean you get improved performance.)

--
Mike
Mentor
Solid Quality Learning
http://www.solidqualitylearning.com


[quoted text, click to view]

tmeshorer
1/3/2006 12:11:46 PM
I should have clarified my point. I am looking at the actual wall clock
time that it takes to run a reporting job. I also assume that you have
more queries than servers, which I think would be the case for a
typical reporting job.
So from a wall clock time perspective, you will be able to get close to
X#server execution time improvement. I.e. if you are CPU constrained, a
6 hour job with 6 queries on one server can be reduced to 1 hour job on
6 servers.

If you have an I/O bottleneck there are two cases. If you are storage
constrained that this is a natural case. Vis, it does not matter if you
have 4*2 way servers or one 8 way server. If you are internal I/O BUS
constrains (e.g. PCI Express) moving to 4*2 way might improve your
situation since now you have 4 internal buses in your disposal.

However, I do not agree with your other point that this is very
specific situation. I think that you will encounter this situation at
the same probability that you will encounter any other kind of
bottleneck (which is about 1/3 of the cases). However, as your example
describe, to move out of the "CPU as a bottleneck" state into "I/O as
bottleneck state" you have to scale up. Based on your example, your
customer had to scale up from 4 way to 8 way.
Now lets say that you paid the money and now you are I/O constrained.
Ok, you buy more disks, change your RAID method, etc. Time passes and
you are again constrained by the CPU... are you going to move to 16
way?, 32 way? 64 way ?

My point is that if you do not have CPU bottleneck now, you will be
after you solve your current bottleneck. I.e. this is not VERY specific
scenario but rather a common one.

Tomer Meshorer
Database Engineer
Polyserve Inc (www.polyserve.com)
Michael Hotek
1/3/2006 3:06:44 PM
So that means that I can put 9 women to work and produce a baby in 1 month?

--
Mike
Mentor
Solid Quality Learning
http://www.solidqualitylearning.com


[quoted text, click to view]

tmeshorer
1/3/2006 9:21:39 PM
No. What I suggest is "Concurrent " execution, I.e. wall time speed up.
However, I cannot formulate and answer using babies and women while
staying
politically correct :-)

Tomer Meshorer
Database Engineer
Polyserve Inc (www.polyserve.com)
mukesh.thiru NO[at]SPAM gmail.com
1/3/2006 11:22:09 PM
Hello Mike,
[quoted text, click to view]
scripts or do any manual task to synchronize two databases.
The proposed configuration is: Database Servers (4 for load balancing
and failover)
Now my question is how the data between different servers synchronized?
Michael Hotek
1/4/2006 3:58:21 AM
A cluster only has a single copy of a database in it. Doesn't matter how
many nodes exist within your cluster. So, there is no synchronization which
occurs, because there is only one copy of the database.

If you decide to make the database redundant, that would mean you have 2
instances of SQL Server running and you have a duplicate copy of the
database on a second instance. For this you use the following technologies
to keep the databases synchronized:
SQL Server 2000 - Log Shipping or Replication
SQL Server 2005 - Database Mirroring, Log Shipping, or Replication
This works exactly like it does on 2 instances of SQL Server installed on
stand-alone machines as it would with 2 instances of SQL Server installed in
the cluster.

--
Mike
Mentor
Solid Quality Learning
http://www.solidqualitylearning.com


[quoted text, click to view]

tmeshorer
1/4/2006 10:58:36 AM
I would be careful in the definition of redundancy, since the question
was "4 for load balancing
and failover". I want to emphasis the "failover" part.
If you use any other technology for High availability beside clustering
(e.g. MSCS or Polyserve HA)
what you end up for SQL2000 is manual failover. vis there is no
automatic monitoring and failover machanisim build into replication
or log shipping in SQL 2000.
In SQL 2005, there is an automatic failover mechanism build into
database mirroring, but
only in the case where transaction safety is set to FULL. I.e. each
transaction has to be committed on both
the primary and the backup before returning to the user.
I would suggest that you benchmark this configuration using your
typical load before deployment.
Note also that database mirroring in SQL2005 should be used for
evaluation purposes only.

Tomer Meshorer
Database Engineer
Polyserve Inc (www.polyserve.com)
mukesh.thiru NO[at]SPAM gmail.com
1/4/2006 10:17:29 PM
Hello Mike,

Is Log shipping or replication a manual process? What will be the time
delay for the two server instance to be synchronized?
In our case if the user inserts records into a table in server 1, when
will the server 2 get those data? Will the user if tries to connect to
server 2 at what time tag will he see the new records?

Since I am in a developer role in the project, I would like to know if
the replication etc. is my duty or DBA's.
Michael Hotek
1/5/2006 1:42:43 AM
Log shipping and replication are automatic processes in terms of sending
changes. Failing over from primary to secondary is a manual process of
detecting the system is down and a manual process of failing over.

Both methods get the changes to the secondary, when it gets there. No
sooner, no later. There is ZERO guaranteed delivery time. Why? Because
they rely on several components which SQL Server has no control over. The
backups for Log Shipping and the transactions for replication are both sent
across the network. If a network component is offline between the primary
and secondary, then changes are not going to make it there. Under "normal"
circumstances, you can keep the secondary for log shipping within about 2 -
4 minutes of the primary while with replication it is possible to reduce
this to 2 - 5 seconds.

With replication, the secondary is available and accessible to applications.
You do NOT want to make data changes there though. With Log Shipping, it
can be available, but in an availability architecture, the secondary is
normally not allowed to be in standby mode since transaction logs can not be
restored while users are connected.

As far as who's responsibility is it? It is everyone's responsibility. If
you DBA implements clustering and as a developer you have transactions in
your application that take a long time to execute, you can have a
significant enough impact on the failover time to render the cluster
irrelevant. If your application makes mass data changes on a frequent
basis, you can overwhelm the replication engine and increase the latency
significantly. 5% of availability is the database, OS, and hardware
technology that you deploy. The other 95% is business processes,
application architecture, management procedures, etc.

--
Mike
Mentor
Solid Quality Learning
http://www.solidqualitylearning.com


[quoted text, click to view]

AddThis Social Bookmark Button