Groups | Blog | Home
all groups > sql server programming > february 2007 >

sql server programming : why should there be only one clustered index in a table?


Shocky
2/23/2007 11:07:26 PM
why should there be only one clustered index in a table?
Tibor Karaszi
2/24/2007 12:00:00 AM
The clustered index sorts the *data*. You can only sort that data one way, so you can only have one
clustered index.

--
Tibor Karaszi, SQL Server MVP
http://www.karaszi.com/sqlserver/default.asp
http://sqlblog.com/blogs/tibor_karaszi


[quoted text, click to view]
Tibor Karaszi
2/24/2007 12:00:00 AM
Did you read the index chapters in Books Online. They explain the index tree and that for a
clustered index, the leaf level of the index *is* the data. Here's a SQL Server 2005 Books Online
URL describing a clustered index:

ms-help://MS.SQLCC.v9/MS.SQLSVR.v9.en/udb9/html/26b28045-c3c2-465a-b564-bf2189e93fdc.htm


--
Tibor Karaszi, SQL Server MVP
http://www.karaszi.com/sqlserver/default.asp
http://sqlblog.com/blogs/tibor_karaszi


[quoted text, click to view]
Shocky
2/24/2007 12:05:46 AM

[quoted text, click to view]


Am still not clear
Claudia
2/24/2007 2:13:05 AM
Our company had a client-project that contained a double-cluster, but their
situation was quite different (we have not found a reason to re-duplicate
their solution).

My old SQL 2000 cert software demonstrated this. Why they didn't show two
unique indexes together.. hm.

My apologies in advance, I use the "Sandwich" analogy a lot.

If meat and bread were clustered together:
All these would be distinct:
tuna and wheat
tuna and white
tuna and frenchbread

Let's say if every option was filled out in the table, the indexes would
display itself as if it was a 'cross join'.

Regards,
C-
[quoted text, click to view]
--CELKO--
2/24/2007 5:48:38 AM
[quoted text, click to view]

SQL Server is still based on physically contigous sequential storage.
The clusterd index is based on physically sequential storage ordering,
and there can be only one sort order. How many ways can you sort a
magnetic tape?

Kalen Delaney
2/24/2007 8:38:45 AM
Hi Joe

No, this has been discussed here many times. The clustered index does not
store the data PHYSICALLY sorted, it is only LOGICALLY sorted..

--
HTH
Kalen Delaney, SQL Server MVP
http://sqlblog.com


[quoted text, click to view]

Dan Guzman
2/24/2007 10:08:27 AM
The Books Online Tibor referenced has a good architectural description of
clustered indexes. Basically, clustered index leaf nodes are the actual
table data rows and are ordered on each page by the index key. You can have
only one clustered index on a table because data rows can be ordered in only
one way at a time.

If you also create a non-clustered index or indexed view containing all
table columns, you'll effectively get the same behavior as multiple
clustered indexes. However, this is not commonly done because of the
overhead of duplicating the entire table.


--
Hope this helps.

Dan Guzman
SQL Server MVP

[quoted text, click to view]
David Portas
2/24/2007 1:11:18 PM
On 24 Feb, 16:38, "Kalen Delaney" <replies@public_newsgroups.com>
[quoted text, click to view]

A fact which demonstrates that the commonly quoted excuse for having
only one clustered index ("that the data can only be sorted one way")
is bogus. Obviously any number of logical sorts are possible and the
same data is in practice recorded in multiple places in different
indexes.

I imagine that the explanation for allowing just one clustered index
is that SQL Server uses cluster keys as bookmarks. I suppose that
adding the complexity to allow several different keys for bookmarks
would bring no benefit other than the possibilities of covering more
queries - something that can be achieved anyway through a non-
clustered index. That seems to make sense to me anyway.

--
David Portas, SQL Server MVP

Whenever possible please post enough code to reproduce your problem.
Including CREATE TABLE and INSERT statements usually helps.
State what version of SQL Server you are using and specify the content
of any error messages.

SQL Server Books Online:
http://msdn2.microsoft.com/library/ms130214(en-US,SQL.90).aspx
--
Kalen Delaney
2/24/2007 2:11:04 PM
The data is sorted according to the clustered index, it is just not stored
physically in that sorted order. Joe repeatedly makes claims to the way SQL
Server physically stored the data that are just not true at all.

Whether it is physical or logical, the data in the table (which is the leaf
level of the clustered index) can only be sorted one way with the current
architecture. Each page can only point to one 'next' page, and the chain of
pages linked together is the logically sorted table.

--
HTH
Kalen Delaney, SQL Server MVP
http://sqlblog.com


[quoted text, click to view]

Kalen Delaney
2/24/2007 3:52:09 PM
[quoted text, click to view]

Yes, that is why I said
[quoted text, click to view]
architecture


--
HTH
Kalen Delaney, SQL Server MVP
http://sqlblog.com


[quoted text, click to view]

Erland Sommarskog
2/24/2007 4:25:04 PM
Shocky (yusuf.bhiwandiwala@gmail.com) writes:
[quoted text, click to view]

There can only be one clustered index in a table, because the leaf node of
the index is the data pages.

However, you can define any number of non-clustered index which includs
all columns in the table, which would mean that in practice you would have
more than one clustered index. But note that this mean that the data is
more than one place, and space consumption rises as a consequence.


--
Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx
Books Online for SQL Server 2000 at
David Portas
2/24/2007 4:29:19 PM
[quoted text, click to view]

Actually I hadn't thought of that Erland.

The data in a single index can only be sorted one way in the current
architecture. But, that same data can appear in multiple indexes and
there could easily be several indexes covering the total set of
columns in the table. Those columns could be added at leaf level using
the INCLUDE clause or they could be part of the index key itself.

So in my view "sorted one way" isn't an adequate explanation of why
only one clustered index is permitted and I'm not surprised that it
baffles some people.

It seems like a better explanation is the need for SQL Server to
identify each data row uniquely in every index. A clustered index
causes the index key to be substituted in place of the RIDs that are
otherwise used in nonclustered indexes. There is no benefit I can
think of to having more than one such key, therefore only one
clustered index is necessary. Unless this subtle distinction is
understood it seems quite reasonable for the OP to speculate about
multiple clustered indexes, each one containing the same set of data
at leaf level - as Gert-Jan points out, to some extent that is what
the INCLUDE keyword achieves.

Let's be careful to distinguish between a Table, which is part of the
database's Logical Model, and an Index, which is part of the physical
implementation of that model. Hopefully we will agree that the phrase
"logically sorted table" is a contradiction in terms!

--
David Portas, SQL Server MVP

Whenever possible please post enough code to reproduce your problem.
Including CREATE TABLE and INSERT statements usually helps.
State what version of SQL Server you are using and specify the content
of any error messages.

SQL Server Books Online:
http://msdn2.microsoft.com/library/ms130214(en-US,SQL.90).aspx
--
Erland Sommarskog
2/24/2007 11:38:50 PM
Kalen Delaney (replies@public_newsgroups.com) writes:
[quoted text, click to view]

I guess David is suggesting that you could have more than one set of link
pointers per page. He discusses the issue of the clustered index keys being
row locators for the NC index, but that would be addressed with one of the
clustered indexes being promoted to primary clustered index.

Obviously, in such a solution there would still be a low number of possible
clustered indexes. (And nevermind that the current space architecture does
not have any space for them.)

It would not be possible to have more han one of these multiple clustered
index to be contiguous, though.


--
Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx
Books Online for SQL Server 2000 at
Erland Sommarskog
2/25/2007 12:00:00 AM
David Portas (REMOVE_BEFORE_REPLYING_dportas@acm.org) writes:
[quoted text, click to view]

Unfortunately, I think that explanation has to count as an after-
construction. The fact that the clustered key is the is the row locator
for the NC indexes is a novelty added in SQL 7. And it was not possible
to have multiple clustered indexes in SQL 6.5 or earlier either.

To have multiple clustered indexes, you need to have either
1) duplicated data pages (which can be achieved with all-inclusive
NC indexes or indexed views)
2) multiple next/prev pointers on page level.


--
Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx
Books Online for SQL Server 2000 at
Gert-Jan Strik
2/25/2007 12:51:21 AM
[quoted text, click to view]

Before SQL Server 2005, the feature that was mentioned most as being
special for the clustered index, was the fact that the leaf level of the
clustered index will always contain the table data. And since creating
an index of all columns of the table was not considered equal (or simply
not possible), this was a big difference.

But as you all know, SQL Server 2005 supports the INCLUDE keyword, which
makes it possible to include all non-index columns to the leaf level of
a nonclustered index. A clustered index has the property that the data
will be sorted based on the index order. A nonclustered index with all
non-indexed columns included will do exactly the same thing. So that
property is no long unique to a clustered index.

So although SQL Server will still only allow you create one clustered
index using the CLUSTERED keyword, for all practical purposes, you can
add as many as you want (up to a few hundred). You just have to omit the
CLUSTERED keyword, use the INCLUDE feature and you will have an index
where the data is clustered on the index order.

Kalen Delaney
2/25/2007 10:23:16 AM
Yes, I agree that the terms clustered and nonclustered can be very
confusing. Also, when you start looking deeper into the way data structures
are kept track of in SQL Server 2005, the differentiation between heaps and
b-trees as the two types of possible organization becomes clearer.

Note however, that a PRIMARY KEY CONSTRAINT does not have to imply the table
is organized as a B-TREE. You can specify as PRIMARY KEY as nonclustered, or
if you already have a clustered index, and you ALTER the table to define a
PRIMARY KEY, it will be nonclustered.

Also, I may be overlooking something, but you don't need SQL 2005 to have
additional indexes that contain all the table's columns, IF the table has
fewer than 16 columns and less than 900 bytes per row.

--
HTH
Kalen Delaney, SQL Server MVP
http://sqlblog.com


[quoted text, click to view]

David Portas
2/25/2007 11:35:43 AM
[quoted text, click to view]

Already I don't like your syntax. The failure to separate
"implementation language" from DDL is one of the worst offences of
most SQL DBMSs (Oracle and Microsoft anyway). SQL was not intended to
be used in that way. So many problems and limitations stem from that
one simple design error.

--
David Portas, SQL Server MVP

Whenever possible please post enough code to reproduce your problem.
Including CREATE TABLE and INSERT statements usually helps.
State what version of SQL Server you are using and specify the content
of any error messages.

SQL Server Books Online:
http://msdn2.microsoft.com/library/ms130214(en-US,SQL.90).aspx
--
David Portas
2/25/2007 12:34:22 PM
[quoted text, click to view]

FILLFACTOR and ON filegroup are other equally bad transgressions IMO.

[quoted text, click to view]

I agree that CLUSTERED / NONCLUSTERED ought to be deprecated. I think
you've summarized the problem very well: we need a new syntax that
better defines heap vs B-tree and the other index attributes. I'd just
prefer to see that syntax kept separate from constraint and table
definitions.

--
David Portas, SQL Server MVP

Whenever possible please post enough code to reproduce your problem.
Including CREATE TABLE and INSERT statements usually helps.
State what version of SQL Server you are using and specify the content
of any error messages.

SQL Server Books Online:
http://msdn2.microsoft.com/library/ms130214(en-US,SQL.90).aspx
--
Steve Kass
2/25/2007 1:06:43 PM
The use of the word CLUSTERED causes all kinds of confusion.

Tables are organized in one of two ways: B-TREE or HEAP. This
is determined indirectly. If one index on the table is created with the
CLUSTERED keyword, or if a PRIMARY KEY constraint is present
in the CREATE TABLE statement, a table is organized as a B-TREE.
Otherwise, the table is organized as a HEAP.

B-TREE organization requires that some sequence of columns be
identified as the key columns of the B-tree, and the mechanism
SQL Server provides for identifying these columns is in the list of
key columns of the (sole) clustered index on the table. There can
be only one specification of the way the B-tree is organized, therefore
only one index can be designated as CLUSTERED.

The purpose of the keyword CLUSTERED, therefore, is to declare that
a) the table is to be organized as a B-tree, and b) to specify the key
columns
for the B-tree. Thus the reason there can be only one clustered index on
a table is because a B-tree cannot be ordered in multiple ways.

When a table is organized as a B-tree, the B-tree that serves as the
primary store of the table data is referred to as the clustered index
of the table. All indexes have B-tree structure, but only one ("the"
clustered index) can have the following two properties 1) Contains
all columns, i.e., all the table's data; 2) has as its key columns the
columns whose values are used as row locators in all indexes.

In SQL Server 2005, it is possible to create more than one B-trees
with property 1) above via CREATE INDEX .. INCLUDE. It is
impossible, however, to have two different B-trees each having
property 2) above, because indexes only have room to store one
row locator. The additional "complete table" B-trees that satisfy
property 1) might be called additional clustered indexes, since they
function in many ways as "the" clustered index of a table.

If I could invent the syntax, I would not use CLUSTERED/NONCLUSTERED.
I would require a table's organization to be specified when it is
created, and then indexes would just be indexes:

CREATE TABLE T (
....
) AS HEAP

CREATE TABLE X(
....
) AS BTREE(ColA,ColB)

-- Steve Kass
-- Drew University
-- http://www.stevekass.com
-- A1D8F830-B056-48DC-9A20-FB4931E7B380

[quoted text, click to view]
Steve Kass
2/25/2007 1:49:04 PM


[quoted text, click to view]
Thanks. That's right - the only use of PRIMARY KEY that forces B-tree
organization


[quoted text, click to view]
True. I didn't say what I was thinking, which was that only in 2005 can
you create
a second B-tree that really works like a clustered index for retrieval.
With INCLUDE,
you can not only make sure the index contains all columns, but you can do so
without having to make all the columns key columns, doing which causes
the index
to be unnecessarily bloated above the leaf level. This is a valuable
property of
"clustered indexes" that you only get once per table before 2005, and I
should have
mentioned it as the property that distinguishes 2005 from 2000 here (I
only mentioned
the "all columns" property, which is not the distinctive one.). In 2005,
INCLUDE
provides this ability to organize any index this way: as a B-tree based on
a subset of columns. As a result, in 2005 one more thing we used to only
get with CLUSTERED is now available for any index.

Thanks for pointing this out.

JXStern
2/25/2007 2:10:12 PM
On 24 Feb 2007 05:48:38 -0800, "--CELKO--" <jcelko212@earthlink.net>
[quoted text, click to view]

Strictly speaking, this isn't even close to accurate. You don't have
to have any clustered index on a table, in which case they are just
heaps in logically sequential storage, which has no relation to
physically contiguous storage on any normal disk volume. Within a
filegroup records can be interleaved with other tables' records and
with indexes, and using anything beyond a single filegroup is
relatively uncommon in practice.

Apparently SQL2000/2005 supports raw partitions, but I have never seen
one used, and they are certainly not required, and I am not aware that
SQLServer uses their contiguous/sequential nature to any advantage.

Josh

JXStern
2/25/2007 2:11:35 PM
On 23 Feb 2007 23:07:26 -0800, "Shocky" <yusuf.bhiwandiwala@gmail.com>
[quoted text, click to view]

is there any reason why you would like more, or are you just asking?

J.
Steve Kass
2/25/2007 3:15:39 PM
David,

I'm suggesting that the decision to represent a table as a heap or
as a B-tree is an important decision that should not be addressed
indirectly in the language. It could be addressed directly (as I
suggested) or not in DDL at all (or simply not at all, period).

I just think that the notion of CLUSTERED has caused a huge
amount of confusion. It makes it harder to explain what an index
is. Right now, An index may or may not include some or all of the
column values for columns unnamed when the index is created. If
the index is clustered, it includes all. If the index is nonclustered,
what information the index includes depends on whether or not
the table is stored as a heap or not, and if not, what the clustered
index or clustered primary key column list is.

It sure would be easier to say that an index contains
1) the columns you list as key columns when you create it (keyed),
2) plus any columns you INCLUDE when you create it (nonkeyed),
3) plus the row locator
4) plus nothing else

Right now, what's in an index is hard to explain, and it can
change if you add or drop a different index. I agree that the
whole notion of indexes should best be separate from DDL,
but this issue of heap vs. b-tree is currently in limbo, which
is worse.

This would by no means be the first implementation detail
to be part of CREATE TABLE (not that I'm wedded to this
particular syntax, either). What about FILLFACTOR,
ON filegroup, etc.?

What I'd really like to see is for the word CLUSTERED to
go away (if not deprecated, be never needed), and that it
be completely clear what table columns are in an index from
the CREATE INDEX and CREATE TABLE statements. Is that
something you'd agree with, and if so, what suggestions
do you have?

SK


[quoted text, click to view]
Steve Kass
2/25/2007 3:18:34 PM
Another thing: if we didn't have this CLUSTERED word, maybe no one
would complain when they didn't get their unORDERed results in "clustered
index order".

SK

[quoted text, click to view]
Tom Cooper
2/25/2007 3:53:29 PM

[quoted text, click to view]
<snip>

[quoted text, click to view]

Text, ntext and image can't be included using the INCLUDE clause. But
varchar(max), nvarchar(max) and varbinary(max) can (at least according to
BOL, I admit I've never tried it).

Tom
<snip>

Gert-Jan Strik
2/25/2007 8:17:06 PM
Steve,

No disrespect, but I don't see the difference, whether you specify heap
or B-tree during table creation or afterwards using index creation. With
the current architecture, the ramifications are the same. One thing is
not entirely clear though. Your syntax for specifying a table as B-tree
would not allow a non-unique set of columns, would it? That seems a
major drawback!

As for your property 2: a nonclustered index that includes all other
columns does not need a row locator to the data in the heap/clustered
index. Obviously, it is not very space efficient and it will have
additional cost (overhead), but you can even create a heap with a
nonclustered index that includes all columns. This creates both a heap
*and* a B-tree organised table.

So in the end, I would describe the clustered index as the index that
includes the data at the leaf level "for free", so without data
duplication and without additional data management overhead.

Gert-Jan

P.S. Of course, earlier I should have posted the disclaimer that some
data types cannot be included using the INCLUDE clause of CREATE INDEX
command, such as text and varchar(max)


[quoted text, click to view]
AddThis Social Bookmark Button