all groups > sql server programming > october 2007 >
You're in the

sql server programming

group:

Identity Columns as Primary Keys


Identity Columns as Primary Keys wrytat
10/28/2007 9:33:00 PM
sql server programming: All the while, I've been using the usual column(s) with the usual datatype
(e.g. varchar, int, ..) as the primary key(s) of the tables I have. For
instance, I have a Customer table, then I will make the VARCHAR(4) CustID
column as the primary key.

Now, my colleague told me that using an identity column will make the
processing more efficient. So he recommended me to,
1. Keep the CustID column
2. Create a AutoID INT Identity column
3. Set the AutoID column as Primary Key
4. Set CustID as Unique Key

He says that by doing this the processing will be more efficient and at the
same time I can refer to the records with something more meaningful.

My question here is,
1. Why would using a Identity column as Primary Key be more efficient?
2. Because the datatype of a Identity column must be either int, smallint,
tinyint, decimal, or numeric, will it be possible that one day after the
largest number that can be represented is used, I'll have problem inserting
new record?

Re: Identity Columns as Primary Keys Uri Dimant
10/29/2007 12:00:00 AM
In addition to David's answer

A plus side is also that surrogate keys reduced concurrency problem. Foe
example if you open two windows and issue in the first one
SELECT <> FROM tbl WHERE Custid='bbb'
and in the second one you change the custid (I mean you use natural keys)
as UPDATE tbl SET Custid='aaaa' WHERE Custid='bbb'


To solve the problems we use surrogate keys. Almost we do not show them
tothe client as instead of update in the above example we just
UPDATE tbl SET Custid='aaaa' WHERE AutoID ='somenumber'



Just my two cents








[quoted text, click to view]

Re: Identity Columns as Primary Keys Michael MacGregor
10/29/2007 12:00:00 AM
I haven't seen this point mentioned yet in regard to the use of IDENTITY for
a clustered PK in systems where there is a lot of data being inserted and
that is the occurance of a hotspot at the point where new values are added.
Using a natural key such as CustID which may not be incremented sequentially
can avoid hotspots but can cause page splits when a new value has to be
inserted on an index page with no space left. This can be minimised by
judicial use of FILLFACTOR and keeping an eye on index fragmentation.

Michael MacGregor
Database Architect

Re: Identity Columns as Primary Keys Dan Guzman
10/29/2007 7:09:55 AM
[quoted text, click to view]

To add on to the other responses, INSERTs are more efficient when the
indexed value continually increases, especially with a clustered index.

Personally, I prefer to use the natural key as the primary key unless
surrogate key benefits (reduced storage, less join complexity, better insert
performance) outweigh the costs (extra index storage and I/O overhead, more
complex indexing strategy, bending relational theory).

--
Hope this helps.

Dan Guzman
SQL Server MVP

[quoted text, click to view]
Re: Identity Columns as Primary Keys David Portas
10/29/2007 7:40:41 AM
[quoted text, click to view]


1. It may or may not prove more efficient depending on how it is to be used.
On the plus side, IDENTITY may reduce the size of the key used for joins and
eliminate the potential problem of cascading updates to foreign keys.
Possible disadvantages are that it usually increases the number of joins
required (because your referencing tables no longer contain the natural key
of the parent table).

2. Make sure you choose a datatype large enough that that will never happen.
Integer allows more than 4 billion values for example, which is more than
enough for many applications.

Automatically adding IDENTITY to every table is very unwise in my opinion,
given the complexities and limitations of this feature. The only way to be
sure what will perform for you is to test it out.

--
David Portas

Re: Identity Columns as Primary Keys wrytat
10/29/2007 6:03:00 PM
Thank you for all the reply. In other words, using identity can really
improves the performance in some situation. But I think I'll still stick with
natural key because I'm not familiar with identity, and I don't think using
identity in my situation will have much effect. But, regarding Binary
Collation, how do I do it? Do I just need to go to column, change its
collation to Windows Collation, Binary Sort? And for the Language should I
choose Latin1_General if I'm using English only?

[quoted text, click to view]
Re: Identity Columns as Primary Keys Gert-Jan Strik
10/29/2007 8:26:26 PM
Ad. 1
-----
A VARCHAR(4) Primary Key does not sound very smart to me. For such a
short string, it will require less space if you simply define it as
CHAR(4).

When it comes to performance, joining/sorting on an INT will be faster
than on a CHAR or VARCHAR. If you use binary collation for your
CHAR/VARCHAR, then the performance of INT will only be slightly better.

Ad. 2
-----
You can reseed (reset) an Identity. This will only be useful if there is
a sufficient range of unused numbers, because otherwise you will get
Primary Key violations (duplicates).

BTW: you can also use BIGINT, so if the 4 billions values of an INT is
not enough (in which case the VARCHAR(4) would not be enough either),
then you could use BIGINT. However, you can expect a BIGINT Identity to
perform poorer than a CHAR(4) with binary collation, because of its
larger space requirement (8 versus 4).

IOW, I would only change VARCHAR(4) with default collation to CHAR(4)
binary collation and leave it at that...

--
Gert-Jan


[quoted text, click to view]
Re: Identity Columns as Primary Keys Dan Guzman
10/29/2007 10:41:27 PM
And note that insert hotspots in modern SQL Server versions are often good
for performance rather than something to be avoided.

--
Hope this helps.

Dan Guzman
SQL Server MVP

[quoted text, click to view]
Re: Identity Columns as Primary Keys Alex Kuznetsov
10/30/2007 6:57:40 AM
On Oct 29, 2:26 pm, Gert-Jan Strik <so...@toomuchspamalready.nl>
[quoted text, click to view]

If the key is less than 4 chars and needs to be padded with blanks,
there have been plenty of issues with different clients in the past.
Some "smart" clients truncate trailing spaces and that can be a pain
in the neck, especially when after you have upgraded some library some
functionality is broken. Better to have a VARCHAR and a piece of mind.
Re: Identity Columns as Primary Keys Michael MacGregor
10/30/2007 9:28:51 AM
[quoted text, click to view]
Really? Why?

MTM

Re: Identity Columns as Primary Keys Gert-Jan Strik
10/30/2007 7:58:13 PM
[quoted text, click to view]

I guess I don't see the problem, and I don't know what you mean with
libraries that would suddenly break.

As you know, this is not a problem when used in a query, because of ANSI
behavior. New applications are not a problem either. If you have
existing applications, and you are worried about this, then this can
easily be solved by creating a view with a RTRIMmed version of it.

--
Re: Identity Columns as Primary Keys Hugo Kornelis
10/30/2007 9:26:14 PM
[quoted text, click to view]

Hi Michael,

Mostly because it reduces the amount of page splits required while
inserting data, and reduces fragmentation.

--
Hugo Kornelis, SQL Server MVP
Re: Identity Columns as Primary Keys Dan Guzman
10/30/2007 10:02:01 PM
As Hugo, mentioned, splits and fragmentation are reduced with an increasing
key. Many rows can be stored with a single page write.

--
Hope this helps.

Dan Guzman
SQL Server MVP

[quoted text, click to view]
Re: Identity Columns as Primary Keys Michael MacGregor
11/1/2007 10:03:32 AM
I thought perhaps you were referring to something new that I did not know
about, but thanks anyway for the clarification.

Michael MacGregor
Database Architect

AddThis Social Bookmark Button