sql server programming:
All the while, I've been using the usual column(s) with the usual datatype (e.g. varchar, int, ..) as the primary key(s) of the tables I have. For instance, I have a Customer table, then I will make the VARCHAR(4) CustID column as the primary key. Now, my colleague told me that using an identity column will make the processing more efficient. So he recommended me to, 1. Keep the CustID column 2. Create a AutoID INT Identity column 3. Set the AutoID column as Primary Key 4. Set CustID as Unique Key He says that by doing this the processing will be more efficient and at the same time I can refer to the records with something more meaningful. My question here is, 1. Why would using a Identity column as Primary Key be more efficient? 2. Because the datatype of a Identity column must be either int, smallint, tinyint, decimal, or numeric, will it be possible that one day after the largest number that can be represented is used, I'll have problem inserting new record?
In addition to David's answer A plus side is also that surrogate keys reduced concurrency problem. Foe example if you open two windows and issue in the first one SELECT <> FROM tbl WHERE Custid='bbb' and in the second one you change the custid (I mean you use natural keys) as UPDATE tbl SET Custid='aaaa' WHERE Custid='bbb' To solve the problems we use surrogate keys. Almost we do not show them tothe client as instead of update in the above example we just UPDATE tbl SET Custid='aaaa' WHERE AutoID ='somenumber' Just my two cents [quoted text, click to view] "David Portas" <REMOVE_BEFORE_REPLYING_dportas@acm.org> wrote in message news:gqqdnT9Rh5QfE7janZ2dnUVZ8qClnZ2d@giganews.com... > "wrytat" <wrytat@discussions.microsoft.com> wrote in message > news:57B4326F-EE7C-42BF-9DC7-A1BCD23F80AA@microsoft.com... >> All the while, I've been using the usual column(s) with the usual >> datatype >> (e.g. varchar, int, ..) as the primary key(s) of the tables I have. For >> instance, I have a Customer table, then I will make the VARCHAR(4) CustID >> column as the primary key. >> >> Now, my colleague told me that using an identity column will make the >> processing more efficient. So he recommended me to, >> 1. Keep the CustID column >> 2. Create a AutoID INT Identity column >> 3. Set the AutoID column as Primary Key >> 4. Set CustID as Unique Key >> >> He says that by doing this the processing will be more efficient and at >> the >> same time I can refer to the records with something more meaningful. >> >> My question here is, >> 1. Why would using a Identity column as Primary Key be more efficient? >> 2. Because the datatype of a Identity column must be either int, >> smallint, >> tinyint, decimal, or numeric, will it be possible that one day after the >> largest number that can be represented is used, I'll have problem >> inserting >> new record? >> >> Please enlighten and advise. Thank you. > > > 1. It may or may not prove more efficient depending on how it is to be > used. On the plus side, IDENTITY may reduce the size of the key used for > joins and eliminate the potential problem of cascading updates to foreign > keys. Possible disadvantages are that it usually increases the number of > joins required (because your referencing tables no longer contain the > natural key of the parent table). > > 2. Make sure you choose a datatype large enough that that will never > happen. Integer allows more than 4 billion values for example, which is > more than enough for many applications. > > Automatically adding IDENTITY to every table is very unwise in my opinion, > given the complexities and limitations of this feature. The only way to be > sure what will perform for you is to test it out. > > -- > David Portas > >
I haven't seen this point mentioned yet in regard to the use of IDENTITY for a clustered PK in systems where there is a lot of data being inserted and that is the occurance of a hotspot at the point where new values are added. Using a natural key such as CustID which may not be incremented sequentially can avoid hotspots but can cause page splits when a new value has to be inserted on an index page with no space left. This can be minimised by judicial use of FILLFACTOR and keeping an eye on index fragmentation. Michael MacGregor Database Architect
[quoted text, click to view] > 1. Why would using a Identity column as Primary Key be more efficient?
To add on to the other responses, INSERTs are more efficient when the indexed value continually increases, especially with a clustered index. Personally, I prefer to use the natural key as the primary key unless surrogate key benefits (reduced storage, less join complexity, better insert performance) outweigh the costs (extra index storage and I/O overhead, more complex indexing strategy, bending relational theory). -- Hope this helps. Dan Guzman SQL Server MVP [quoted text, click to view] "wrytat" <wrytat@discussions.microsoft.com> wrote in message news:57B4326F-EE7C-42BF-9DC7-A1BCD23F80AA@microsoft.com... > All the while, I've been using the usual column(s) with the usual datatype > (e.g. varchar, int, ..) as the primary key(s) of the tables I have. For > instance, I have a Customer table, then I will make the VARCHAR(4) CustID > column as the primary key. > > Now, my colleague told me that using an identity column will make the > processing more efficient. So he recommended me to, > 1. Keep the CustID column > 2. Create a AutoID INT Identity column > 3. Set the AutoID column as Primary Key > 4. Set CustID as Unique Key > > He says that by doing this the processing will be more efficient and at > the > same time I can refer to the records with something more meaningful. > > My question here is, > 1. Why would using a Identity column as Primary Key be more efficient? > 2. Because the datatype of a Identity column must be either int, smallint, > tinyint, decimal, or numeric, will it be possible that one day after the > largest number that can be represented is used, I'll have problem > inserting > new record? > > Please enlighten and advise. Thank you.
[quoted text, click to view] "wrytat" <wrytat@discussions.microsoft.com> wrote in message news:57B4326F-EE7C-42BF-9DC7-A1BCD23F80AA@microsoft.com... > All the while, I've been using the usual column(s) with the usual datatype > (e.g. varchar, int, ..) as the primary key(s) of the tables I have. For > instance, I have a Customer table, then I will make the VARCHAR(4) CustID > column as the primary key. > > Now, my colleague told me that using an identity column will make the > processing more efficient. So he recommended me to, > 1. Keep the CustID column > 2. Create a AutoID INT Identity column > 3. Set the AutoID column as Primary Key > 4. Set CustID as Unique Key > > He says that by doing this the processing will be more efficient and at > the > same time I can refer to the records with something more meaningful. > > My question here is, > 1. Why would using a Identity column as Primary Key be more efficient? > 2. Because the datatype of a Identity column must be either int, smallint, > tinyint, decimal, or numeric, will it be possible that one day after the > largest number that can be represented is used, I'll have problem > inserting > new record? > > Please enlighten and advise. Thank you.
1. It may or may not prove more efficient depending on how it is to be used. On the plus side, IDENTITY may reduce the size of the key used for joins and eliminate the potential problem of cascading updates to foreign keys. Possible disadvantages are that it usually increases the number of joins required (because your referencing tables no longer contain the natural key of the parent table). 2. Make sure you choose a datatype large enough that that will never happen. Integer allows more than 4 billion values for example, which is more than enough for many applications. Automatically adding IDENTITY to every table is very unwise in my opinion, given the complexities and limitations of this feature. The only way to be sure what will perform for you is to test it out. -- David Portas
Thank you for all the reply. In other words, using identity can really improves the performance in some situation. But I think I'll still stick with natural key because I'm not familiar with identity, and I don't think using identity in my situation will have much effect. But, regarding Binary Collation, how do I do it? Do I just need to go to column, change its collation to Windows Collation, Binary Sort? And for the Language should I choose Latin1_General if I'm using English only? [quoted text, click to view] "Gert-Jan Strik" wrote: > Ad. 1 > ----- > A VARCHAR(4) Primary Key does not sound very smart to me. For such a > short string, it will require less space if you simply define it as > CHAR(4). > > When it comes to performance, joining/sorting on an INT will be faster > than on a CHAR or VARCHAR. If you use binary collation for your > CHAR/VARCHAR, then the performance of INT will only be slightly better. > > Ad. 2 > ----- > You can reseed (reset) an Identity. This will only be useful if there is > a sufficient range of unused numbers, because otherwise you will get > Primary Key violations (duplicates). > > BTW: you can also use BIGINT, so if the 4 billions values of an INT is > not enough (in which case the VARCHAR(4) would not be enough either), > then you could use BIGINT. However, you can expect a BIGINT Identity to > perform poorer than a CHAR(4) with binary collation, because of its > larger space requirement (8 versus 4). > > IOW, I would only change VARCHAR(4) with default collation to CHAR(4) > binary collation and leave it at that... > > -- > Gert-Jan > > > wrytat wrote: > > > > All the while, I've been using the usual column(s) with the usual datatype > > (e.g. varchar, int, ..) as the primary key(s) of the tables I have. For > > instance, I have a Customer table, then I will make the VARCHAR(4) CustID > > column as the primary key. > > > > Now, my colleague told me that using an identity column will make the > > processing more efficient. So he recommended me to, > > 1. Keep the CustID column > > 2. Create a AutoID INT Identity column > > 3. Set the AutoID column as Primary Key > > 4. Set CustID as Unique Key > > > > He says that by doing this the processing will be more efficient and at the > > same time I can refer to the records with something more meaningful. > > > > My question here is, > > 1. Why would using a Identity column as Primary Key be more efficient? > > 2. Because the datatype of a Identity column must be either int, smallint, > > tinyint, decimal, or numeric, will it be possible that one day after the > > largest number that can be represented is used, I'll have problem inserting > > new record? > > > > Please enlighten and advise. Thank you.
Ad. 1 ----- A VARCHAR(4) Primary Key does not sound very smart to me. For such a short string, it will require less space if you simply define it as CHAR(4). When it comes to performance, joining/sorting on an INT will be faster than on a CHAR or VARCHAR. If you use binary collation for your CHAR/VARCHAR, then the performance of INT will only be slightly better. Ad. 2 ----- You can reseed (reset) an Identity. This will only be useful if there is a sufficient range of unused numbers, because otherwise you will get Primary Key violations (duplicates). BTW: you can also use BIGINT, so if the 4 billions values of an INT is not enough (in which case the VARCHAR(4) would not be enough either), then you could use BIGINT. However, you can expect a BIGINT Identity to perform poorer than a CHAR(4) with binary collation, because of its larger space requirement (8 versus 4). IOW, I would only change VARCHAR(4) with default collation to CHAR(4) binary collation and leave it at that... -- Gert-Jan [quoted text, click to view] wrytat wrote: > > All the while, I've been using the usual column(s) with the usual datatype > (e.g. varchar, int, ..) as the primary key(s) of the tables I have. For > instance, I have a Customer table, then I will make the VARCHAR(4) CustID > column as the primary key. > > Now, my colleague told me that using an identity column will make the > processing more efficient. So he recommended me to, > 1. Keep the CustID column > 2. Create a AutoID INT Identity column > 3. Set the AutoID column as Primary Key > 4. Set CustID as Unique Key > > He says that by doing this the processing will be more efficient and at the > same time I can refer to the records with something more meaningful. > > My question here is, > 1. Why would using a Identity column as Primary Key be more efficient? > 2. Because the datatype of a Identity column must be either int, smallint, > tinyint, decimal, or numeric, will it be possible that one day after the > largest number that can be represented is used, I'll have problem inserting > new record? >
And note that insert hotspots in modern SQL Server versions are often good for performance rather than something to be avoided. -- Hope this helps. Dan Guzman SQL Server MVP [quoted text, click to view] "Michael MacGregor" <macnoknifespam@noemailspam.com> wrote in message news:eb1PgYjGIHA.4684@TK2MSFTNGP06.phx.gbl... >I haven't seen this point mentioned yet in regard to the use of IDENTITY >for a clustered PK in systems where there is a lot of data being inserted >and that is the occurance of a hotspot at the point where new values are >added. Using a natural key such as CustID which may not be incremented >sequentially can avoid hotspots but can cause page splits when a new value >has to be inserted on an index page with no space left. This can be >minimised by judicial use of FILLFACTOR and keeping an eye on index >fragmentation. > > Michael MacGregor > Database Architect >
On Oct 29, 2:26 pm, Gert-Jan Strik <so...@toomuchspamalready.nl> [quoted text, click to view] wrote: > Ad. 1 > ----- > A VARCHAR(4) Primary Key does not sound very smart to me. For such a > short string, it will require less space if you simply define it as > CHAR(4).
If the key is less than 4 chars and needs to be padded with blanks, there have been plenty of issues with different clients in the past. Some "smart" clients truncate trailing spaces and that can be a pain in the neck, especially when after you have upgraded some library some functionality is broken. Better to have a VARCHAR and a piece of mind.
[quoted text, click to view] > And note that insert hotspots in modern SQL Server versions are often good > for performance rather than something to be avoided.
Really? Why? MTM
[quoted text, click to view] Alex Kuznetsov wrote: > > On Oct 29, 2:26 pm, Gert-Jan Strik <so...@toomuchspamalready.nl> > wrote: > > Ad. 1 > > ----- > > A VARCHAR(4) Primary Key does not sound very smart to me. For such a > > short string, it will require less space if you simply define it as > > CHAR(4). > > If the key is less than 4 chars and needs to be padded with blanks, > there have been plenty of issues with different clients in the past. > Some "smart" clients truncate trailing spaces and that can be a pain > in the neck, especially when after you have upgraded some library some > functionality is broken. Better to have a VARCHAR and a piece of mind.
I guess I don't see the problem, and I don't know what you mean with libraries that would suddenly break. As you know, this is not a problem when used in a query, because of ANSI behavior. New applications are not a problem either. If you have existing applications, and you are worried about this, then this can easily be solved by creating a view with a RTRIMmed version of it. --
[quoted text, click to view] On Tue, 30 Oct 2007 09:28:51 -0400, Michael MacGregor wrote: >> And note that insert hotspots in modern SQL Server versions are often good >> for performance rather than something to be avoided. >Really? Why?
Hi Michael, Mostly because it reduces the amount of page splits required while inserting data, and reduces fragmentation. -- Hugo Kornelis, SQL Server MVP
As Hugo, mentioned, splits and fragmentation are reduced with an increasing key. Many rows can be stored with a single page write. -- Hope this helps. Dan Guzman SQL Server MVP [quoted text, click to view] "Michael MacGregor" <macnoknifespam@noemailspam.com> wrote in message news:O0xu9ivGIHA.3848@TK2MSFTNGP05.phx.gbl... >> And note that insert hotspots in modern SQL Server versions are often >> good for performance rather than something to be avoided. > Really? Why? > > MTM >
I thought perhaps you were referring to something new that I did not know about, but thanks anyway for the clarification. Michael MacGregor Database Architect
Don't see what you're looking for? Try a search.
|