sql server (alternate):
Hi All! We are doing new development for SQL Server 2000 and also moving from SQL 7.0 to SQL Server 2000. What are cons and pros for using IDENTITY property as PK in SQL SERVER 2000? Please, share your experience in using IDENTITY as PK . Does SCOPE_IDENTITY makes life easier in SQL 2000? Is there issues with DENTITY property when moving DB from one server to another? (the same version of SQL Server) Thank you in advance,
[quoted text, click to view] Aaron Bertrand [MVP] wrote: >>What are cons and pros for using IDENTITY property as PK in SQL SERVER >>2000? > > > Pros: > - small (4 bytes) > - automatic > - relatively predictable (unlike GUID) > - more usable (try WHERE guidColumn = {AECB...} when debugging a problem) > > Cons: > - meaningless identifier (this can also be a good thing) > - can have gaps (after delete or rollback) > - can't be used in some types of replication > - hotspot for insert if it is also clustered index > - not portable > > We use it here because our natual keys are much larger than 4 bytes, and > this would be inefficient (especially in indexed foreign key constraints).
What does 4 bytes have to do with it? If you had said 60 or 100 I'd understand but why 4? And please consider Joe Celko's voluminous comments on the subject of artificial, or surrogate keys, when responding. Thanks. -- Daniel Morgan http://www.outreach.washington.edu/ext/certificates/oad/oad_crs.asp http://www.outreach.washington.edu/ext/certificates/aoa/aoa_crs.asp damorgan@x.washington.edu (replace 'x' with a 'u' to reply)
[quoted text, click to view] > What are cons and pros for using IDENTITY property as PK in SQL SERVER > 2000?
Pros: - small (4 bytes) - automatic - relatively predictable (unlike GUID) - more usable (try WHERE guidColumn = {AECB...} when debugging a problem) Cons: - meaningless identifier (this can also be a good thing) - can have gaps (after delete or rollback) - can't be used in some types of replication - hotspot for insert if it is also clustered index - not portable We use it here because our natual keys are much larger than 4 bytes, and this would be inefficient (especially in indexed foreign key constraints). [quoted text, click to view] > Does SCOPE_IDENTITY makes life easier in SQL 2000?
Yes, it is more reliable than @@IDENTITY... but I don't know how it would make life easier. [quoted text, click to view] > Is there issues with DENTITY property when moving DB from one server > to another? (the same version of SQL Server)
Depends on how you define "move," and whether this is one-time or continuous. -- Aaron Bertrand SQL Server MVP http://www.aspfaq.com/
[quoted text, click to view] > What does 4 bytes have to do with it? If you had said 60 or 100 I'd > understand but why 4?
Uh, because INT (the most common datatype for IDENTITY) is 4 bytes? [quoted text, click to view] > And please consider Joe Celko's voluminous comments on the subject of > artificial, or surrogate keys, when responding.
Joe's a big boy, and he can speak for himself. So can I. We don't all have to agree on everything. This is why it's called an opinion. -- Aaron Bertrand SQL Server MVP http://www.aspfaq.com/
net__space@hotmail.com (Andy) writes: [quoted text, click to view] > What are cons and pros for using IDENTITY property as PK in SQL SERVER > 2000? Please, share your experience in using IDENTITY as PK .
My experience says the theorists are right about the dangers of an artificial primary key. Many real-world database problems stem from duplicates that would never have been there with a natural primary key. Natural primary keys also result in reports with fewer joins. As for the pros of IDENTITY, if you are going to use an artificial primary key, that's the way to do it. Triggers don't work as well. A pro for artificial keys in general is that Microsoft products make compound primary keys inconvenient. Transact-SQL doesn't have tuple comparisons, e.g. (a, b, c) = (x, y, z), making joins inconvenient. ASP.NET components that do DataBind() don't handle compound keys at all. AFAICT, you can only set a KeyColumn parameter to a single column. I'd personally recommend going with natural primary keys, even if
[quoted text, click to view] > My experience says the theorists are right about the dangers of an > artificial primary key. Many real-world database problems stem from > duplicates that would never have been there with a natural primary key.
This is only partially true, since what is actually missing is analysis. Tables can have many keys, and should if there are several reasonable natural keys, or in this case artificial keys. [quoted text, click to view] > Natural primary keys also result in reports with fewer joins.
Only when you make the keys very descriptive. This kind of thing has always made for an ugly balance of performance (smaller keys, resulting in values that need 20 characters being condensed into 5) and usability. Joins on very small values are very fast. [quoted text, click to view] > > As for the pros of IDENTITY, if you are going to use an artificial > primary key, that's the way to do it. Triggers don't work as well. A
What do you mean by triggers don't work as well? There are ways to create artificial keys using triggers. [quoted text, click to view] > pro for artificial keys in general is that Microsoft products make > compound primary keys inconvenient. Transact-SQL doesn't have tuple > comparisons, e.g. (a, b, c) = (x, y, z), making joins inconvenient.
What SQL syntax has this kind of comparison? I would have figured (a,b,c) to be a set, not three different columns. [quoted text, click to view] > I'd personally recommend going with natural primary keys, even if > they're compound.
There is nothing wrong with that statement, whatsoever. Natural keys have value, I just like the consistency of the same pattern being used for all tables. -- ---------------------------------------------------------------------------- ----------- Louis Davidson (drsql@hotmail.com) Compass Technology Management Pro SQL Server 2000 Database Design http://www.apress.com/book/bookDisplay.html?bID=266 Note: Please reply to the newsgroups only unless you are interested in consulting services. All other replies will be ignored :)
[quoted text, click to view] "Bruce Lewis" <brlspam@yahoo.com> wrote in message news:nm97k1gboh8.fsf@scrubbing-bubbles.mit.edu... > net__space@hotmail.com (Andy) writes: > > > What are cons and pros for using IDENTITY property as PK in SQL SERVER > > 2000? Please, share your experience in using IDENTITY as PK . > > My experience says the theorists are right ... .... > I'd personally recommend going with natural primary keys, even if > they're compound.
The theorists disagree with you with respect to compound primary keys -- especially in SQL. Candidate keys obviously have as many attributes as they have, but forming references with compound keys causes severe problems when information may be missing.
[quoted text, click to view] Bob Badour wrote:
<snipped> [quoted text, click to view] > The theorists disagree with you with respect to compound primary keys -- > especially in SQL. Candidate keys obviously have as many attributes as they > have, but forming references with compound keys causes severe problems when > information may be missing.
I disagree. Theorists do not disagree at all with respect to compound primary keys. Primary keys, by definition, don't have missing information. If you are missing information you have something but that something is not, by definition, a primary key. -- Daniel Morgan http://www.outreach.washington.edu/ext/certificates/oad/oad_crs.asp http://www.outreach.washington.edu/ext/certificates/aoa/aoa_crs.asp damorgan@x.washington.edu (replace 'x' with a 'u' to reply)
[quoted text, click to view] "Daniel Morgan" <damorgan@x.washington.edu> wrote in message news:1070326393.444066@yasure... > Bob Badour wrote: > > <snipped> > > > The theorists disagree with you with respect to compound primary keys -- > > especially in SQL. Candidate keys obviously have as many attributes as they > > have, but forming references with compound keys causes severe problems when > > information may be missing. > > I disagree. Theorists do not disagree at all with respect to compound > primary keys. Primary keys, by definition, don't have missing information. > > If you are missing information you have something but that something is > not, by definition, a primary key.
Since when does that have any bearing on missing data in the referencing table?
[quoted text, click to view] > > comparisons, e.g. (a, b, c) = (x, y, z), making joins inconvenient. > > What SQL syntax has this kind of comparison? I would have figured (a,b,c) > to be a set, not three different columns.
Row-value comparisons such as this are standard in SQL92 but unfortunately not supported by SQLServer. Oracle, I believe, does support this feature. -- David Portas ------------ Please reply only to the newsgroup --
While it is true that the chosen primary key cannot contain any optional values, it is more the praticioner (sp?) that disagrees with this stance. Compound keys are unwieldy and bad for performance, but the theorist in me says "who cares?" It is all about what is right/best, not what is fastest/easiest. -- ---------------------------------------------------------------------------- ----------- Louis Davidson (drsql@hotmail.com) Compass Technology Management Pro SQL Server 2000 Database Design http://www.apress.com/book/bookDisplay.html?bID=266 Note: Please reply to the newsgroups only unless you are interested in consulting services. All other replies will be ignored :) [quoted text, click to view] "Bob Badour" <bbadour@golden.net> wrote in message news:pfqdnbyWFqc951aiRVn-hA@golden.net... > > "Bruce Lewis" <brlspam@yahoo.com> wrote in message > news:nm97k1gboh8.fsf@scrubbing-bubbles.mit.edu... > > net__space@hotmail.com (Andy) writes: > > > > > What are cons and pros for using IDENTITY property as PK in SQL SERVER > > > 2000? Please, share your experience in using IDENTITY as PK . > > > > My experience says the theorists are right ... > ... > > I'd personally recommend going with natural primary keys, even if > > they're compound. > > The theorists disagree with you with respect to compound primary keys -- > especially in SQL. Candidate keys obviously have as many attributes as they > have, but forming references with compound keys causes severe problems when > information may be missing. > >
[quoted text, click to view] "Bruce Lewis" <brlspam@yahoo.com> wrote in message news:nm97k1gboh8.fsf@scrubbing-bubbles.mit.edu... > net__space@hotmail.com (Andy) writes: > > > What are cons and pros for using IDENTITY property as PK in SQL SERVER > > 2000? Please, share your experience in using IDENTITY as PK . > > My experience says the theorists are right about the dangers of an > artificial primary key. Many real-world database problems stem from > duplicates that would never have been there with a natural primary key. > Natural primary keys also result in reports with fewer joins. > > As for the pros of IDENTITY, if you are going to use an artificial > primary key, that's the way to do it. Triggers don't work as well. A > pro for artificial keys in general is that Microsoft products make > compound primary keys inconvenient. Transact-SQL doesn't have tuple > comparisons, e.g. (a, b, c) = (x, y, z), making joins inconvenient. > ASP.NET components that do DataBind() don't handle compound keys at all. > AFAICT, you can only set a KeyColumn parameter to a single column. > > I'd personally recommend going with natural primary keys, even if > they're compound.
I am by no means a SQL expert, so forgive me if this seems ignorant...But why can't the ID columm be a natural key? For example, I am working on a project that has a vendors table. The list of vendors is used in relationship to several other tables. We build this table with an identity column, and a column with the vendor's name. Now when a vendor is added to the table, they are assigned a unique ID that ties all other related data back to this vendor, and in the case of a vendor changing their name, or a typo, we can make updates without affecting data. Seems like a perfect use for an identity field, and it is our primary key. -- BV. WebPorgmaster - www.IHeartMyPond.com Work at Home, Save the Environment - www.amothersdream.com
[quoted text, click to view] "Louis Davidson" <dr_dontspamme_sql@hotmail.com> wrote in message news:#rIbinJuDHA.3436@tk2msftngp13.phx.gbl... > While it is true that the chosen primary key cannot contain any optional > values, it is more the praticioner (sp?) that disagrees with this stance. > Compound keys are unwieldy and bad for performance, but the theorist in me > says "who cares?" It is all about what is right/best, not what is > fastest/easiest.
Performance is determined by the physical structure and not by the logical interface. Legitimate theorists have written ad nauseum on the severe logical problems caused by using compound keys for references when data may be missing. [quoted text, click to view] > -------------------------------------------------------------------------- -- > ----------- > Louis Davidson (drsql@hotmail.com) > Compass Technology Management > > Pro SQL Server 2000 Database Design > http://www.apress.com/book/bookDisplay.html?bID=266 > > Note: Please reply to the newsgroups only unless you are > interested in consulting services. All other replies will be ignored :) > > "Bob Badour" <bbadour@golden.net> wrote in message > news:pfqdnbyWFqc951aiRVn-hA@golden.net... > > > > "Bruce Lewis" <brlspam@yahoo.com> wrote in message > > news:nm97k1gboh8.fsf@scrubbing-bubbles.mit.edu... > > > net__space@hotmail.com (Andy) writes: > > > > > > > What are cons and pros for using IDENTITY property as PK in SQL SERVER > > > > 2000? Please, share your experience in using IDENTITY as PK . > > > > > > My experience says the theorists are right ... > > ... > > > I'd personally recommend going with natural primary keys, even if > > > they're compound. > > > > The theorists disagree with you with respect to compound primary keys -- > > especially in SQL. Candidate keys obviously have as many attributes as > they > > have, but forming references with compound keys causes severe problems > when > > information may be missing. > > > > > >
[quoted text, click to view] > "who cares?" It is all about what is right/best, not what is > fastest/easiest.
Unfortunately, what is right/best is not always among the criteria when the work is for someone else. -- Aaron Bertrand SQL Server MVP http://www.aspfaq.com/
[quoted text, click to view] > I am by no means a SQL expert, so forgive me if this seems ignorant...But > why can't the ID columm be a natural key? For example, I am working on a > project that has a vendors table. The list of vendors is used in > relationship to several other tables. We build this table with an identity
An identity value that is generated by the system is not "natural"... a natural key means that the key is, by nature, identifying a single row... not artificially because you generated some value for it. A natural key could be an e-mail address, or a social security number, or a license plate number, or a latitude and longitude -- something that is part of the data that also happens to uniquely identify it. Keep in mind that a primary key does not have to a natural key. I don't see any problems with using an IDENTITY as a primary key, such as in your case. But in your case it is not a natural key. -- Aaron Bertrand SQL Server MVP http://www.aspfaq.com/
[quoted text, click to view] Aaron Bertrand - MVP wrote: >>I am by no means a SQL expert, so forgive me if this seems ignorant...But >>why can't the ID columm be a natural key? For example, I am working on a >>project that has a vendors table. The list of vendors is used in >>relationship to several other tables. We build this table with an identity >> >> > >An identity value that is generated by the system is not "natural"... a >natural key means that the key is, by nature, identifying a single row... >not artificially because you generated some value for it. A natural key >could be an e-mail address, or a social security number, or a license plate >number, or a latitude and longitude -- something that is part of the data >that also happens to uniquely identify it. >
Quiz: Classify each key below as "natural" or "artificial": Northwind..Customers.CustomerID Northwind..Orders.OrderID Northwind..Territories.TerritoryID Most keys are fundamentally artificial, but somehow we only call them artificial if we made them up, not if someone else made them up. Social Security numbers are probably nothing different than identity values in someone elses database, which doesn't make them any more intrinsic to living people than VendorID values. I don't think an identity VendorID value is any more artificial than any other unique way of identifying a vendor, so long as it is assigned the first time a vendor enters the system and is never changed. I think this whole natural/artifical distinction is mostly quite silly. In a well-designed database, entities can just as easily be identified by an integer they are assigned when they enter the system as they can by some set of attributes within the data they enter the system with. The arguments against using identity values always seem to be arguments against the ways people misuse identity values. The advantage of keys like Northwind..Customers.CustomerID is really an error-correction issue. If carefully chosen, those 5-character keys can be recovered if a single letter is mistyped. But this can be done with artificial values also, if check digits or other error-correcting schemes are used. SK [quoted text, click to view] > >Keep in mind that a primary key does not have to a natural key. I don't see >any problems with using an IDENTITY as a primary key, such as in your case. >But in your case it is not a natural key. > > >
[quoted text, click to view] > artificial if we made them up, not if someone else made them up. Social > Security numbers are probably nothing different than identity values in > someone elses database,
Right, but this is a centralized and controlled database. Nobody else has my SSN (though there are exceptions, e.g. someday SSNs for deceased people will have to be re-used); anybody else who uses it to identify themselves is likely attempting fraud / identity theft. I consider it a "natural" key because I supply it to the database, rather than the other way around. Whereas my customerID according to Barnes & Noble is very unlikely to be the same as my customerID at J.Crew. In any case, I do agree that the distinction is largely silly, especially when it erupts into arguments and "but Celko says..." nonsense. Like many other things in the database world, the choice of a key is not dictated by some higher power, but is rather situation-dependent. -- Aaron Bertrand SQL Server MVP http://www.aspfaq.com/
[quoted text, click to view] "David Portas" <REMOVE_BEFORE_REPLYING_dportas@acm.org> wrote in message news:e-qdnT5gCqKOTFGiRVn-iQ@giganews.com... > > living people than VendorID values. I don't think an identity VendorID > > value is any more artificial than any other unique way of identifying a > > vendor, so long as it is assigned the first time a vendor enters the > > system and is never changed. > > But the point is that a "natural" key is verifiable outside of the system.
As soon as one records the generated identity value outside the system, the key is verifiable outside of the system. All keys are surrogates or artificial keys. Natural keys are nothing more than familiar surrogates.
The problem is with how it is used. If you start giving users access to identity based values, you get into a bad spot where they want to make changes to the value (in my line of work, we don't like the numbers 666 in account numbers) so using identities for user values is a bad idea. I use them only for internal pointers that are never presented to users, since they are not modifiable. I could use guids, or characters, or whatever for keys and no one would be the wiser. [quoted text, click to view] > Natural keys are nothing more than familiar surrogates.
You are kind of right here, but it is generally true that natural keys can change, because in the world, things can change. Identities cannot change. -- ---------------------------------------------------------------------------- ----------- Louis Davidson (drsql@hotmail.com) Compass Technology Management Pro SQL Server 2000 Database Design http://www.apress.com/book/bookDisplay.html?bID=266 Note: Please reply to the newsgroups only unless you are interested in consulting services. All other replies will be ignored :) [quoted text, click to view] "Bob Badour" <bbadour@golden.net> wrote in message news:IOydndJOXv5aR1GiRVn-hg@golden.net... > "David Portas" <REMOVE_BEFORE_REPLYING_dportas@acm.org> wrote in message > news:e-qdnT5gCqKOTFGiRVn-iQ@giganews.com... > > > living people than VendorID values. I don't think an identity VendorID > > > value is any more artificial than any other unique way of identifying a > > > vendor, so long as it is assigned the first time a vendor enters the > > > system and is never changed. > > > > But the point is that a "natural" key is verifiable outside of the system. > > As soon as one records the generated identity value outside the system, the > key is verifiable outside of the system. All keys are surrogates or > artificial keys. Natural keys are nothing more than familiar surrogates. > >
[quoted text, click to view] "David Portas" <REMOVE_BEFORE_REPLYING_dportas@acm.org> wrote in message news:e-qdnT5gCqKOTFGiRVn-iQ@giganews.com... > > living people than VendorID values. I don't think an identity VendorID > > value is any more artificial than any other unique way of identifying a > > vendor, so long as it is assigned the first time a vendor enters the > > system and is never changed. > > But the point is that a "natural" key is verifiable outside of the system. > > When I see a NG post from "skass[at]drew.edu" I don't care whether that's > based on your "real" name or even whether S.Kass is the same name as on your > passport or driver's licence. What's important to me is that it's determined > by a consistent method outside of the system which gives me some acceptable > degree of confidence that you're the same person who posted here as > "skass[at]drew.edu" yesterday. Of course that validity is destroyed if you > change your email address or if someone spoofs your address. But it's still > intrinsically better than an arbitrary ID allocated by the server. >
I am a firm believer that natural keys should only be used to logically design/normalize the data. When it comes to the real reason for keys, data integrity, more often than not I have seen that natural keys are intrinsically not good physical primary keys. 1. Natural keys are, being natural and therefore user entered [i.e., provided to the database by external means], fungible. If a user enters data, they must also be able to modify it. If data can be modified, then its value as a systemic primary key is gone. Yes, you can cascade updates to these, but why do it when it can be avoided to start with. 2. Natural keys are typically a composite of atomic attributes. If using a composite, these must be propagated to referencing tables as foreign keys. Your normalization drops below par, by having these [potentially] massively duplicated columns. Attributes that are single, [supposedly] unique attributes (e.g., SSN), usually represent some official, governmentally recognized ID, and therefore have legal issues with being propagated throughout a system. Also, for amateurs and many professionals, natural keys are very often chosen incorrectly. e.g., I believe some combination of Name and other info has been used by my ISP as their primary key. My last name was entered into their system incorrectly, but they cannot fix it because their system will not allow it. Preposterous and poor design. Surrogate keys generated by using the identity property are ideal for data integrity, because 1. They are static values [i.e., once entered, it does not change] and the DBA has control over allowing values in identity columns to be modified. 2. They are singleton row ids. The fact that they are sequential is irrelevant. That is simply the most efficient means of generating new numeric values. Identity integers can be problematic in two-way replication, but proper management of key ranges can alleviate these issues. GUIDs are the MS recommended way to deal with distributed data and two-way replication, but not as easy to deal with in unreplicated databases.
[quoted text, click to view] David Portas wrote: >>living people than VendorID values. I don't think an identity VendorID >>value is any more artificial than any other unique way of identifying a >>vendor, so long as it is assigned the first time a vendor enters the >>system and is never changed. >> >> > >But the point is that a "natural" key is verifiable outside of the system. >
I guess it depends on where you draw the boundaries of "the system." If you need something outside of "the system" to verify your key, aren't you just working within a larger system (and one that is not entirely under your control)? [quoted text, click to view] > >When I see a NG post from "skass[at]drew.edu" I don't care whether that's >based on your "real" name or even whether S.Kass is the same name as on your >passport or driver's licence. What's important to me is that it's determined >by a consistent method outside of the system which gives me some acceptable >degree of confidence that you're the same person who posted here as >"skass[at]drew.edu" yesterday. Of course that validity is destroyed if you >change your email address or if someone spoofs your address. But it's still >intrinsically better than an arbitrary ID allocated by the server. >
This is a good point. If entities enter your system from time to time, and you must determine whether they duplicate entities already in your system or are new, then you need some "natural" method of identification. If I apply for a Microsoft credit card, Microsoft will likely generate an artifical credit card number for me. But that can't be the only way in which Microsoft can identify me if a business rule prohibits one person from possessing two Microsoft credit cards A nice way to look at this is by recognizing when there is and when there isn't some external entity that participates in an internal business rule. The need to use a "natural" key such as my social security number exists only because there is an entity outside Microsoft's credit department (the person-registered-with-the-social-security-administration entity) that participates in a business rule: that the cardinality of the PRWTSSA <-> CCP relation is required to be 1<->{0,1}. This is a useful way to look at things. Is an identity value VendorID a good key for a vendor? It depends on whether there is some entity "outside the system" with a certain relationship to the entity identified by VendorID. If there is, then VendorID is not suitable as the only key. If it is used, it must be a surrogate for a natural external key also recorded within the system. Some businesses may allow one corporation more than one VendorID, and others may not. Is an identity value InvoiceID a good key for invoices (in the database of the business generating the invoices)? It might well be, since invoices can be internal to the system. While there might be a more "natural" key, such as (CreationDatetime, IssuingEmployee_or_System), the natural key might not provide any added value beyond its individual attribute values if invoices are entirely internal. In some cases, what appear to be external entities can use internal (artificial) keys, because there is no business rule relating the internal and external entities. An example might be a deli counter customer who pulls an identity value from the "take a ticket" machine. Here the only confusion is that there is no handy word for the human-visit-to-deli-counter entity, and Customer might be a more convenient name. Despite the fact that customers are people and people can be identified uniquely outside the system, there's no need to worry about that to manage visits to the deli counter. I'm curious now to know whether this point of view helps make a little more sense of the big debate. If not that, maybe it will at least help me understand why I get anxious every time I need to use a Microsoft Passport for identification. SK [quoted text, click to view] > > >
[quoted text, click to view] Trevor Best wrote: >On Tue, 2 Dec 2003 17:53:44 -0000 in comp.databases, "David Portas" ><REMOVE_BEFORE_REPLYING_dportas@acm.org> wrote: > > > >>>living people than VendorID values. I don't think an identity VendorID >>>value is any more artificial than any other unique way of identifying a >>>vendor, so long as it is assigned the first time a vendor enters the >>>system and is never changed. >>> >>> >>But the point is that a "natural" key is verifiable outside of the system. >> >>When I see a NG post from "skass[at]drew.edu" I don't care whether that's >>based on your "real" name or even whether S.Kass is the same name as on your >>passport or driver's licence. What's important to me is that it's determined >>by a consistent method outside of the system which gives me some acceptable >>degree of confidence that you're the same person who posted here as >>"skass[at]drew.edu" yesterday. Of course that validity is destroyed if you >>change your email address or if someone spoofs your address. But it's still >>intrinsically better than an arbitrary ID allocated by the server. >> >> > >If Steve leaves that educational facility he's at now, his email >address will surely change. If he goes into a witness protection >scheme his name, address and SS number will change, he or someone else >could change quiet a bit about him but if he's on a database with an >identity column as his PK then it's more likely that it will *never* >change. >
In fact none of the 257 different identity values assigned to me will ever get changed - unless I'm organized and honest, and unless someone checks some kind of natural key of mine, how will anyone know it was me each of those 257 times I opened an account? SK [quoted text, click to view] > > >
IDENTITY may be a *surrogate* key but it isn't a *natural* key because it bears no relation to the entity that you are modelling in your table. A natural primary key is a subset of the attributes of an entity which uniquely identify that entity. IDENTITY clearly isn't an attribute of any real entity - it's just an arbitrary number. Taking your Vendors table as an example, a naive design might look like this: CREATE TABLE Vendors (vendor_id INTEGER IDENTITY PRIMARY KEY /* ?? */, vendor_name VARCHAR(40) NOT NULL, vendor_tax_id VARCHAR(10) NOT NULL, ...) But this table has no uniqueness or integrity because multiple vendors can exist with different (arbitrary) vendor_ids. It may well work internally for a particular application but will break when someone needs to do some real analysis on your data. And what if you need to combine it with data from another system that doesn't have that same magical Vendor_id column? Here's a better alternative: CREATE TABLE Vendors (vendor_id INTEGER NOT NULL UNIQUE /* surrogate */, vendor_name VARCHAR(40) NOT NULL UNIQUE, vendor_tax_id VARCHAR(10) NOT NULL PRIMARY KEY, ...) By declaring UNIQUE / PK constraints on the correct attributes you can ensure that you have verifiably unique data. Keep the surrogate key if you like but make sure you declare the Natural key as well. (Moving the actual PK declaration is essentially cosmetic - PK is equivalent to NOT NULL UNIQUE and it's not unusual to have several NOT NULL UNIQUE keys in a table). Here's Celko on keys: http://www.intelligententerprise.com/030320/605celko1_1.shtml -- David Portas ------------ Please reply only to the newsgroup --
[quoted text, click to view] > And what happens if someone does a DBCC checkident ('FOO', RESEED)? > > Or you have to copy it into a new table and accidently set that table up > with an IDENTITY column and now all your rows get new IDs?
Fire the DBA. If they've allowed unskilled people such access or do anything accidentally, no telling what other problems they'll cause ;)
[quoted text, click to view] > living people than VendorID values. I don't think an identity VendorID > value is any more artificial than any other unique way of identifying a > vendor, so long as it is assigned the first time a vendor enters the > system and is never changed.
But the point is that a "natural" key is verifiable outside of the system. When I see a NG post from "skass[at]drew.edu" I don't care whether that's based on your "real" name or even whether S.Kass is the same name as on your passport or driver's licence. What's important to me is that it's determined by a consistent method outside of the system which gives me some acceptable degree of confidence that you're the same person who posted here as "skass[at]drew.edu" yesterday. Of course that validity is destroyed if you change your email address or if someone spoofs your address. But it's still intrinsically better than an arbitrary ID allocated by the server. -- David Portas ------------ Please reply only to the newsgroup --
[quoted text, click to view] "Greg D. Moore (Strider)" <mooregr@greenms.com> wrote in message news:ILazb.162127$ji3.79874@twister.nyroc.rr.com... > > "Trey Walpole" <treyNOpole@SPcomcastAM.net> wrote in message > news:uPV5k3SuDHA.1196@TK2MSFTNGP12.phx.gbl... > > > And what happens if someone does a DBCC checkident ('FOO', RESEED)? > > > > > > Or you have to copy it into a new table and accidently set that table up > > > with an IDENTITY column and now all your rows get new IDs? > > > > Fire the DBA. If they've allowed unskilled people such access or do > anything > > accidentally, no telling what other problems they'll cause ;) > > Wow. Can I get a job where you work where folks never make mistakes? > > Seriously, a DBCC checkident can be necessary in some recovery scenarios. > > Copying a table over into another one is often necessary in general > maintenance, schema changes, etc. It's pretty easy to forget to do it > right. >
I was, of course, being facetious -- well, mostly. :*) Mistakes do happen, but in most situations in db management, like these mentioned, they are completely avoidable. The DBA should be the one doing the DBCC CHECKIDENT - not just "someone". And if it is an approveed someone else, the DBA better know about it. Copying a table's data over is often necessary, but you do need to be very careful and know all the things that might be affected. Again, since this is a DBA responsibility, he'd better know how to do it right and have some test db to work with. [And if that scenario happens, it's pretty easy to fix, although it does mean moving data all over again.] [quoted text, click to view] > Admittedly, they are pretty contrived examples, but the point is, the value > of the identity relies on some arbitrary state of the DB at the point in > time it is created. > > Now, in some cases that just might not matter, but in many cases it can be > an important factor. >
Also very true. But it is the DBA that has [or should have] the control over any modifications that affect identity values, whereas any user has the ability to change natural keys [which was the point I was making :)]
On Tue, 2 Dec 2003 09:04:35 -0500 in comp.databases, "Bob Badour" [quoted text, click to view] <bbadour@golden.net> wrote: > >"Louis Davidson" <dr_dontspamme_sql@hotmail.com> wrote in message >news:#rIbinJuDHA.3436@tk2msftngp13.phx.gbl... >> While it is true that the chosen primary key cannot contain any optional >> values, it is more the praticioner (sp?) that disagrees with this stance. >> Compound keys are unwieldy and bad for performance, but the theorist in me >> says "who cares?" It is all about what is right/best, not what is >> fastest/easiest. > >Performance is determined by the physical structure and not by the logical >interface. Legitimate theorists have written ad nauseum on the severe >logical problems caused by using compound keys for references when data may >be missing.
Are you talking about data being missing from a foreign key side of a relationship? I'm pro identity column myself but I don't see how that would help in this instance. --
On Tue, 2 Dec 2003 17:53:44 -0000 in comp.databases, "David Portas" [quoted text, click to view] <REMOVE_BEFORE_REPLYING_dportas@acm.org> wrote: >> living people than VendorID values. I don't think an identity VendorID >> value is any more artificial than any other unique way of identifying a >> vendor, so long as it is assigned the first time a vendor enters the >> system and is never changed. > >But the point is that a "natural" key is verifiable outside of the system. > >When I see a NG post from "skass[at]drew.edu" I don't care whether that's >based on your "real" name or even whether S.Kass is the same name as on your >passport or driver's licence. What's important to me is that it's determined >by a consistent method outside of the system which gives me some acceptable >degree of confidence that you're the same person who posted here as >"skass[at]drew.edu" yesterday. Of course that validity is destroyed if you >change your email address or if someone spoofs your address. But it's still >intrinsically better than an arbitrary ID allocated by the server.
If Steve leaves that educational facility he's at now, his email address will surely change. If he goes into a witness protection scheme his name, address and SS number will change, he or someone else could change quiet a bit about him but if he's on a database with an identity column as his PK then it's more likely that it will *never* change. --
[quoted text, click to view] "Steve Kass" <skass@drew.edu> wrote in message news:3FCCF5D9.9070405@drew.edu... > In fact none of the 257 different identity values assigned to me will ever > get changed - unless I'm organized and honest, and unless someone > checks some kind of natural key of mine, how will anyone know > it was me each of those 257 times I opened an account?
Those values were not assigned to you, they were assigned to 257 different accounts. Those 257 accounts will also have 257 natural keys most likely, which will be an account number, likely taken from a paper form in the bank office. If you are talking about you giving 257 different identities (you tell them you are different people) then the fake people are getting new identity values, and pretty soon they will get a new artificial key printed on a fashionable orange jumpsuit :) -- ---------------------------------------------------------------------------- ----------- Louis Davidson (drsql@hotmail.com) Compass Technology Management Pro SQL Server 2000 Database Design http://www.apress.com/book/bookDisplay.html?bID=266 Note: Please reply to the newsgroups only unless you are interested in consulting services. All other replies will be ignored :)
You have definitely hit a couple of the strong points in IDENTITY's favor. But there are cases where a "natural" kind of key can be small as well. Consider when eBay bought PayPal (and other than that, the rest of this paragraph is completely fictional), they likely had to merge some data... perhaps eBay used an IDENTITY to generate customer numbers, but they want to align those primary keys with the new data in the PayPal tables. So, the keys in the PayPal data become INTs, but not IDENTITY. They are kind of "natural" because they came to the PayPal from an external source, so to speak, rather than generated arbitrarily from within. Of course, completely fictional. But surely you can see that not all natural keys are going to be larger than an IDENTITY, or less efficient. There are other examples, too. In a small stats system, a SMALLDATETIME could be the primary key (perhaps several subrelated tables are organized by day). In fact, part of http://www.aspfaq.com/stats.asp (and plenty more that you can't see) is derived on a set of tables where SMALLDATETIME is the only key of relevance. Okay, so that's still 4 bytes, but you save 4 if your other alternative is to store an IDENTITY along with the SMALLDATETIME value. Consider: CREATE TABLE calendar ( dateValue SMALLDATETIME PRIMARY KEY ) vs. CREATE TABLE calendar ( dateID INT IDENTITY PRIMARY KEY, dateValue SMALLDATETIME NOT NULL ) Never mind my goofy naming scheme. :-) Now, Kass could probably show me some cool dateadd tricks that would allow me to store just an INT (or maybe even a SMALLINT, depending on the date range required), and determine what the date value is at runtime. Not that I think that's what his argument would be, but rather just to show that it is still possible to choose either route. I think the usability of the date value representing what it is, rather than having to derive its value from some formula, is a good thing. In cases like e-mail address and SSN (and in fact most cases), I still prefer your route, where there is a surrogate key (IDENTITY) that prevents me from having to cascade changes all over the place, and store larger foreign keys. Firstname + lastname is obviously a bad choice for a key of any kind, because I know more than one Aaron Bertrand. So then you bring middle name into the key, and it can still be repeated. Other things like getting adopted, re-married, legally changing their name, and other reasons why this "key" would change are minor; changes to the key can be dealt with in the database using DRI/CASCADE or, worst case scenario, through rigorous update code; it will be tougher to re-train users to look up all the tables containing "Carmen Bertrand" instead of "Carmen Electra." :-) However, I think the possibility of two people having the same key is a far more compelling argument for bypassing the natural key and placing some meaningless identifier, like IDENTITY, that the user doesn't care about and would never have to change. Now, you might think, "why not bring SSN into the FirstName + MiddleName + LastName key? That would make it unique." Yes, and hideously large. If SSN is unique, then why not just use SSN as the key? Again, it's large even on its own (CHAR(9) and surely to become CHAR(10) in our lifetimes), so I fail to see the benefit of repeating the value in every related table, DRI or not. Sorry about the earful, sometimes I get a little typographical diarrhea. Hopefully that was at least marginally intelligible. -- Aaron Bertrand SQL Server MVP http://www.aspfaq.com/ [quoted text, click to view] "Stijn Verrept" <sverrept@nospan.vub.ac.be> wrote in message news:#G5HW$TuDHA.1512@TK2MSFTNGP10.phx.gbl... > I've read through this thread but I don't understand it. I always use an > int or smallint as primary key, with identity. I believe it would be a mess > otherwise. > > Example: I have a table with people, last name, first name, address, ... So > suppose you would make a natural key then you need at least the last name > and the first name. I have >25 other tables that reference that table. If > I get this right I will need to use the name and firstname field in all the > other tables as well to reference. Isn't that just a lot of data waste? If > I'm missing something, please tell me what because this seems a bit silly. > > > Stijn Verrept. > >
[quoted text, click to view] > Actually it's impossible to fix if you've deleted the original table since > you no longer have the original ID numbers.
Hopefully you're not starting a DTS task, never mind dropping a table before validating a successful transfer, without a decent backup in place. I think this is the kind of thing that Trey means when he uses the term "avoidable"... I'll stretch it here to also mean "correctable."
[quoted text, click to view] "BenignVanilla" <bv@tibetanbeefgarden.com> wrote in message news:O5OdnVtG1uWtN1GiRVn-hg@giganews.com... > > I am by no means a SQL expert, so forgive me if this seems ignorant...But > why can't the ID columm be a natural key? For example, I am working on a > project that has a vendors table. The list of vendors is used in > relationship to several other tables. We build this table with an identity > column, and a column with the vendor's name. Now when a vendor is added to > the table, they are assigned a unique ID that ties all other related data > back to this vendor, and in the case of a vendor changing their name, or a > typo, we can make updates without affecting data. Seems like a perfect use > for an identity field, and it is our primary key.
And what happens if someone does a DBCC checkident ('FOO', RESEED)? Or you have to copy it into a new table and accidently set that table up with an IDENTITY column and now all your rows get new IDs? [quoted text, click to view] > > > -- > BV. > WebPorgmaster - www.IHeartMyPond.com > Work at Home, Save the Environment - www.amothersdream.com > >
[quoted text, click to view] "Trey Walpole" <treyNOpole@SPcomcastAM.net> wrote in message news:uPV5k3SuDHA.1196@TK2MSFTNGP12.phx.gbl... > > And what happens if someone does a DBCC checkident ('FOO', RESEED)? > > > > Or you have to copy it into a new table and accidently set that table up > > with an IDENTITY column and now all your rows get new IDs? > > Fire the DBA. If they've allowed unskilled people such access or do anything > accidentally, no telling what other problems they'll cause ;)
Wow. Can I get a job where you work where folks never make mistakes? Seriously, a DBCC checkident can be necessary in some recovery scenarios. Copying a table over into another one is often necessary in general maintenance, schema changes, etc. It's pretty easy to forget to do it right. Admittedly, they are pretty contrived examples, but the point is, the value of the identity relies on some arbitrary state of the DB at the point in time it is created. Now, in some cases that just might not matter, but in many cases it can be an important factor. [quoted text, click to view] > >
[quoted text, click to view] "Trey Walpole" <treyNOpole@SPcomcastAM.net> wrote in message news:uNtoqBUuDHA.2316@TK2MSFTNGP10.phx.gbl... > "Greg D. Moore (Strider)" <mooregr@greenms.com> wrote in message > news:ILazb.162127$ji3.79874@twister.nyroc.rr.com... > > > > I was, of course, being facetious -- well, mostly. :*) Mistakes do happen, > but in most situations in db management, like these mentioned, they are > completely avoidable.
Oh I know. :-) [quoted text, click to view] > > The DBA should be the one doing the DBCC CHECKIDENT - not just "someone". > And if it is an approveed someone else, the DBA better know about it.
Keep in mind not all companies have that level of experience. I've consulted for a few. [quoted text, click to view] > > Copying a table's data over is often necessary, but you do need to be very > careful and know all the things that might be affected. Again, since this is > a DBA responsibility, he'd better know how to do it right and have some test > db to work with. > [And if that scenario happens, it's pretty easy to fix, although it does > mean moving data all over again.]
Actually it's impossible to fix if you've deleted the original table since you no longer have the original ID numbers. [quoted text, click to view] > > > Admittedly, they are pretty contrived examples, but the point is, the > value > > of the identity relies on some arbitrary state of the DB at the point in > > time it is created. > > > > Now, in some cases that just might not matter, but in many cases it can be > > an important factor. > > > > Also very true. But it is the DBA that has [or should have] the control over > any modifications that affect identity values, whereas any user has the > ability to change natural keys [which was the point I was making :)] >
In an ideal situation, I'd agree. [quoted text, click to view] >
I've read through this thread but I don't understand it. I always use an int or smallint as primary key, with identity. I believe it would be a mess otherwise. Example: I have a table with people, last name, first name, address, ... So suppose you would make a natural key then you need at least the last name and the first name. I have >25 other tables that reference that table. If I get this right I will need to use the name and firstname field in all the other tables as well to reference. Isn't that just a lot of data waste? If I'm missing something, please tell me what because this seems a bit silly. Stijn Verrept.
[quoted text, click to view] > sometimes :). After reading through this thread I got the impression that > natural keys are used more than surrogate keys, while in practice I use > identity with surrogate keys almost all the time.
I don't get that impression at all, and I don't believe it is true. Maybe that's what purists would *like* however... -- Aaron Bertrand SQL Server MVP http://www.aspfaq.com/
[quoted text, click to view] "Louis Davidson" <dr_dontspamme_sql@hotmail.com> wrote in message news:ex84CxQuDHA.1788@tk2msftngp13.phx.gbl... > The problem is with how it is used. If you start giving users access to > identity based values, you get into a bad spot where they want to make > changes to the value (in my line of work, we don't like the numbers 666 in > account numbers) so using identities for user values is a bad idea. I use > them only for internal pointers that are never presented to users, since > they are not modifiable. I could use guids, or characters, or whatever for > keys and no one would be the wiser.
Keys are logical identifiers. They identify data for the user as well as for the dbms. Preventing the user from seeing the identifier is just stupid. [quoted text, click to view] > > Natural keys are nothing more than familiar surrogates. > > You are kind of right here, but it is generally true that natural keys can > change, because in the world, things can change. Identities cannot
change. It is not a "kind of" rightness. The statement is obviously and self-evidently right to anyone with at least a minimal education in the fundamentals of data management. [quoted text, click to view] > -- > -------------------------------------------------------------------------- -- > ----------- > Louis Davidson (drsql@hotmail.com) > Compass Technology Management > > Pro SQL Server 2000 Database Design > http://www.apress.com/book/bookDisplay.html?bID=266 > > Note: Please reply to the newsgroups only unless you are > interested in consulting services. All other replies will be ignored :) > > "Bob Badour" <bbadour@golden.net> wrote in message > news:IOydndJOXv5aR1GiRVn-hg@golden.net... > > "David Portas" <REMOVE_BEFORE_REPLYING_dportas@acm.org> wrote in message > > news:e-qdnT5gCqKOTFGiRVn-iQ@giganews.com... > > > > living people than VendorID values. I don't think an identity > VendorID > > > > value is any more artificial than any other unique way of identifying > a > > > > vendor, so long as it is assigned the first time a vendor enters the > > > > system and is never changed. > > > > > > But the point is that a "natural" key is verifiable outside of the > system. > > > > As soon as one records the generated identity value outside the system, > the > > key is verifiable outside of the system. All keys are surrogates or > > artificial keys. Natural keys are nothing more than familiar surrogates. > > > > > >
[quoted text, click to view] "Trey Walpole" <treyNOpole@SPcomcastAM.net> wrote in message news:#Gwam6QuDHA.2248@TK2MSFTNGP09.phx.gbl... > > "David Portas" <REMOVE_BEFORE_REPLYING_dportas@acm.org> wrote in message > news:e-qdnT5gCqKOTFGiRVn-iQ@giganews.com... > > > living people than VendorID values. I don't think an identity VendorID > > > value is any more artificial than any other unique way of identifying a > > > vendor, so long as it is assigned the first time a vendor enters the > > > system and is never changed. > > > > But the point is that a "natural" key is verifiable outside of the system. > > > > When I see a NG post from "skass[at]drew.edu" I don't care whether that's > > based on your "real" name or even whether S.Kass is the same name as on > your > > passport or driver's licence. What's important to me is that it's > determined > > by a consistent method outside of the system which gives me some > acceptable > > degree of confidence that you're the same person who posted here as > > "skass[at]drew.edu" yesterday. Of course that validity is destroyed if you > > change your email address or if someone spoofs your address. But it's > still > > intrinsically better than an arbitrary ID allocated by the server. > > > > I am a firm believer that natural keys should only be used to logically > design/normalize the data. When it comes to the real reason for keys, data > integrity, more often than not I have seen that natural keys are > intrinsically not good physical primary keys.
You are evidently ignorant of the purpose of a key. A candidate key is a logical identifier that identifies data. It is not a physical identifier. A dbms can map the logical identifier to any physical identifier using any method without altering the logical data model. I suggest an elementary education in data management is in order before you pontificate on the subject. Spreading your ignorance will only confuse the unwary novice. Shame on you.
[quoted text, click to view] "Trey Walpole" <treyNOpole@SPcomcastAM.net> wrote in message news:uPV5k3SuDHA.1196@TK2MSFTNGP12.phx.gbl... > > And what happens if someone does a DBCC checkident ('FOO', RESEED)? > > > > Or you have to copy it into a new table and accidently set that table up > > with an IDENTITY column and now all your rows get new IDs? > > Fire the DBA. If they've allowed unskilled people such access or do anything > accidentally, no telling what other problems they'll cause ;)
Pray your employer never employs the same standard.
[quoted text, click to view] "Trevor Best" <bouncer@localhost> wrote in message news:1rrpsvc8ibqr1fu8n475e59hu2748ipvb4@4ax.com... > On Tue, 2 Dec 2003 09:04:35 -0500 in comp.databases, "Bob Badour" > <bbadour@golden.net> wrote: > > > > >"Louis Davidson" <dr_dontspamme_sql@hotmail.com> wrote in message > >news:#rIbinJuDHA.3436@tk2msftngp13.phx.gbl... > >> While it is true that the chosen primary key cannot contain any optional > >> values, it is more the praticioner (sp?) that disagrees with this stance. > >> Compound keys are unwieldy and bad for performance, but the theorist in me > >> says "who cares?" It is all about what is right/best, not what is > >> fastest/easiest. > > > >Performance is determined by the physical structure and not by the logical > >interface. Legitimate theorists have written ad nauseum on the severe > >logical problems caused by using compound keys for references when data may > >be missing. > > Are you talking about data being missing from a foreign key side of a > relationship? I'm pro identity column myself but I don't see how that > would help in this instance.
The specific issue is compound keys and missing information. A simple key does not exhibit the same problems regardless whether it is an identity column. Consider a compound key with attributes A and B. What happens when the user inserts a referencing row with a known A and an unknown B? Should the dbms allow the insert? When should it allow the insert? Should the dbms verify the A exists at least once in the referenced table? If the A value exists only once in the referenced table, should the dbms substitute the only corresponding B value that could be correct in the inserted row? Suppose the user deletes all the rows from the referenced table that contain a specific A value. What happens when the referencing table contains corresponding rows with a known A and an unknown B?
[quoted text, click to view] "Trevor Best" <bouncer@localhost> wrote in message news:u8spsvc8s0hfc741blgblj7rq62u8nsp1k@4ax.com... > On Tue, 2 Dec 2003 17:53:44 -0000 in comp.databases, "David Portas" > <REMOVE_BEFORE_REPLYING_dportas@acm.org> wrote: > > >> living people than VendorID values. I don't think an identity VendorID > >> value is any more artificial than any other unique way of identifying a > >> vendor, so long as it is assigned the first time a vendor enters the > >> system and is never changed. > > > >But the point is that a "natural" key is verifiable outside of the system. > > > >When I see a NG post from "skass[at]drew.edu" I don't care whether that's > >based on your "real" name or even whether S.Kass is the same name as on your > >passport or driver's licence. What's important to me is that it's determined > >by a consistent method outside of the system which gives me some acceptable > >degree of confidence that you're the same person who posted here as > >"skass[at]drew.edu" yesterday. Of course that validity is destroyed if you > >change your email address or if someone spoofs your address. But it's still > >intrinsically better than an arbitrary ID allocated by the server. > > If Steve leaves that educational facility he's at now, his email > address will surely change. If he goes into a witness protection > scheme his name, address and SS number will change, he or someone else > could change quiet a bit about him but if he's on a database with an > identity column as his PK then it's more likely that it will *never* > change.
With all due respect, the whole point of the witness protection programme is to prevent people from associating the individual with their previous identity.
[quoted text, click to view] "Stijn Verrept" <sverrept@nospan.vub.ac.be> wrote in message news:#G5HW$TuDHA.1512@TK2MSFTNGP10.phx.gbl... > I've read through this thread but I don't understand it. I always use an > int or smallint as primary key, with identity. I believe it would be a mess > otherwise.
Your belief does not alter the correct criteria for choosing a key: simplicity, familiarity and stability. [quoted text, click to view] > Example: I have a table with people, last name, first name, address, ... So > suppose you would make a natural key then you need at least the last name > and the first name. I have >25 other tables that reference that table. If > I get this right I will need to use the name and firstname field in all the > other tables as well to reference. Isn't that just a lot of data waste? If > I'm missing something, please tell me what because this seems a bit silly.
You have constructed a straw man. One anecdote does not demonstrate or justify a general principle or rule.
I see why you were elected a most vociferous person. That's a very long way of stating something as simple as the criteria for choosing a primary key: simplicity, familiarity and stability. [quoted text, click to view] "Aaron Bertrand [MVP]" <aaron@TRASHaspfaq.com> wrote in message news:entEu2UuDHA.1888@TK2MSFTNGP10.phx.gbl... > You have definitely hit a couple of the strong points in IDENTITY's favor. > But there are cases where a "natural" kind of key can be small as well. > Consider when eBay bought PayPal (and other than that, the rest of this > paragraph is completely fictional), they likely had to merge some data... > perhaps eBay used an IDENTITY to generate customer numbers, but they want to > align those primary keys with the new data in the PayPal tables. So, the > keys in the PayPal data become INTs, but not IDENTITY. They are kind of > "natural" because they came to the PayPal from an external source, so to > speak, rather than generated arbitrarily from within. > > Of course, completely fictional. But surely you can see that not all > natural keys are going to be larger than an IDENTITY, or less efficient. > There are other examples, too. In a small stats system, a SMALLDATETIME > could be the primary key (perhaps several subrelated tables are organized by > day). In fact, part of http://www.aspfaq.com/stats.asp (and plenty more > that you can't see) is derived on a set of tables where SMALLDATETIME is the > only key of relevance. Okay, so that's still 4 bytes, but you save 4 if > your other alternative is to store an IDENTITY along with the SMALLDATETIME > value. Consider: > > CREATE TABLE calendar > ( > dateValue SMALLDATETIME PRIMARY KEY > ) > > vs. > > CREATE TABLE calendar > ( > dateID INT IDENTITY PRIMARY KEY, > dateValue SMALLDATETIME NOT NULL > ) > > Never mind my goofy naming scheme. :-) > > Now, Kass could probably show me some cool dateadd tricks that would allow > me to store just an INT (or maybe even a SMALLINT, depending on the date > range required), and determine what the date value is at runtime. Not that > I think that's what his argument would be, but rather just to show that it > is still possible to choose either route. I think the usability of the date > value representing what it is, rather than having to derive its value from > some formula, is a good thing. > > In cases like e-mail address and SSN (and in fact most cases), I still > prefer your route, where there is a surrogate key (IDENTITY) that prevents > me from having to cascade changes all over the place, and store larger > foreign keys. > > Firstname + lastname is obviously a bad choice for a key of any kind, > because I know more than one Aaron Bertrand. So then you bring middle name > into the key, and it can still be repeated. Other things like getting > adopted, re-married, legally changing their name, and other reasons why this > "key" would change are minor; changes to the key can be dealt with in the > database using DRI/CASCADE or, worst case scenario, through rigorous update > code; it will be tougher to re-train users to look up all the tables > containing "Carmen Bertrand" instead of "Carmen Electra." :-) However, I > think the possibility of two people having the same key is a far more > compelling argument for bypassing the natural key and placing some > meaningless identifier, like IDENTITY, that the user doesn't care about and > would never have to change. > > Now, you might think, "why not bring SSN into the FirstName + MiddleName + > LastName key? That would make it unique." Yes, and hideously large. If > SSN is unique, then why not just use SSN as the key? Again, it's large even > on its own (CHAR(9) and surely to become CHAR(10) in our lifetimes), so I > fail to see the benefit of repeating the value in every related table, DRI > or not. > > Sorry about the earful, sometimes I get a little typographical diarrhea. > Hopefully that was at least marginally intelligible. > > -- > Aaron Bertrand > SQL Server MVP > http://www.aspfaq.com/ > > > > > "Stijn Verrept" <sverrept@nospan.vub.ac.be> wrote in message > news:#G5HW$TuDHA.1512@TK2MSFTNGP10.phx.gbl... > > I've read through this thread but I don't understand it. I always use an > > int or smallint as primary key, with identity. I believe it would be a > mess > > otherwise. > > > > Example: I have a table with people, last name, first name, address, ... > So > > suppose you would make a natural key then you need at least the last name > > and the first name. I have >25 other tables that reference that table. > If > > I get this right I will need to use the name and firstname field in all > the > > other tables as well to reference. Isn't that just a lot of data waste? > If > > I'm missing something, please tell me what because this seems a bit silly. > > > > > > Stijn Verrept. > > > > > >
"Trey Walpole" <treyNOpole@SPcomcastAM.net> writes: [quoted text, click to view] > I am a firm believer that natural keys should only be used to logically > design/normalize the data. When it comes to the real reason for keys, data > integrity, more often than not I have seen that natural keys are > intrinsically not good physical primary keys.
This differs from my experience. [quoted text, click to view] > 1. Natural keys are, being natural and therefore user entered [i.e., > provided to the database by external means], fungible. If a user enters > data, they must also be able to modify it. If data can be modified, then its > value as a systemic primary key is gone. Yes, you can cascade updates to > these, but why do it when it can be avoided to start with.
Why do it? To avoid duplicates of course. Why not do it? You don't seem to be making any sort of case here. [quoted text, click to view] > 2. Natural keys are typically a composite of atomic attributes. If using a > composite, these must be propagated to referencing tables as foreign keys. > Your normalization drops below par, by having these [potentially] massively > duplicated columns.
I agree that natural keys should be avoided because they're too large. However, most business reports I see typically have columns that consist of abbreviations chosen to make the report less wide. These make great natural keys. [quoted text, click to view] > Attributes that are single, [supposedly] unique attributes (e.g., SSN), > usually represent some official, governmentally recognized ID, and therefore > have legal issues with being propagated throughout a system.
Yes, SSNs are problematic because too many organizations use them for authentication, i.e. "You put Trey Walpole's SSN on this form, so you must be Trey Walpole". Even in the absence of such stupid organizations, privacy advocates oppose national IDs for a very good reason: such IDs make it easy to create good databases that include people. However, I assume the original poster had the opposite goal: make it easy to create good databases. [quoted text, click to view] > Also, for amateurs and many professionals, natural keys are very often > chosen incorrectly. e.g., I believe some combination of Name and other info > has been used by my ISP as their primary key. My last name was entered into > their system incorrectly, but they cannot fix it because their system will > not allow it. Preposterous and poor design.
It sounds like this "preposterous and poor design" could have been fixed with a simple REFERENCES ... ON UPDATE CASCADE. Perhaps you should offer them your services. I can't tell from your story whether or not they used a poor choice of natural key. Yes, people can make bad choices as to natural primary keys, but I think this one additional opportunity to do bad database design is well worth the risk, given the problems that arise from redundant or duplicate data. [quoted text, click to view] > Surrogate keys generated by using the identity property are ideal for data > integrity, because > 1. They are static values [i.e., once entered, it does not change] and the > DBA has control over allowing values in identity columns to be modified.
This is only meaningful if there's something wrong with ON UPDATE CASCADE, which I think there isn't. [quoted text, click to view] > 2. They are singleton row ids. And thus the problem. Earlier this year an e-mail alert system I wrote
was sending two copies when it should just send one. Looking into it, the employee table had been doubled. I switch to a select distinct to work around the problem, and someone deleted the duplicates. If we had a natural primary key for the employee table, I doubt those duplicates would have gone in.
[quoted text, click to view] "Bob Badour" <bbadour@golden.net> wrote in message news:Z-WdnVKmtPjLcFCi4p2dnA@golden.net... > "Louis Davidson" <dr_dontspamme_sql@hotmail.com> wrote in message > news:ex84CxQuDHA.1788@tk2msftngp13.phx.gbl... > > The problem is with how it is used. If you start giving users access to > > identity based values, you get into a bad spot where they want to make > > changes to the value (in my line of work, we don't like the numbers 666 in > > account numbers) so using identities for user values is a bad idea. I use > > them only for internal pointers that are never presented to users, since > > they are not modifiable. I could use guids, or characters, or whatever > for > > keys and no one would be the wiser. > > Keys are logical identifiers. They identify data for the user as well as for > the dbms. Preventing the user from seeing the identifier is just stupid. >
Why? Do you want the user typing, remembering, or dealing with the difference between ID=320983902 and 320984902 or 320983903? I certainly don't. Invariably they would want something that they understood. Or what about a GUID: 6969B66E-6A7A-4E89-B2D9-B35799B335C1 vs DCF5DBC5-73B5-4009-9BBC-9312CFD6AD9D. Yick. Kind of like you have a user name, an email address, fingerprints, an SSN and DNA, etc that all identify you, but you only use a few of them here. DNA is ugly, but as perfect of an identifier, but I don't know what my DNA is, but it still exists. [quoted text, click to view] > > > > Natural keys are nothing more than familiar surrogates. > > > > You are kind of right here, but it is generally true that natural keys can > > change, because in the world, things can change. Identities cannot > change. > > It is not a "kind of" rightness. The statement is obviously and > self-evidently right to anyone with at least a minimal education in the > fundamentals of data management. >
I suppose. But I don't think that the statement means anything. I don't agree that all natural keys are familiar surrogates. Not every natural key was initally randomly chosen. Some level of thought was placed to choosing a name, hence it is not just a familar surrogate. A surrogate key (the word surrogate meaning to take the place of, and the definition of a surrogate key is: A unique primary key generated by the RDBMS that is not derived from any data in the database and whose only significance is to act as the primary key. I would remove the word primary from the sentence and we have what I would agree with the definition) They are very much alike, but I don't think you can state that a natural key is a surrogate, ever. They are both keys, in that they both are determinants, which is why we have called them keys. -- ---------------------------------------------------------------------------- ----------- Louis Davidson (drsql@hotmail.com) Compass Technology Management Pro SQL Server 2000 Database Design http://www.apress.com/book/bookDisplay.html?bID=266 Note: Please reply to the newsgroups only unless you are interested in consulting services. All other replies will be ignored :)
[quoted text, click to view] Stijn Verrept wrote: >"David Portas" <REMOVE_BEFORE_REPLYING_dportas@acm.org> wrote in message >news:NvCdnazr-9_dZlCiRVn-vA@giganews.com... > > >>Do you really allow the same Doctor, Department, etc to appear twice in >> >> >its > > >>table with different keys? If you don't declare unique natural keys then >>that's the kind of problem you have. An IDENTITY isn't a *surrogate* key >> >> >at > > >>all unless the table also has a natural key - it's just a physical row >>identifier. >> >> > >I never said I allow them to appear twice in the column, you have Unique >Constraint for that. I could use that as a natural key, but I prefer using >an int or smallint. I don't want to note Name, Firstname, ... in another >table as foreign key! Also in the application I don't see me writing: >select SN_Active from seniors where (SN_Name = :SNName) and (SN_FirstName = >:FirstName) and (SN_BirthDate = :SNBirthDate). > > >Stijn Verrept. > > >
How about putting a UNIQUE NOT NULL CLUSTERED constraint on the identity column and putting the PRIMARY KEY NONCLUSTERED constraint on the multi-column primary key? That might confuse the anti-identity fanatics enough so they'll stop complaining. You will have a natural primary key, so they won't think the world is coming to an end, but you will go on as you always have, using the identity column for its convenience in queries, FK constraints, etc. ;) SK
[quoted text, click to view] "Stijn Verrept" <sverrept@nospan.vub.ac.be> wrote in message news:OFELqOYuDHA.3496@TK2MSFTNGP11.phx.gbl... > > In cases like e-mail address and SSN (and in fact most cases), I still > > prefer your route, where there is a surrogate key (IDENTITY) that prevents > > me from having to cascade changes all over the place, and store larger > > foreign keys. > > Ok I agree that sometimes natural keys are good. But mind the word:
NATURAL KEYS ARE ALWAYS GOOD. I was standing on my desk shouting if you didn't see it :) Just not always preferrable as PRIMARY KEYS! Always put keys on EVERY unique combination (that doesn't include other unique combiniations ( if ID is a unique key, then ID, Name should not be, use a simple index in this case) for the sake of your data. Look up Boyce Codd normal form for more information. -- ---------------------------------------------------------------------------- ----------- Louis Davidson (drsql@hotmail.com) Compass Technology Management Pro SQL Server 2000 Database Design http://www.apress.com/book/bookDisplay.html?bID=266 Note: Please reply to the newsgroups only unless you are interested in consulting services. All other replies will be ignored :) [quoted text, click to view] > sometimes :). After reading through this thread I got the impression that > natural keys are used more than surrogate keys, while in practice I use > identity with surrogate keys almost all the time. > > > Now, you might think, "why not bring SSN into the FirstName + MiddleName + > > LastName key? That would make it unique." Yes, and hideously large. If > > SSN is unique, then why not just use SSN as the key? Again, it's large > even > > on its own (CHAR(9) and surely to become CHAR(10) in our lifetimes), so I > > fail to see the benefit of repeating the value in every related table, DRI > > or not. > > Indeed, and I don't know about American legislation but maybe in the future > (or even now) you can have people work for you who don't have a SSN (who > work from a distance country for example) and then you'll get stuck again. > > > Sorry about the earful, sometimes I get a little typographical diarrhea. > > Hopefully that was at least marginally intelligible. > > It was :) > > > Stijn Verrept. > >
BINGO! This is the point that seems to be missed by most every one. Thank you David -- ---------------------------------------------------------------------------- ----------- Louis Davidson (drsql@hotmail.com) Compass Technology Management Pro SQL Server 2000 Database Design http://www.apress.com/book/bookDisplay.html?bID=266 Note: Please reply to the newsgrou |