Groups | Blog | Home
all groups > sql server programming > april 2004 >

sql server programming : Table Design Question


Joe Celko
4/25/2004 10:12:41 PM
[quoted text, click to view]

First post some specs, sample data and DDL. Then get a real, verifiable
key. By definition, the auto-increment cannot be a key. In fact, the
non-DDL that you posted looks wrong -- why is a product_id and a product
different?

[quoted text, click to view]
have data in each of these tables (which means some times Table1-5ID are
null). <<

Unh? Again, by definition, an identifier (key) cannot be a NULL. You
sound like you are trying to build some kind of network database in SQL
and want to use NULLs as nil pointers in a linked list.

--CELKO--
===========================
Please post DDL, so that people do not have to guess what the keys,
constraints, Declarative Referential Integrity, datatypes, etc. in your
schema are.

*** Sent via Developersdex http://www.developersdex.com ***
Timothy
4/25/2004 11:05:09 PM
Hi,

I have a question about database design. I have a main table that has 5
child tables referring to it through a one to one relationship. The design
of my table is the following:

Products
ProductID (autoincremental)
ProductCode
....
Table1ID (foreign key)
Table2ID (foreign key)
Table3ID (foreign key)
Table4ID (foreign key)
Table5ID (foreign key)

Each table's ID is autoincremental and each product does not have to have
data in each of these tables (which means some times Table1-5ID are null).
This is design is difficult to insert and update data via a client
application. For instance, when inserting data I have to insert Table1-5
data before the Products table and it makes updating very complex (updating
via the client is already complex because update data is somtimes pulled
from another source and the client does not if record exists....) I thought
about redesigning the products and table1-5 to this:

Products
ProductID
ProductCode...

Tables1-5
ProductID (foreign key)

Use the productID to be the key in tables1-5 which would make it easier to
work with. Is there a better way to do this? Can I get away with using the
productID as the key for tables1-5? Any suggestions would be appreciated.

Thanks



Aaron Bertrand [MVP]
4/26/2004 1:06:02 AM
I agree with Alan, the foreign key reference should point from the child
table back to the "primary" table (e.g. the one containing the actual entity
you are modeling, in this case a "product")... this also makes it much
easier to add table6, table7, table8, etc.

--
Aaron Bertrand
SQL Server MVP
http://www.aspfaq.com/


[quoted text, click to view]

Timothy
4/26/2004 8:55:15 AM
Thanks for the replies. I have one more question regarding the design. The
reason I went with the design having the keys for table1-5 in the products
table made it easy for me to check and see if there was data for that
product in the tables1-5. What is the best way to mimic this? I would like
on the client side be able to view the products table and see if there other
data in the other tables I need to retrieve.

Thanks

[quoted text, click to view]

Timothy
4/26/2004 4:20:33 PM
Alan,

Thanks for the reply. The product I am representing in the my table design
is highly comfigurable with many options. These 5 tables are the different
options that the product can have and the tables are organized into 5
logical tables. I could have one products table however there would be alot
of nulls for each product. So, I decided to normalize the table and it came
out to be 5 tables and one main products table. The products table
represent the main product, what I mean by this is the product will always
have data for the fields in the products table. Each table has a one to many
relationship with the products table. I think you might be right that I
could use a summary table that would keep track of the options tables. What
do you think?

Here is an example of the DDL for some of the tables:

CREATE TABLE [dbo].[Products] (
[ProductID] [int] IDENTITY (1, 1) NOT NULL ,
[MCode] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[PCode] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[HCode] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL
) ON [PRIMARY]
GO

CREATE TABLE [dbo].[Generic Options] (
[GenericOptionID] [int] IDENTITY (100, 1) NOT NULL ,
[ProductID] [int] NOT NULL ,
[VCode] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[FCode] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
) ON [PRIMARY]
GO

CREATE TABLE [dbo].[ModelOptions] (
[ModelOptionID] [int] IDENTITY (1, 1) NOT NULL ,
[ProductID] [int] NOT NULL ,
[SR] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[SD] [int] NULL ,
[Jac] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[LSize] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[PIn] [int] NULL ,
[ILength] [float] NULL ,
[ILengthUnit] [varchar] (10) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[SHole] [int] NULL
) ON [PRIMARY]
GO

Thanks

[quoted text, click to view]

Alan Howard
4/26/2004 4:58:44 PM

[quoted text, click to view]

And what happens when you have to insert more than one child record in any
one of your child tables?? You'll be trying to insert duplicate values for
your primary key.

Consider:

Products
-ProductID (Integer, Identity, Primary Key)
-ProductCode (...)

Tables1-5
-Table1ID (Integer, Identity, Primary Key)
-ProductID (Foreign Key)

The PK of each child table is an Identity column (or a suitable key selected
from the candidate columns), and each child table contains a column
containing the foreign key reference to the appropriate record in the
Products table, i.e. the ProductID.

Alan

Joe Celko
4/26/2004 9:32:09 PM
[quoted text, click to view]

The regulars are cringing now :)

A table is a set of things or relationships of the same kind. A key is
a subset of attributes in the entities being modeled which uniquely
identify each element of the set of entities. It is part of

An auto-numbering is based on a PHYSICAL location or state inside a
machine. It has absolutely nothing to do with the data model or the
reality of the data.

A key should be verifiable within itself. That means that when I see a
particular kind of identifier, I ought to know if it is syntactically
correct. For example, I know that ISBN 0-486-60028-9 has the correct
number of digits and that the check digit is correct for a proper
International Standard Book Number. Later on I can find out that it
identifies the Dover Books edition of AN INVESTIGATION OF THE LAWS OF
THOUGHT by George Boole.

An identifier should have repeatable verification against the reality
that you are trying to capture in your data model. If I put the same
data into another databse, do I get the same auto-increment number?
Nope!

Exactly what verification means can be a bit fuzzy. At one extreme,
prison inmates are moved by taking their fingerprints at control points
and courts want DNA evidence for convictions. At the other end of the
spectrum, retail stores will accept your check on the assumption that
you look like your driver's license photograph.

What you are doing is faking a network database using IDENTITY for
pointers, instead of creating an RDBMS.

Let me go ahead one more step and play Q&A with the direction I think
you are going:

Q: Couldn't a compound key become very long?

A1: So what? This is the 2000's century and we have much better
computers than we did in the 1950's when key size was a real physical
issue. What is funny to me is the number of idiots who replace a
natural two or three integer compound key with a huge GUID that no human
being or other system can possibly understand because they think it will
be faster and easy to program.

A2: This is an implementation problem that the SQL engine can handle.
For example, Teradata is an SQL designed for VLDB apps that uses hashing
instead of B-tree or other indexes. They guarantee that no search
requires more than two probes, no matter how large the database. A tree
index requires more and more probes as the size of the database
increases.

A3: A long key is not always a bad thing fro performance. For example,
if I use (city, state) as my key, I get a free index on just (city). I
can also add extra columns to the key to make it a super-key when such a
super-key gives me a covering index (i.e. an index which contains all of
the columns required for a query, so that the base table does not have
to be accessed at all).

[quoted text, click to view]
acceptable (in the relational model) to have an Identity attribute to
use as a handle to the row, and for attributes in other tables to use as
the target for a foreign key? <<

A handle to the row? Oh, you mean faking a sequential file's positional
record number, so I can reference the physical storage location? Sure,
if I want to lose all the advantages of an abstract data model, SQL set
oriented programming, carry extra data and destroy the portability of
code!

More and more programmers who have absolutely no database training are
being told to design a database. They are using GUIDs, IDENTITY, ROWID
and other proprietary auto-numbering "features" in SQL products to
imitate either a record number (sequential file system mindset) or OID
(OO mindset) since they don't know anything else.

Experienced database designers tend toward intelligent keys they find in
industry standard codes, such as UPC, VIN, GTIN, ISBN, etc. They know
that they need to verify the data against the reality they are modeling.
A trusted external source is a good thing to have.

The IDENTITY column is a holdover from the early programming languages
which were very close to the hardware. For example, the fields (not
columns; big difference) in a COBOL or FORTRAN program were assumed to
be physically located in main storage in the order they were declared in
the program. The languages have constructs using that model -- logical
and physical implementations are practically one! The data has meaning
BECAUSE of the program reading it (i.e. the same bits could be a
character in one program and be an integer in another).

The early SQLs were based on existing file systems. The data was kept
in physically contiguous disk pages, in physically contiguous rows, made
up of physically contiguous columns. In short, just like a deck of
punch cards or a magnetic tape. Most programmer still carry that mental
model, which is why I keep doing that rant about file vs. table, row vs.
record and column vs. field.

But physically contiguous storage is only one way of building a
relational database and it is not the best one. The basic idea of a
relational database is that user is not supposed to know *how* or
*where* things are stored at all, much less write code that depends on
the particular physical representation in a particular release of a
particular product on particular hardware at a particular time.

One of the biggest errors is the IDENTITY column (actually property, not
a column at all) in the Sybase/SQL Server family. People actually
program with this "feature" and even use it as the primary key for the
table! Now, let's go into painful details as to why this thing is bad.

The first practical consideration is that IDENTITY is proprietary and
non-portable, so you know that you will have maintenance problems when
you change releases or port your system to other products. Newbies
actually think they will never port code! Perhaps they only work for
companies that are failing and will be gone. Perhaps their code is such
crap nobody else want their application.

But let's look at the logical problems. First try to create a table
with two columns and try to make them both IDENTITY. If you cannot
declare more than one column to be of a certain data type, then that
thing is not a datatype at all, by definition. It is a property which
belongs to the PHYSICAL table, not the LOGICAL data in the table.

Next, create a table with one column and make it an IDENTITY. Now try
to insert, update and delete different numbers from it. If you cannot
insert, update and delete rows from a table, then it is not a table by
definition.

Finally create a simple table with one IDENTITY and a few other columns.
Use a few statements like

INSERT INTO Foobar (a, b, c) VALUES ('a1', 'b1', 'c1');
INSERT INTO Foobar (a, b, c) VALUES ('a2', 'b2', 'c2');
INSERT INTO Foobar (a, b, c) VALUES ('a3', 'b3', 'c3');

Alan Howard
4/27/2004 8:12:10 AM
To test the tables individually you could join from Products to each of your
child tables on the ProductID, or use a subselect to retrieve, say, the
count of products with that ProductID in a particular child table. Perhaps
another question is why you need five different child tables (we don't
really have enough information) - assuming you do, maybe you need another
summary table that keeps track of the Products and where the related records
reside??



[quoted text, click to view]

Alan Howard
4/27/2004 8:14:55 AM

[quoted text, click to view]

Hi Joe,

I'm interested in what you're implying here - care to elaborate?

Alan

Zach Wells
4/27/2004 8:43:27 AM
[quoted text, click to view]

I was waiting for that response. :) I understand everything you're
saying, and in *theory* it all sounds great. However, all of us "idiots"
work in the real world and are forced to make design decisions that
might not fit the nice, neat, complete scenarios you come up with.

For example, what would you make the primary key be for a table of
people in a database is international? SSN won't work since not only
does not every US citizen have one, but no one outside the US has one
not to mention that you can't always get people's SSN for security
reasons. Name obviously won't work. Height? Weight? Oh, I know, why not
require your company to install fingerprint machines in every location
and require ever person that you plan on putting into the database come
into the office and submit their fingerprint? After all, I'm sure when
you simply explain to your boss that this is necessary to have a
"properly" modeled database he'll be more than happy to foot the bill!
The truth is, in the real world, we don't always have keys that are
inherent in the data and having some sort of system for a generic key is
essential. I'm not saying that you should, by default, throw an identity
column in every table with a field called "ID", but I am saying that
sometimes you have to have some sort of key that is not built from the
data. With your vast experience in the industry, I honestly can't
imagine that you've not encountered this situation multiple times. I
know I have.

The simple fact is that in the real world we quite often do not have
complete data to work with. It is not uncommon to have situations where,
for example, you may only get a name and address for someone until they
are a full employee or something like that. Once that has happened THEN
you can get their SSN. In this case, should we have the person’s first
name, last name, middle name, address1, address2, city, state and zip
make up the primary key? Is that *really* preferable to having a generic
numeric key? I know you're quick to claim that when people use the
identity column that they’re trying to go back to the tape days in the
50's and duplicate that. Well, I can tell you that I'm not. I wasn't
alive in the 50's. When I do use an identity column as a primary key, it
is because I can't be guaranteed to have a reliable, complete primary
key within the data. I could care less *what* the identity column comes
up with for the key. I don't care if it is sequential. I don't care if
there are gaps; all I care about is that it is unique.

Timothy
4/27/2004 9:21:08 AM
Alan,

I appreciate your response. You might have misunderstood me or I
misunderstood your response. The child tables represent product options and
not product typed. Since this is a highly comfigurable product (millions of
different computations) I do not use the products table like the Northwind
Example where the products are already in the products table. I rather save
the product from the client (user input) into the database. Each product is
potentially different and with that in mind these child tables are used or
not used to hold data (it all depends on the user input). Why do you
suggest having both a foreign key the products table and productID in the
child tables?

Thanks
[quoted text, click to view]
Zach Wells
4/27/2004 11:11:46 AM
[quoted text, click to view]

Again, you're assuming complete data. My experience is that having
complete data at the creation point of the data is something that is
nice to have but isn't always available.

[quoted text, click to view]

So you're assuming that every person from every country has some unique
number that identifies them? Moreover, you're suggesting that you mix
data types in a single column? How is someone supposed to know that a
particular number is a US SSN or a canadian counterpart? Or any other
particular country? I guess you could expect your employees to all learn
and understand every countries format (again, only for those that have
them).

[quoted text, click to view]

Right, there is, but in the real world it can quite often been different
data for different rows and requires human intervention to distinguish
it from other "similar" looking data. This is where having a generic key
comes in handy.

Another real world example, I've developed databases for recruiting
companies. The goal of a recruiting company is to have as many
candidates in the database as possible. In addition, the database is
also supposed to house imported data from legacy systems. In this
database you may have a dozen Michael Smiths. How do you distinguish
them? You can't get their SSN until they're an employee. You may or may
not have address for all of them. You may or may not have phone numbers
for all of them. What you do is have a generic key that is used to make
the rows unique. Then, the user, when searching for michael smith can
use what data is there to visually determine which particular "Michael
Smith" they want. Usually this means checking the phone number, address,
work history, resume, etc. Any number of fields may need to be looked at
to finally determine which exact Michael Smith you want.

Aaron Bertrand [MVP]
4/27/2004 11:19:51 AM
[quoted text, click to view]

Having a unique index on this set of attributes for data integrity is one
thing. Using it as a primary key is a completely different thing... would
you like to design the child tables that refer to this monstrous key through
a foreign key constraint? Would you like to write the queries that join the
tables together? And you seem to sidestep the point that you don't always
have all the data at the outset?

--
Aaron Bertrand
SQL Server MVP
http://www.aspfaq.com/

Alan Howard
4/27/2004 11:24:17 AM
It sounds like a similar structure to one that I'm using in a current
project where people can be classified as different types - there is one
base table for the common columns and two or three 'child' tables that
contain columns for each specific type. To model your problem one approach
would be to insert the primary key from the Products table (ProductID) into
each of your child tables as a foreign key (which it looks like you're
doing), and then add a foreign key column to your Products table that
references the primary key column of a new table, ProductTypes. By checking
the ProductType of the Product record you can then join to the appropriate
child table (of which I'm assuming only one at a time will be related to a
Products record).

I my situation I use a simple data access layer that basically maps each
table (Products, GenericOptions, ModelOptions, etc.) to a class, and then a
business layer over that which provides the logical abstraction into the
various types (GenericProduct, ModelProduct, etc.). If I ever found myself
in a position of needing to instantiate a (in your case) Product of unknown
type, I would do so only long enough to determine the ProductType, and then
break down the generic object and instantiate an object of the proper type.

Hope this isn't getting too far off track for you.

Cheers,

Alan


[quoted text, click to view]
Zach Wells
4/27/2004 12:26:23 PM
[quoted text, click to view]

Thanks for making that additional point for me. I was about to add that
to my response but got called away for lunch and you know, priorities
and everything. :D

Aaron Bertrand [MVP]
4/27/2004 12:56:55 PM
My POV is documented here, FWIW.
http://www.aspfaq.com/2504

--
Aaron Bertrand
SQL Server MVP
http://www.aspfaq.com/





[quoted text, click to view]

Zach Wells
4/27/2004 1:13:28 PM
[quoted text, click to view]

Good read and not out of line with my thinking. ;)

Zach Wells
4/27/2004 3:46:39 PM
[quoted text, click to view]

So you're suggesting that you have a Persons, Employees and Contacts
table that you shuffle people through depending on their status? How is
that proper design? A person is a person is a person. They may have a
different status at different points in a process, but does that really
warrant a totally separate table? What does it mean when a data entry
error puts the same person in two tables? How does it get resolved?
Furthermore, how does any of this solve the "you must under all
circumstances have a natural primary key" mantra that Joe forces on us?

[quoted text, click to view]

See above, I find this "solution" more of a headache! ;)

[quoted text, click to view]

Doesn't matter. Moreover, why does it have to represent anything other
than a particular row of data? #242 represents a specific row of data
somewhere in your employee table. Doesn't matter where that row is (i.e.
I don't care if it is sequential or not).

[quoted text, click to view]

Understood. I wasn't saying that they had NONE of those, I'm just saying
they could have some different combination of those. Some could have
email, some could have phone, etc. It would be up to the end user to
figure out, based on what data is available, which particular row they
really wanted.

I've been trying to fix disasters with imported candidates from
[quoted text, click to view]

I still have yet to have anyone, including you, give me a valid, useful,
meaningful, reliable primary key that you could use on a table of
employees. You can try and write off the previous DBA's design as being
"poorly researched", but the fact is, it very would could have been
his/her only option. Moreover, the "disaster" very well could simply be
a "poorly researched" import process. The fact that primary key was not
a natural key is likely *not* the reason the import was a disaster.

I still lay down the challenge to anyone reading this. Give me a
reliable key to use on an employee/person table that covers companies
that have international employees. A few years ago I had this very
discussion with Joe and I don't recall that he ever gave me one either.
I do believe that he tried to say that using dna, retinal scans or
fingerprints were a reliable "unique" identifier for people and could be
used as a primary key but I think we all can see how impractical that is.

Zach Wells
4/27/2004 4:24:04 PM
[quoted text, click to view]

I understand what you're saying now, but I still don't see how this
solves the problem that we're discussing. What would your natural
primary key be in the persons table given all the parameters previously
mentioned in the preceding threads?

Rickard Axne
4/27/2004 4:52:57 PM


[quoted text, click to view]

If these attributes are unique in what you are modelling, shouldn't they
be modelled as unique anyway?

One other possiblity in this scenario is to use sub types. These
subtypes have diffrent roles (attributes) and probably diffrent primary
means of identification. US-Employees could be identified by SSN;
Swedish-Employees by our equivalent, personal number. These could both
benefit from a super-type of Contacts, identified by, say email address.

The point is, in the real world threre is a way to identify one instance
of whatever it is your modelling, otherwise you're obviously in trouble
anyway. I don't see what benefit an artificial key, specific for your
Zach Wells
4/27/2004 5:41:23 PM
[quoted text, click to view]

I agree it isn't a contest, I just feel like Joe is quick to throw pure
theory at newbies who may look at his name and think "Well, Joe says it
so I'll trash everything I'v already done and do it his way". I just
wanted to provide another perspective.

I personally respect Joe a lot and have learned tons of stuff from his
books that I've bought. I credit Joe with helping me make the switch
from thinking procedurally to set-based in regards to sql. I just think
he's a bit *too* theoretical in an environment where most people are
knee-deep in the real world.

[quoted text, click to view]

Ugh! I know what you're talking about. I've seen some "professional"
database apps out there that had to be some of the worst designed messes
I've ever encountered.

Rickard Axne
4/27/2004 8:47:52 PM
[quoted text, click to view]

I didn't mean to suggest I'm against adding surrogate key(s) on
occasions where the candidate keys just aren't pratical as PK.

With outset, I assume you mean scenarios where you have, at some point,
collected some information about the object but not all? If this is an
effect of your business process then I believe the sub types and super
types are quite good as a solution.
Rickard Axne
4/27/2004 9:19:33 PM
[quoted text, click to view]

Not at all. But I am assuming that something that you model can go
through diffrent stages or events. Here is a simple example without DDL:
If you are an employer, you will want to have people you need to stay in
touch with in a [Persons] table. If you later on hire this guy you put
him in the [Employees] table with a FK to [Contacts]. Obviously
Employees have a lot more attributes, like what office the use.

[quoted text, click to view]

Actually I've lived in countries without these magic numbers. Se above
and substitute [Employees] with [Foreign_employees] etc.

Obviously diffrent formats of identifying attributes are modelled
diffrently, probably often in diffrent tables.

[quoted text, click to view]

But what does this artificial key represent in the reality????? Call
some external company and ask about #242 :)

[quoted text, click to view]

I've worked for recruiting companies as well, and if you don't have any
means of identifying candidates (email, phone etc) they're not of any
value... I've been trying to fix disasters with imported candidates from
bought up companies etc where it turns out we suddenly have multiple
accounts for the same people. After a while everything breaks apart,
just because the DBA couldn't do the research and instead implemented
Rickard Axne
4/27/2004 10:06:34 PM
[quoted text, click to view]

Sorry, [Contacts] was a typo.

Nobody gets shuffled around.

I cannot make this clearer I think. Person x belongs in the persons
table, in the event this the same person gets hired he will belong in
both Persons and Employees. Employees has a foreign key to Persons. This
is a just a stupid example of sub-typing with separate tables and re-use
of attributes which similar objects share. Surely this is easier to
maintain and understand than wide tables with tonnes of nullable attributes?


Rickard
Alan Howard
4/28/2004 8:33:34 AM
Yep - thanks for providing your perspective Joe - an interesting read.

Alan

[quoted text, click to view]
Zach Wells
4/28/2004 8:42:19 AM
[quoted text, click to view]

Personally, I don't like the idea of having a identity key as a standard
simply to be consistent across tables. I *try* to use the data for a
primary key but when I find that the table requires more than about 3
fields to indicate a unique row, I start thinking about using a
surrogate key. However, the truth is, in my real world experience, I
find that I have to use a surrogate key more often than not. I'll admit
though, sometimes I do it because it is simply much easier given the
data, sometimes I honestly have no other alternative and recently it has
been become a lot of my work is creating fact tables for OLAP cubes
which basically throws normalization out the window anyway. ;)

Zach
Alan Howard
4/28/2004 9:20:04 AM
[quoted text, click to view]

While it's not supposed to be a contest, I'm going to side with Zach and
Aaron on this. Perhaps I'm just a pragmatist at heart but while Joe's
arguments are compelling in theory, and just a wee bit reminiscent of
University texts, time and time again I find it easier/more appropriate/more
consistent to use IDENTITY cols for artificial keys in my database. As Aaron
writes in his article, I think it's more important to weigh up the pros and
cons of each approach/philosophy and then to make an informed decision. If
there was only the one best way of doing things then this systems
development lark would get a bit boring :)

As an aside, I'm currently working for a client that is implementing an
enterprise HRIS. Over 1,000 tables and no relationships, indexes or
constraints defined at all (the database includes a dictionary that the
application uses to derive the table structure). No primary keys defined at
the database level and because we've had a few migration issues you often
find yourself in a position of not being able to address an individual row
for update or delete. Artificial and natural values are used for the 'keys'
in various tables, and unique values like EMP_NO are defined by the
application and appear in many, many tables right throughout the structure.
This particular 'key' is artificial and forms part of the compound key of
another 255 tables making maintenance a major hassle. It's interesting to
work on a project like this because, for me, it provides a widely different
perspective.

Alan


Alan Howard
4/28/2004 9:35:53 AM
To talk implementation for a second, has anyone come up with an in-house
best-practice policy governing the definition of key columns? If I decided
that every entity-derived table in our model should have an artificial key
column (IDENTITY), should purely associative tables (those implementing m:n
relationships) just use a compound key defined over the two contributing
foreign keys, or should this table also have it's own artificial key, with a
unique index defined over the two foreign key columns?

I think that there's a certain consistency in creating ALL tables with an
artificial key, regardless of their use, although I suppose there would be a
performance penalty (and a bit of a beating in here). If, in the future, the
domain was extended and you wanted to add attribute columns to the
associative table, it would still match the design of other entity tables in
your database, with minimal changes to your schema (and other bits).

Do you guys consider these sorts of things?

Alan


[quoted text, click to view]

Anith Sen
4/28/2004 9:36:48 AM
Zach,

[quoted text, click to view]

There are no universal rules regarding how a specific attribute fits as a
primary key at the logical level. It exclusively depends on the business
model (conceptual schema) which is being represented in a database.

The only formal & general considerations for a primary key in relational
databases are stability, simplicity, familiarity & minimality.

--
Anith

Zach Wells
4/28/2004 9:49:27 AM
[quoted text, click to view]

An email address is a horrible candidate as a primary key.

Zach Wells
4/28/2004 10:55:01 AM
[quoted text, click to view]

Keep in mind, we're discussing what a good primary key is for a table
that is used to house people.

a) It can change for people quite often
b) People have can multiple email addresses
c) email addresses can be reissued by isps
d) prone to data entry errors
e) Not everyone has one (in fact, i'd guess that well over half of the
world's population doesn't have one)

There's the first few off the top of my head.

Zach Wells
4/28/2004 10:59:50 AM
[quoted text, click to view]

You're missing the point I'm trying to make. The whole thread started
over the age-old discussion of using natural primary keys vs using
generic keys. Joe (and I assume others) seem to take the stance that
there is never a reason in a "properly" designed database to need a
generic key. I say there is and as an example I use a VERY common type
of data, a Person. I'm asking the question: What is a good, valid,
natural key for people? Email is not one. It may be in some very
specific cases but it definitely is not a good key in general. Neither
is SSN. I'm claiming that there really isn't a good natural key that you
can use as a primary key when dealing with people and that using a
generic key is not only acceptable, but required.

Aaron Bertrand [MVP]
4/28/2004 11:04:15 AM
[quoted text, click to view]

f) many e-mail addresses are shared by multiple people, e.g. a household

(Every natural key is affected by d), by the way.)

Aaron Bertrand [MVP]
4/28/2004 11:05:37 AM
[quoted text, click to view]

If you're building a table with 500 users, sure you can ignore those things.
If you're dealing with millions of rows, data warehousing, etc., you're
making a very big mistake not considering them.

--
Aaron Bertrand
SQL Server MVP
http://www.aspfaq.com/

Anith Sen
4/28/2004 11:54:04 AM
[quoted text, click to view]
Person. <<

That is where you faulted. Simply proclaiming an entity name without the
pertinent business model will not gather much meat for you. An entity type
namely "Person" in different business contexts impart different meanings,
associate different rules and have different external predicates. Thus,
declaring the invalidity of specific attribute like email address as a key
without contextual details, as this example, is questionable.

[quoted text, click to view]
<<

A better one would be: What is a good key for an entity type? The answer: It
depends on the business model where the entity type exists.

[quoted text, click to view]
as a primary key when dealing with people and that using a generic key is
not only acceptable, but required. <<

Even with the commonly understood meaning of "natural" keys, none of your
arguments go beyond the key selection criteria in my last post.

The point is, any attribute is "natural" or "artificial" only within the
contexts where it is being discoursed & so, being obsessed about attribute
being "natural" or otherwise is mostly meaningless at the logical level.
Based on the business model & data requireents, a competent designer can
admit another attribute as a primary key or dismiss an existing one based on
the criteria of stability, simplicity, familiarity & minimality.

--
Anith

Liz
4/28/2004 12:41:04 PM

[quoted text, click to view]

just use their Ashcroft # ...



Zach Wells
4/28/2004 1:14:16 PM
[quoted text, click to view]

That all sounds nice and pretty but it doesn't change what I've said. I
don't care what the business model is, in the vast majority of the cases
out there, a table that is used to house primary people data (name, dob,
etc), there is nothing inherent to the data that is a good candidate for
a primary key. I'm still waiting for someone to refute me on that. All
I'm hearing is more *theory*.

Trey Walpole
4/28/2004 1:26:34 PM
hmm, I find myself agreeing with everyone on this one ...

email qua email is horrible pk period. [for the reasons Zach posted]
email when used as user id is a fine pk - however, it is no longer simply an
email address.

and if the model (data or business) uses email as a primary identifier.


[quoted text, click to view]
later?

Trey Walpole
4/28/2004 2:38:39 PM
stupid ctrl+enter and ffs...

hmm, I find myself agreeing a little with everyone on this one ...

certain concepts are very difficult to model generically - like a person -
because the attributes about that concept can't always be fit into a good pk
without factoring in the context.

e.g., for a person, generically,
-- email qua email is horrible pk period. [for the reasons Zach and Aaron
posted it wouldn't be a normalized attribute - i haven't seen a model yet
that can show that email qua email is dependent on a person's other
attributes or vice versa]
-- email qua user id [like in some website guestlists] is a fine pk -
however, it is no longer simply an email address.
and if the model (data or business) uses email as a primary identifier, is
that correct? [e.g., the website guestlist - probably a good pk]

however, i have found that attempting to model concepts generically can help
identify potentially incorrect or inefficient business processes/models. if
a business process/model is deemed off, then it's best to fix it prior to
modeling it in data. if there's doubt, then at least a generic/flexible
model can be put into place until further determination is made.
i have also found that, unfortunately, modeling based solely on context for
some common concepts, like a person, can lead to root canals when that
business model changes [although root canals are billable <s>].


[quoted text, click to view]

Zach Wells
4/28/2004 2:59:19 PM
[quoted text, click to view]

Oh you are? I'm sorry, I must have missed the post where you've given
the first good reason to have an email address be a primary key (you
keep changing it to "candidate key", which isn't what this discussion is
about btw). Could you repost where you refuted that an email address is
a bad primary key? I'd like to see why you think it is a *good* primary
key as well.

More importantly, all this is really a tangent to the original point. I
honestly think I've made my point regarding the natural key vs generic
key topic so unless something worth while is posted I'm going to j