[quoted text, click to view] >> Is there a better way to do this? <<
First post some specs, sample data and DDL. Then get a real, verifiable key. By definition, the auto-increment cannot be a key. In fact, the non-DDL that you posted looks wrong -- why is a product_id and a product different? [quoted text, click to view] >> Each table's ID is autoincremental and each product does not have to
have data in each of these tables (which means some times Table1-5ID are null). << Unh? Again, by definition, an identifier (key) cannot be a NULL. You sound like you are trying to build some kind of network database in SQL and want to use NULLs as nil pointers in a linked list. --CELKO-- =========================== Please post DDL, so that people do not have to guess what the keys, constraints, Declarative Referential Integrity, datatypes, etc. in your schema are. *** Sent via Developersdex http://www.developersdex.com ***
Hi, I have a question about database design. I have a main table that has 5 child tables referring to it through a one to one relationship. The design of my table is the following: Products ProductID (autoincremental) ProductCode .... Table1ID (foreign key) Table2ID (foreign key) Table3ID (foreign key) Table4ID (foreign key) Table5ID (foreign key) Each table's ID is autoincremental and each product does not have to have data in each of these tables (which means some times Table1-5ID are null). This is design is difficult to insert and update data via a client application. For instance, when inserting data I have to insert Table1-5 data before the Products table and it makes updating very complex (updating via the client is already complex because update data is somtimes pulled from another source and the client does not if record exists....) I thought about redesigning the products and table1-5 to this: Products ProductID ProductCode... Tables1-5 ProductID (foreign key) Use the productID to be the key in tables1-5 which would make it easier to work with. Is there a better way to do this? Can I get away with using the productID as the key for tables1-5? Any suggestions would be appreciated. Thanks
I agree with Alan, the foreign key reference should point from the child table back to the "primary" table (e.g. the one containing the actual entity you are modeling, in this case a "product")... this also makes it much easier to add table6, table7, table8, etc. -- Aaron Bertrand SQL Server MVP http://www.aspfaq.com/ [quoted text, click to view] "Timothy" <timmy@noemail.com> wrote in message news:OAaOgX0KEHA.3728@TK2MSFTNGP12.phx.gbl... > Hi, > > I have a question about database design. I have a main table that has 5 > child tables referring to it through a one to one relationship. The > design > of my table is the following: > > Products > ProductID (autoincremental) > ProductCode > ... > Table1ID (foreign key) > Table2ID (foreign key) > Table3ID (foreign key) > Table4ID (foreign key) > Table5ID (foreign key) > > Each table's ID is autoincremental and each product does not have to have > data in each of these tables (which means some times Table1-5ID are null). > This is design is difficult to insert and update data via a client > application. For instance, when inserting data I have to insert Table1-5 > data before the Products table and it makes updating very complex > (updating > via the client is already complex because update data is somtimes pulled > from another source and the client does not if record exists....) I > thought > about redesigning the products and table1-5 to this: > > Products > ProductID > ProductCode... > > Tables1-5 > ProductID (foreign key) > > Use the productID to be the key in tables1-5 which would make it easier to > work with. Is there a better way to do this? Can I get away with using > the > productID as the key for tables1-5? Any suggestions would be appreciated. > > Thanks > > > >
Thanks for the replies. I have one more question regarding the design. The reason I went with the design having the keys for table1-5 in the products table made it easy for me to check and see if there was data for that product in the tables1-5. What is the best way to mimic this? I would like on the client side be able to view the products table and see if there other data in the other tables I need to retrieve. Thanks [quoted text, click to view] "Aaron Bertrand [MVP]" <aaron@TRASHaspfaq.com> wrote in message news:uAIQtw0KEHA.892@TK2MSFTNGP09.phx.gbl... > I agree with Alan, the foreign key reference should point from the child > table back to the "primary" table (e.g. the one containing the actual entity > you are modeling, in this case a "product")... this also makes it much > easier to add table6, table7, table8, etc. > > -- > Aaron Bertrand > SQL Server MVP > http://www.aspfaq.com/ > > > "Timothy" <timmy@noemail.com> wrote in message > news:OAaOgX0KEHA.3728@TK2MSFTNGP12.phx.gbl... > > Hi, > > > > I have a question about database design. I have a main table that has 5 > > child tables referring to it through a one to one relationship. The > > design > > of my table is the following: > > > > Products > > ProductID (autoincremental) > > ProductCode > > ... > > Table1ID (foreign key) > > Table2ID (foreign key) > > Table3ID (foreign key) > > Table4ID (foreign key) > > Table5ID (foreign key) > > > > Each table's ID is autoincremental and each product does not have to have > > data in each of these tables (which means some times Table1-5ID are null). > > This is design is difficult to insert and update data via a client > > application. For instance, when inserting data I have to insert Table1-5 > > data before the Products table and it makes updating very complex > > (updating > > via the client is already complex because update data is somtimes pulled > > from another source and the client does not if record exists....) I > > thought > > about redesigning the products and table1-5 to this: > > > > Products > > ProductID > > ProductCode... > > > > Tables1-5 > > ProductID (foreign key) > > > > Use the productID to be the key in tables1-5 which would make it easier to > > work with. Is there a better way to do this? Can I get away with using > > the > > productID as the key for tables1-5? Any suggestions would be appreciated. > > > > Thanks > > > > > > > > > >
Alan, Thanks for the reply. The product I am representing in the my table design is highly comfigurable with many options. These 5 tables are the different options that the product can have and the tables are organized into 5 logical tables. I could have one products table however there would be alot of nulls for each product. So, I decided to normalize the table and it came out to be 5 tables and one main products table. The products table represent the main product, what I mean by this is the product will always have data for the fields in the products table. Each table has a one to many relationship with the products table. I think you might be right that I could use a summary table that would keep track of the options tables. What do you think? Here is an example of the DDL for some of the tables: CREATE TABLE [dbo].[Products] ( [ProductID] [int] IDENTITY (1, 1) NOT NULL , [MCode] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL , [PCode] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL , [HCode] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ) ON [PRIMARY] GO CREATE TABLE [dbo].[Generic Options] ( [GenericOptionID] [int] IDENTITY (100, 1) NOT NULL , [ProductID] [int] NOT NULL , [VCode] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL , [FCode] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL , ) ON [PRIMARY] GO CREATE TABLE [dbo].[ModelOptions] ( [ModelOptionID] [int] IDENTITY (1, 1) NOT NULL , [ProductID] [int] NOT NULL , [SR] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL , [SD] [int] NULL , [Jac] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL , [LSize] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL , [PIn] [int] NULL , [ILength] [float] NULL , [ILengthUnit] [varchar] (10) COLLATE SQL_Latin1_General_CP1_CI_AS NULL , [SHole] [int] NULL ) ON [PRIMARY] GO Thanks [quoted text, click to view] "Alan Howard" <Xalan.howardX@Xparadise.net.nzX> wrote in message news:eZZoQs8KEHA.1348@TK2MSFTNGP12.phx.gbl... > To test the tables individually you could join from Products to each of your > child tables on the ProductID, or use a subselect to retrieve, say, the > count of products with that ProductID in a particular child table. Perhaps > another question is why you need five different child tables (we don't > really have enough information) - assuming you do, maybe you need another > summary table that keeps track of the Products and where the related records > reside?? > > > > "Timothy" <timmy@noemail.com> wrote in message > news:ujauE85KEHA.1000@TK2MSFTNGP11.phx.gbl... > > Thanks for the replies. I have one more question regarding the design. > The > > reason I went with the design having the keys for table1-5 in the products > > table made it easy for me to check and see if there was data for that > > product in the tables1-5. What is the best way to mimic this? I would > like > > on the client side be able to view the products table and see if there > other > > data in the other tables I need to retrieve. > > > > Thanks > > > > "Aaron Bertrand [MVP]" <aaron@TRASHaspfaq.com> wrote in message > > news:uAIQtw0KEHA.892@TK2MSFTNGP09.phx.gbl... > > > I agree with Alan, the foreign key reference should point from the child > > > table back to the "primary" table (e.g. the one containing the actual > > entity > > > you are modeling, in this case a "product")... this also makes it much > > > easier to add table6, table7, table8, etc. > > > > > > -- > > > Aaron Bertrand > > > SQL Server MVP > > > http://www.aspfaq.com/ > > > > > > > > > "Timothy" <timmy@noemail.com> wrote in message > > > news:OAaOgX0KEHA.3728@TK2MSFTNGP12.phx.gbl... > > > > Hi, > > > > > > > > I have a question about database design. I have a main table that has > 5 > > > > child tables referring to it through a one to one relationship. The > > > > design > > > > of my table is the following: > > > > > > > > Products > > > > ProductID (autoincremental) > > > > ProductCode > > > > ... > > > > Table1ID (foreign key) > > > > Table2ID (foreign key) > > > > Table3ID (foreign key) > > > > Table4ID (foreign key) > > > > Table5ID (foreign key) > > > > > > > > Each table's ID is autoincremental and each product does not have to > > have > > > > data in each of these tables (which means some times Table1-5ID are > > null). > > > > This is design is difficult to insert and update data via a client > > > > application. For instance, when inserting data I have to insert > > Table1-5 > > > > data before the Products table and it makes updating very complex > > > > (updating > > > > via the client is already complex because update data is somtimes > pulled > > > > from another source and the client does not if record exists....) I > > > > thought > > > > about redesigning the products and table1-5 to this: > > > > > > > > Products > > > > ProductID > > > > ProductCode... > > > > > > > > Tables1-5 > > > > ProductID (foreign key) > > > > > > > > Use the productID to be the key in tables1-5 which would make it > easier > > to > > > > work with. Is there a better way to do this? Can I get away with > using > > > > the > > > > productID as the key for tables1-5? Any suggestions would be > > appreciated. > > > > > > > > Thanks > > > > > > > > > > > > > > > > > > > > > > > > > > > >
[quoted text, click to view] "Timothy" <timmy@noemail.com> wrote in message news:OAaOgX0KEHA.3728@TK2MSFTNGP12.phx.gbl... > from another source and the client does not if record exists....) I thought > about redesigning the products and table1-5 to this: > > Products > ProductID > ProductCode... > > Tables1-5 > ProductID (foreign key) > > Use the productID to be the key in tables1-5 which would make it easier to > work with. Is there a better way to do this? Can I get away with using the > productID as the key for tables1-5? Any suggestions would be appreciated.
And what happens when you have to insert more than one child record in any one of your child tables?? You'll be trying to insert duplicate values for your primary key. Consider: Products -ProductID (Integer, Identity, Primary Key) -ProductCode (...) Tables1-5 -Table1ID (Integer, Identity, Primary Key) -ProductID (Foreign Key) The PK of each child table is an Identity column (or a suitable key selected from the candidate columns), and each child table contains a column containing the foreign key reference to the appropriate record in the Products table, i.e. the ProductID. Alan
[quoted text, click to view] >> I'm interested in what you're implying here - care to elaborate? <<
The regulars are cringing now :) A table is a set of things or relationships of the same kind. A key is a subset of attributes in the entities being modeled which uniquely identify each element of the set of entities. It is part of An auto-numbering is based on a PHYSICAL location or state inside a machine. It has absolutely nothing to do with the data model or the reality of the data. A key should be verifiable within itself. That means that when I see a particular kind of identifier, I ought to know if it is syntactically correct. For example, I know that ISBN 0-486-60028-9 has the correct number of digits and that the check digit is correct for a proper International Standard Book Number. Later on I can find out that it identifies the Dover Books edition of AN INVESTIGATION OF THE LAWS OF THOUGHT by George Boole. An identifier should have repeatable verification against the reality that you are trying to capture in your data model. If I put the same data into another databse, do I get the same auto-increment number? Nope! Exactly what verification means can be a bit fuzzy. At one extreme, prison inmates are moved by taking their fingerprints at control points and courts want DNA evidence for convictions. At the other end of the spectrum, retail stores will accept your check on the assumption that you look like your driver's license photograph. What you are doing is faking a network database using IDENTITY for pointers, instead of creating an RDBMS. Let me go ahead one more step and play Q&A with the direction I think you are going: Q: Couldn't a compound key become very long? A1: So what? This is the 2000's century and we have much better computers than we did in the 1950's when key size was a real physical issue. What is funny to me is the number of idiots who replace a natural two or three integer compound key with a huge GUID that no human being or other system can possibly understand because they think it will be faster and easy to program. A2: This is an implementation problem that the SQL engine can handle. For example, Teradata is an SQL designed for VLDB apps that uses hashing instead of B-tree or other indexes. They guarantee that no search requires more than two probes, no matter how large the database. A tree index requires more and more probes as the size of the database increases. A3: A long key is not always a bad thing fro performance. For example, if I use (city, state) as my key, I get a free index on just (city). I can also add extra columns to the key to make it a super-key when such a super-key gives me a covering index (i.e. an index which contains all of the columns required for a query, so that the base table does not have to be accessed at all). [quoted text, click to view] >> Do you then advocate never using an Identity attribute? Or is it
acceptable (in the relational model) to have an Identity attribute to use as a handle to the row, and for attributes in other tables to use as the target for a foreign key? << A handle to the row? Oh, you mean faking a sequential file's positional record number, so I can reference the physical storage location? Sure, if I want to lose all the advantages of an abstract data model, SQL set oriented programming, carry extra data and destroy the portability of code! More and more programmers who have absolutely no database training are being told to design a database. They are using GUIDs, IDENTITY, ROWID and other proprietary auto-numbering "features" in SQL products to imitate either a record number (sequential file system mindset) or OID (OO mindset) since they don't know anything else. Experienced database designers tend toward intelligent keys they find in industry standard codes, such as UPC, VIN, GTIN, ISBN, etc. They know that they need to verify the data against the reality they are modeling. A trusted external source is a good thing to have. The IDENTITY column is a holdover from the early programming languages which were very close to the hardware. For example, the fields (not columns; big difference) in a COBOL or FORTRAN program were assumed to be physically located in main storage in the order they were declared in the program. The languages have constructs using that model -- logical and physical implementations are practically one! The data has meaning BECAUSE of the program reading it (i.e. the same bits could be a character in one program and be an integer in another). The early SQLs were based on existing file systems. The data was kept in physically contiguous disk pages, in physically contiguous rows, made up of physically contiguous columns. In short, just like a deck of punch cards or a magnetic tape. Most programmer still carry that mental model, which is why I keep doing that rant about file vs. table, row vs. record and column vs. field. But physically contiguous storage is only one way of building a relational database and it is not the best one. The basic idea of a relational database is that user is not supposed to know *how* or *where* things are stored at all, much less write code that depends on the particular physical representation in a particular release of a particular product on particular hardware at a particular time. One of the biggest errors is the IDENTITY column (actually property, not a column at all) in the Sybase/SQL Server family. People actually program with this "feature" and even use it as the primary key for the table! Now, let's go into painful details as to why this thing is bad. The first practical consideration is that IDENTITY is proprietary and non-portable, so you know that you will have maintenance problems when you change releases or port your system to other products. Newbies actually think they will never port code! Perhaps they only work for companies that are failing and will be gone. Perhaps their code is such crap nobody else want their application. But let's look at the logical problems. First try to create a table with two columns and try to make them both IDENTITY. If you cannot declare more than one column to be of a certain data type, then that thing is not a datatype at all, by definition. It is a property which belongs to the PHYSICAL table, not the LOGICAL data in the table. Next, create a table with one column and make it an IDENTITY. Now try to insert, update and delete different numbers from it. If you cannot insert, update and delete rows from a table, then it is not a table by definition. Finally create a simple table with one IDENTITY and a few other columns. Use a few statements like INSERT INTO Foobar (a, b, c) VALUES ('a1', 'b1', 'c1'); INSERT INTO Foobar (a, b, c) VALUES ('a2', 'b2', 'c2'); INSERT INTO Foobar (a, b, c) VALUES ('a3', 'b3', 'c3');
To test the tables individually you could join from Products to each of your child tables on the ProductID, or use a subselect to retrieve, say, the count of products with that ProductID in a particular child table. Perhaps another question is why you need five different child tables (we don't really have enough information) - assuming you do, maybe you need another summary table that keeps track of the Products and where the related records reside?? [quoted text, click to view] "Timothy" <timmy@noemail.com> wrote in message news:ujauE85KEHA.1000@TK2MSFTNGP11.phx.gbl... > Thanks for the replies. I have one more question regarding the design. The > reason I went with the design having the keys for table1-5 in the products > table made it easy for me to check and see if there was data for that > product in the tables1-5. What is the best way to mimic this? I would like > on the client side be able to view the products table and see if there other > data in the other tables I need to retrieve. > > Thanks > > "Aaron Bertrand [MVP]" <aaron@TRASHaspfaq.com> wrote in message > news:uAIQtw0KEHA.892@TK2MSFTNGP09.phx.gbl... > > I agree with Alan, the foreign key reference should point from the child > > table back to the "primary" table (e.g. the one containing the actual > entity > > you are modeling, in this case a "product")... this also makes it much > > easier to add table6, table7, table8, etc. > > > > -- > > Aaron Bertrand > > SQL Server MVP > > http://www.aspfaq.com/ > > > > > > "Timothy" <timmy@noemail.com> wrote in message > > news:OAaOgX0KEHA.3728@TK2MSFTNGP12.phx.gbl... > > > Hi, > > > > > > I have a question about database design. I have a main table that has 5 > > > child tables referring to it through a one to one relationship. The > > > design > > > of my table is the following: > > > > > > Products > > > ProductID (autoincremental) > > > ProductCode > > > ... > > > Table1ID (foreign key) > > > Table2ID (foreign key) > > > Table3ID (foreign key) > > > Table4ID (foreign key) > > > Table5ID (foreign key) > > > > > > Each table's ID is autoincremental and each product does not have to > have > > > data in each of these tables (which means some times Table1-5ID are > null). > > > This is design is difficult to insert and update data via a client > > > application. For instance, when inserting data I have to insert > Table1-5 > > > data before the Products table and it makes updating very complex > > > (updating > > > via the client is already complex because update data is somtimes pulled > > > from another source and the client does not if record exists....) I > > > thought > > > about redesigning the products and table1-5 to this: > > > > > > Products > > > ProductID > > > ProductCode... > > > > > > Tables1-5 > > > ProductID (foreign key) > > > > > > Use the productID to be the key in tables1-5 which would make it easier > to > > > work with. Is there a better way to do this? Can I get away with using > > > the > > > productID as the key for tables1-5? Any suggestions would be > appreciated. > > > > > > Thanks > > > > > > > > > > > > > > > > > >
[quoted text, click to view] "Joe Celko" <jcelko212@earthlink.net> wrote in message news:uMRta00KEHA.3292@TK2MSFTNGP11.phx.gbl... > By definition, the auto-increment cannot be a key.
Hi Joe, I'm interested in what you're implying here - care to elaborate? Alan
[quoted text, click to view] Joe Celko wrote: >>>I'm interested in what you're implying here - care to elaborate? << > > > The regulars are cringing now :) >
I was waiting for that response. :) I understand everything you're saying, and in *theory* it all sounds great. However, all of us "idiots" work in the real world and are forced to make design decisions that might not fit the nice, neat, complete scenarios you come up with. For example, what would you make the primary key be for a table of people in a database is international? SSN won't work since not only does not every US citizen have one, but no one outside the US has one not to mention that you can't always get people's SSN for security reasons. Name obviously won't work. Height? Weight? Oh, I know, why not require your company to install fingerprint machines in every location and require ever person that you plan on putting into the database come into the office and submit their fingerprint? After all, I'm sure when you simply explain to your boss that this is necessary to have a "properly" modeled database he'll be more than happy to foot the bill! The truth is, in the real world, we don't always have keys that are inherent in the data and having some sort of system for a generic key is essential. I'm not saying that you should, by default, throw an identity column in every table with a field called "ID", but I am saying that sometimes you have to have some sort of key that is not built from the data. With your vast experience in the industry, I honestly can't imagine that you've not encountered this situation multiple times. I know I have. The simple fact is that in the real world we quite often do not have complete data to work with. It is not uncommon to have situations where, for example, you may only get a name and address for someone until they are a full employee or something like that. Once that has happened THEN you can get their SSN. In this case, should we have the person’s first name, last name, middle name, address1, address2, city, state and zip make up the primary key? Is that *really* preferable to having a generic numeric key? I know you're quick to claim that when people use the identity column that they’re trying to go back to the tape days in the 50's and duplicate that. Well, I can tell you that I'm not. I wasn't alive in the 50's. When I do use an identity column as a primary key, it is because I can't be guaranteed to have a reliable, complete primary key within the data. I could care less *what* the identity column comes up with for the key. I don't care if it is sequential. I don't care if there are gaps; all I care about is that it is unique.
Alan, I appreciate your response. You might have misunderstood me or I misunderstood your response. The child tables represent product options and not product typed. Since this is a highly comfigurable product (millions of different computations) I do not use the products table like the Northwind Example where the products are already in the products table. I rather save the product from the client (user input) into the database. Each product is potentially different and with that in mind these child tables are used or not used to hold data (it all depends on the user input). Why do you suggest having both a foreign key the products table and productID in the child tables? Thanks [quoted text, click to view] "Alan Howard" <Xalan.howardX@Xparadise.net.nzX> wrote in message news:OfsDnX%23KEHA.268@TK2MSFTNGP11.phx.gbl... > It sounds like a similar structure to one that I'm using in a current > project where people can be classified as different types - there is one > base table for the common columns and two or three 'child' tables that > contain columns for each specific type. To model your problem one approach > would be to insert the primary key from the Products table (ProductID) into > each of your child tables as a foreign key (which it looks like you're > doing), and then add a foreign key column to your Products table that > references the primary key column of a new table, ProductTypes. By checking > the ProductType of the Product record you can then join to the appropriate > child table (of which I'm assuming only one at a time will be related to a > Products record). > > I my situation I use a simple data access layer that basically maps each > table (Products, GenericOptions, ModelOptions, etc.) to a class, and then a > business layer over that which provides the logical abstraction into the > various types (GenericProduct, ModelProduct, etc.). If I ever found myself > in a position of needing to instantiate a (in your case) Product of unknown > type, I would do so only long enough to determine the ProductType, and then > break down the generic object and instantiate an object of the proper type. > > Hope this isn't getting too far off track for you. > > Cheers, > > Alan > > > "Timothy" <timmy@noemail.com> wrote in message > news:eZSc609KEHA.556@TK2MSFTNGP10.phx.gbl... > > Alan, > > > > Thanks for the reply. The product I am representing in the my table > design > > is highly comfigurable with many options. These 5 tables are the > different > > options that the product can have and the tables are organized into 5 > > logical tables. I could have one products table however there would be > alot > > of nulls for each product. So, I decided to normalize the table and it > came > > out to be 5 tables and one main products table. The products table > > represent the main product, what I mean by this is the product will always > > have data for the fields in the products table. Each table has a one to > many > > relationship with the products table. I think you might be right that I > > could use a summary table that would keep track of the options tables. > What > > do you think? > > > > Here is an example of the DDL for some of the tables: > > > > CREATE TABLE [dbo].[Products] ( > > [ProductID] [int] IDENTITY (1, 1) NOT NULL , > > [MCode] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL , > > [PCode] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL , > > [HCode] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL > > ) ON [PRIMARY] > > GO > > > > CREATE TABLE [dbo].[Generic Options] ( > > [GenericOptionID] [int] IDENTITY (100, 1) NOT NULL , > > [ProductID] [int] NOT NULL , > > [VCode] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL , > > [FCode] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL , > > ) ON [PRIMARY] > > GO > > > > CREATE TABLE [dbo].[ModelOptions] ( > > [ModelOptionID] [int] IDENTITY (1, 1) NOT NULL , > > [ProductID] [int] NOT NULL , > > [SR] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL , > > [SD] [int] NULL , > > [Jac] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL , > > [LSize] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL , > > [PIn] [int] NULL , > > [ILength] [float] NULL , > > [ILengthUnit] [varchar] (10) COLLATE SQL_Latin1_General_CP1_CI_AS NULL , > > [SHole] [int] NULL > > ) ON [PRIMARY] > > GO > > > > Thanks > > > > "Alan Howard" <Xalan.howardX@Xparadise.net.nzX> wrote in message > > news:eZZoQs8KEHA.1348@TK2MSFTNGP12.phx.gbl... > > > To test the tables individually you could join from Products to each of > > your > > > child tables on the ProductID, or use a subselect to retrieve, say, the > > > count of products with that ProductID in a particular child table. > Perhaps > > > another question is why you need five different child tables (we don't > > > really have enough information) - assuming you do, maybe you need > another > > > summary table that keeps track of the Products and where the related > > records > > > reside?? > > > > > > > > > > > > "Timothy" <timmy@noemail.com> wrote in message > > > news:ujauE85KEHA.1000@TK2MSFTNGP11.phx.gbl... > > > > Thanks for the replies. I have one more question regarding the > design. > > > The > > > > reason I went with the design having the keys for table1-5 in the > > products > > > > table made it easy for me to check and see if there was data for that > > > > product in the tables1-5. What is the best way to mimic this? I would > > > like > > > > on the client side be able to view the products table and see if there > > > other > > > > data in the other tables I need to retrieve. > > > > > > > > Thanks > > > > > > > > "Aaron Bertrand [MVP]" <aaron@TRASHaspfaq.com> wrote in message > > > > news:uAIQtw0KEHA.892@TK2MSFTNGP09.phx.gbl... > > > > > I agree with Alan, the foreign key reference should point from the > > child > > > > > table back to the "primary" table (e.g. the one containing the > actual > > > > entity > > > > > you are modeling, in this case a "product")... this also makes it > much > > > > > easier to add table6, table7, table8, etc. > > > > > > > > > > -- > > > > > Aaron Bertrand > > > > > SQL Server MVP > > > > > http://www.aspfaq.com/ > > > > > > > > > > > > > > > "Timothy" <timmy@noemail.com> wrote in message > > > > > news:OAaOgX0KEHA.3728@TK2MSFTNGP12.phx.gbl... > > > > > > Hi, > > > > > > > > > > > > I have a question about database design. I have a main table that > > has > > > 5 > > > > > > child tables referring to it through a one to one relationship. > The > > > > > > design > > > > > > of my table is the following: > > > > > > > > > > > > Products > > > > > > ProductID (autoincremental) > > > > > > ProductCode > > > > > > ... > > > > > > Table1ID (foreign key) > > > > > > Table2ID (foreign key)
[quoted text, click to view] Rickard Axne wrote: > > >> The simple fact is that in the real world we quite often do not have >> complete data to work with. It is not uncommon to have situations >> where, for example, you may only get a name and address for someone >> until they are a full employee or something like that. Once that has >> happened THEN you can get their SSN. In this case, should we have the >> person’s first name, last name, middle name, address1, address2, city, >> state and zip make up the primary key? > > > If these attributes are unique in what you are modelling, shouldn't they > be modelled as unique anyway?
Again, you're assuming complete data. My experience is that having complete data at the creation point of the data is something that is nice to have but isn't always available. [quoted text, click to view] > > One other possiblity in this scenario is to use sub types. These > subtypes have diffrent roles (attributes) and probably diffrent primary > means of identification. US-Employees could be identified by SSN; > Swedish-Employees by our equivalent, personal number. These could both > benefit from a super-type of Contacts, identified by, say email address.
So you're assuming that every person from every country has some unique number that identifies them? Moreover, you're suggesting that you mix data types in a single column? How is someone supposed to know that a particular number is a US SSN or a canadian counterpart? Or any other particular country? I guess you could expect your employees to all learn and understand every countries format (again, only for those that have them). [quoted text, click to view] > > The point is, in the real world threre is a way to identify one instance > of whatever it is your modelling, otherwise you're obviously in trouble > anyway. I don't see what benefit an artificial key, specific for your > domain, could bring in the model.
Right, there is, but in the real world it can quite often been different data for different rows and requires human intervention to distinguish it from other "similar" looking data. This is where having a generic key comes in handy. Another real world example, I've developed databases for recruiting companies. The goal of a recruiting company is to have as many candidates in the database as possible. In addition, the database is also supposed to house imported data from legacy systems. In this database you may have a dozen Michael Smiths. How do you distinguish them? You can't get their SSN until they're an employee. You may or may not have address for all of them. You may or may not have phone numbers for all of them. What you do is have a generic key that is used to make the rows unique. Then, the user, when searching for michael smith can use what data is there to visually determine which particular "Michael Smith" they want. Usually this means checking the phone number, address, work history, resume, etc. Any number of fields may need to be looked at to finally determine which exact Michael Smith you want.
[quoted text, click to view] > If these attributes are unique in what you are modelling, shouldn't they > be modelled as unique anyway?
Having a unique index on this set of attributes for data integrity is one thing. Using it as a primary key is a completely different thing... would you like to design the child tables that refer to this monstrous key through a foreign key constraint? Would you like to write the queries that join the tables together? And you seem to sidestep the point that you don't always have all the data at the outset? -- Aaron Bertrand SQL Server MVP http://www.aspfaq.com/
It sounds like a similar structure to one that I'm using in a current project where people can be classified as different types - there is one base table for the common columns and two or three 'child' tables that contain columns for each specific type. To model your problem one approach would be to insert the primary key from the Products table (ProductID) into each of your child tables as a foreign key (which it looks like you're doing), and then add a foreign key column to your Products table that references the primary key column of a new table, ProductTypes. By checking the ProductType of the Product record you can then join to the appropriate child table (of which I'm assuming only one at a time will be related to a Products record). I my situation I use a simple data access layer that basically maps each table (Products, GenericOptions, ModelOptions, etc.) to a class, and then a business layer over that which provides the logical abstraction into the various types (GenericProduct, ModelProduct, etc.). If I ever found myself in a position of needing to instantiate a (in your case) Product of unknown type, I would do so only long enough to determine the ProductType, and then break down the generic object and instantiate an object of the proper type. Hope this isn't getting too far off track for you. Cheers, Alan [quoted text, click to view] "Timothy" <timmy@noemail.com> wrote in message news:eZSc609KEHA.556@TK2MSFTNGP10.phx.gbl... > Alan, > > Thanks for the reply. The product I am representing in the my table design > is highly comfigurable with many options. These 5 tables are the different > options that the product can have and the tables are organized into 5 > logical tables. I could have one products table however there would be alot > of nulls for each product. So, I decided to normalize the table and it came > out to be 5 tables and one main products table. The products table > represent the main product, what I mean by this is the product will always > have data for the fields in the products table. Each table has a one to many > relationship with the products table. I think you might be right that I > could use a summary table that would keep track of the options tables. What > do you think? > > Here is an example of the DDL for some of the tables: > > CREATE TABLE [dbo].[Products] ( > [ProductID] [int] IDENTITY (1, 1) NOT NULL , > [MCode] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL , > [PCode] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL , > [HCode] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL > ) ON [PRIMARY] > GO > > CREATE TABLE [dbo].[Generic Options] ( > [GenericOptionID] [int] IDENTITY (100, 1) NOT NULL , > [ProductID] [int] NOT NULL , > [VCode] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL , > [FCode] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL , > ) ON [PRIMARY] > GO > > CREATE TABLE [dbo].[ModelOptions] ( > [ModelOptionID] [int] IDENTITY (1, 1) NOT NULL , > [ProductID] [int] NOT NULL , > [SR] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL , > [SD] [int] NULL , > [Jac] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL , > [LSize] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL , > [PIn] [int] NULL , > [ILength] [float] NULL , > [ILengthUnit] [varchar] (10) COLLATE SQL_Latin1_General_CP1_CI_AS NULL , > [SHole] [int] NULL > ) ON [PRIMARY] > GO > > Thanks > > "Alan Howard" <Xalan.howardX@Xparadise.net.nzX> wrote in message > news:eZZoQs8KEHA.1348@TK2MSFTNGP12.phx.gbl... > > To test the tables individually you could join from Products to each of > your > > child tables on the ProductID, or use a subselect to retrieve, say, the > > count of products with that ProductID in a particular child table. Perhaps > > another question is why you need five different child tables (we don't > > really have enough information) - assuming you do, maybe you need another > > summary table that keeps track of the Products and where the related > records > > reside?? > > > > > > > > "Timothy" <timmy@noemail.com> wrote in message > > news:ujauE85KEHA.1000@TK2MSFTNGP11.phx.gbl... > > > Thanks for the replies. I have one more question regarding the design. > > The > > > reason I went with the design having the keys for table1-5 in the > products > > > table made it easy for me to check and see if there was data for that > > > product in the tables1-5. What is the best way to mimic this? I would > > like > > > on the client side be able to view the products table and see if there > > other > > > data in the other tables I need to retrieve. > > > > > > Thanks > > > > > > "Aaron Bertrand [MVP]" <aaron@TRASHaspfaq.com> wrote in message > > > news:uAIQtw0KEHA.892@TK2MSFTNGP09.phx.gbl... > > > > I agree with Alan, the foreign key reference should point from the > child > > > > table back to the "primary" table (e.g. the one containing the actual > > > entity > > > > you are modeling, in this case a "product")... this also makes it much > > > > easier to add table6, table7, table8, etc. > > > > > > > > -- > > > > Aaron Bertrand > > > > SQL Server MVP > > > > http://www.aspfaq.com/ > > > > > > > > > > > > "Timothy" <timmy@noemail.com> wrote in message > > > > news:OAaOgX0KEHA.3728@TK2MSFTNGP12.phx.gbl... > > > > > Hi, > > > > > > > > > > I have a question about database design. I have a main table that > has > > 5 > > > > > child tables referring to it through a one to one relationship. The > > > > > design > > > > > of my table is the following: > > > > > > > > > > Products > > > > > ProductID (autoincremental) > > > > > ProductCode > > > > > ... > > > > > Table1ID (foreign key) > > > > > Table2ID (foreign key) > > > > > Table3ID (foreign key) > > > > > Table4ID (foreign key) > > > > > Table5ID (foreign key) > > > > > > > > > > Each table's ID is autoincremental and each product does not have to > > > have > > > > > data in each of these tables (which means some times Table1-5ID are > > > null). > > > > > This is design is difficult to insert and update data via a client > > > > > application. For instance, when inserting data I have to insert > > > Table1-5 > > > > > data before the Products table and it makes updating very complex > > > > > (updating > > > > > via the client is already complex because update data is somtimes > > pulled > > > > > from another source and the client does not if record exists....) I > > > > > thought > > > > > about redesigning the products and table1-5 to this: > > > > > > > > > > Products > > > > > ProductID > > > > > ProductCode... > > > > > > > > > > Tables1-5 > > > > > ProductID (foreign key) > > > > > > > > > > Use the productID to be the key in tables1-5 which would make it > > easier > > > to > > > > > work with. Is there a better way to do this? Can I get away with
[quoted text, click to view] Aaron Bertrand [MVP] wrote: >>If these attributes are unique in what you are modelling, shouldn't they >>be modelled as unique anyway? > > > Having a unique index on this set of attributes for data integrity is one > thing. Using it as a primary key is a completely different thing... would > you like to design the child tables that refer to this monstrous key through > a foreign key constraint? Would you like to write the queries that join the > tables together? And you seem to sidestep the point that you don't always > have all the data at the outset? >
Thanks for making that additional point for me. I was about to add that to my response but got called away for lunch and you know, priorities and everything. :D
My POV is documented here, FWIW. http://www.aspfaq.com/2504 -- Aaron Bertrand SQL Server MVP http://www.aspfaq.com/ [quoted text, click to view] > Thanks for making that additional point for me. I was about to add that to > my response but got called away for lunch and you know, priorities and > everything. :D
[quoted text, click to view] Rickard Axne wrote: > Zach Wells wrote: > >> Rickard Axne wrote: >> >>> If these attributes are unique in what you are modelling, shouldn't >>> they be modelled as unique anyway? >> >> >> Again, you're assuming complete data. My experience is that having >> complete data at the creation point of the data is something that is >> nice to have but isn't always available. > > > Not at all. But I am assuming that something that you model can go > through diffrent stages or events. Here is a simple example without DDL: > If you are an employer, you will want to have people you need to stay in > touch with in a [Persons] table. If you later on hire this guy you put > him in the [Employees] table with a FK to [Contacts]. Obviously > Employees have a lot more attributes, like what office the use.
So you're suggesting that you have a Persons, Employees and Contacts table that you shuffle people through depending on their status? How is that proper design? A person is a person is a person. They may have a different status at different points in a process, but does that really warrant a totally separate table? What does it mean when a data entry error puts the same person in two tables? How does it get resolved? Furthermore, how does any of this solve the "you must under all circumstances have a natural primary key" mantra that Joe forces on us? [quoted text, click to view] > >>> One other possiblity in this scenario is to use sub types. These >>> subtypes have diffrent roles (attributes) and probably diffrent >>> primary means of identification. US-Employees could be identified by >>> SSN; Swedish-Employees by our equivalent, personal number. These >>> could both benefit from a super-type of Contacts, identified by, say >>> email address. >> >> >> So you're assuming that every person from every country has some >> unique number that identifies them? Moreover, you're suggesting that >> you mix data types in a single column? How is someone supposed to know >> that a particular number is a US SSN or a canadian counterpart? Or any >> other particular country? I guess you could expect your employees to >> all learn and understand every countries format (again, only for those >> that have them). > > > Actually I've lived in countries without these magic numbers. Se above > and substitute [Employees] with [Foreign_employees] etc. > > Obviously diffrent formats of identifying attributes are modelled > diffrently, probably often in diffrent tables.
See above, I find this "solution" more of a headache! ;) [quoted text, click to view] > >>> The point is, in the real world threre is a way to identify one >>> instance of whatever it is your modelling, otherwise you're obviously >>> in trouble anyway. I don't see what benefit an artificial key, >>> specific for your domain, could bring in the model. >> >> >> >> Right, there is, but in the real world it can quite often been >> different data for different rows and requires human intervention to >> distinguish it from other "similar" looking data. This is where having >> a generic key comes in handy. > > > But what does this artificial key represent in the reality????? Call > some external company and ask about #242 :)
Doesn't matter. Moreover, why does it have to represent anything other than a particular row of data? #242 represents a specific row of data somewhere in your employee table. Doesn't matter where that row is (i.e. I don't care if it is sequential or not). [quoted text, click to view] > >> Another real world example, I've developed databases for recruiting >> companies. The goal of a recruiting company is to have as many >> candidates in the database as possible. In addition, the database is >> also supposed to house imported data from legacy systems. In this >> database you may have a dozen Michael Smiths. How do you distinguish >> them? You can't get their SSN until they're an employee. You may or >> may not have address for all of them. You may or may not have phone >> numbers for all of them. What you do is have a generic key that is >> used to make the rows unique. Then, the user, when searching for >> michael smith can use what data is there to visually determine which >> particular "Michael Smith" they want. Usually this means checking the >> phone number, address, work history, resume, etc. Any number of fields >> may need to be looked at to finally determine which exact Michael >> Smith you want. > > > I've worked for recruiting companies as well, and if you don't have any > means of identifying candidates (email, phone etc) they're not of any > value...
Understood. I wasn't saying that they had NONE of those, I'm just saying they could have some different combination of those. Some could have email, some could have phone, etc. It would be up to the end user to figure out, based on what data is available, which particular row they really wanted. I've been trying to fix disasters with imported candidates from [quoted text, click to view] > bought up companies etc where it turns out we suddenly have multiple > accounts for the same people. After a while everything breaks apart, > just because the DBA couldn't do the research and instead implemented > IDENTITY artificial keys.
I still have yet to have anyone, including you, give me a valid, useful, meaningful, reliable primary key that you could use on a table of employees. You can try and write off the previous DBA's design as being "poorly researched", but the fact is, it very would could have been his/her only option. Moreover, the "disaster" very well could simply be a "poorly researched" import process. The fact that primary key was not a natural key is likely *not* the reason the import was a disaster. I still lay down the challenge to anyone reading this. Give me a reliable key to use on an employee/person table that covers companies that have international employees. A few years ago I had this very discussion with Joe and I don't recall that he ever gave me one either. I do believe that he tried to say that using dna, retinal scans or fingerprints were a reliable "unique" identifier for people and could be used as a primary key but I think we all can see how impractical that is.
[quoted text, click to view] Rickard Axne wrote: > Sorry, [Contacts] was a typo. > > Nobody gets shuffled around. > > I cannot make this clearer I think. Person x belongs in the persons > table, in the event this the same person gets hired he will belong in > both Persons and Employees. Employees has a foreign key to Persons. This > is a just a stupid example of sub-typing with separate tables and re-use > of attributes which similar objects share. Surely this is easier to > maintain and understand than wide tables with tonnes of nullable > attributes? > > > Rickard >
I understand what you're saying now, but I still don't see how this solves the problem that we're discussing. What would your natural primary key be in the persons table given all the parameters previously mentioned in the preceding threads?
[quoted text, click to view] > The simple fact is that in the real world we quite often do not have > complete data to work with. It is not uncommon to have situations where, > for example, you may only get a name and address for someone until they > are a full employee or something like that. Once that has happened THEN > you can get their SSN. In this case, should we have the person’s first > name, last name, middle name, address1, address2, city, state and zip > make up the primary key?
If these attributes are unique in what you are modelling, shouldn't they be modelled as unique anyway? One other possiblity in this scenario is to use sub types. These subtypes have diffrent roles (attributes) and probably diffrent primary means of identification. US-Employees could be identified by SSN; Swedish-Employees by our equivalent, personal number. These could both benefit from a super-type of Contacts, identified by, say email address. The point is, in the real world threre is a way to identify one instance of whatever it is your modelling, otherwise you're obviously in trouble anyway. I don't see what benefit an artificial key, specific for your
[quoted text, click to view] Alan Howard wrote: > "Zach Wells" <no_zwells_spam@ain1.com> wrote in message > news:%232X%23XpHLEHA.3052@TK2MSFTNGP12.phx.gbl... > >>Aaron Bertrand [MVP] wrote: >> >> >>>My POV is documented here, FWIW. >>> http://www.aspfaq.com/2504 >>> >> >>Good read and not out of line with my thinking. ;) > > > While it's not supposed to be a contest, I'm going to side with Zach and > Aaron on this. Perhaps I'm just a pragmatist at heart but while Joe's > arguments are compelling in theory, and just a wee bit reminiscent of > University texts, time and time again I find it easier/more appropriate/more > consistent to use IDENTITY cols for artificial keys in my database. As Aaron > writes in his article, I think it's more important to weigh up the pros and > cons of each approach/philosophy and then to make an informed decision. If > there was only the one best way of doing things then this systems > development lark would get a bit boring :) I agree it isn't a contest, I just feel like Joe is quick to throw pure theory at newbies who may look at his name and think "Well, Joe says it so I'll trash everything I'v already done and do it his way". I just wanted to provide another perspective. I personally respect Joe a lot and have learned tons of stuff from his books that I've bought. I credit Joe with helping me make the switch from thinking procedurally to set-based in regards to sql. I just think he's a bit *too* theoretical in an environment where most people are knee-deep in the real world. [quoted text, click to view] > > As an aside, I'm currently working for a client that is implementing an > enterprise HRIS. Over 1,000 tables and no relationships, indexes or > constraints defined at all (the database includes a dictionary that the > application uses to derive the table structure). No primary keys defined at > the database level and because we've had a few migration issues you often > find yourself in a position of not being able to address an individual row > for update or delete. Artificial and natural values are used for the 'keys' > in various tables, and unique values like EMP_NO are defined by the > application and appear in many, many tables right throughout the structure. > This particular 'key' is artificial and forms part of the compound key of > another 255 tables making maintenance a major hassle. It's interesting to > work on a project like this because, for me, it provides a widely different > perspective. > > Alan
Ugh! I know what you're talking about. I've seen some "professional" database apps out there that had to be some of the worst designed messes I've ever encountered.
[quoted text, click to view] Aaron Bertrand [MVP] wrote: >>If these attributes are unique in what you are modelling, shouldn't they >>be modelled as unique anyway? > > > Having a unique index on this set of attributes for data integrity is one > thing. Using it as a primary key is a completely different thing... would > you like to design the child tables that refer to this monstrous key through > a foreign key constraint? Would you like to write the queries that join the > tables together? And you seem to sidestep the point that you don't always > have all the data at the outset?
I didn't mean to suggest I'm against adding surrogate key(s) on occasions where the candidate keys just aren't pratical as PK. With outset, I assume you mean scenarios where you have, at some point, collected some information about the object but not all? If this is an effect of your business process then I believe the sub types and super types are quite good as a solution.
[quoted text, click to view] Zach Wells wrote: > Rickard Axne wrote: > >> If these attributes are unique in what you are modelling, shouldn't >> they be modelled as unique anyway? > > Again, you're assuming complete data. My experience is that having > complete data at the creation point of the data is something that is > nice to have but isn't always available.
Not at all. But I am assuming that something that you model can go through diffrent stages or events. Here is a simple example without DDL: If you are an employer, you will want to have people you need to stay in touch with in a [Persons] table. If you later on hire this guy you put him in the [Employees] table with a FK to [Contacts]. Obviously Employees have a lot more attributes, like what office the use. [quoted text, click to view] >> One other possiblity in this scenario is to use sub types. These >> subtypes have diffrent roles (attributes) and probably diffrent >> primary means of identification. US-Employees could be identified by >> SSN; Swedish-Employees by our equivalent, personal number. These could >> both benefit from a super-type of Contacts, identified by, say email >> address. > > So you're assuming that every person from every country has some unique > number that identifies them? Moreover, you're suggesting that you mix > data types in a single column? How is someone supposed to know that a > particular number is a US SSN or a canadian counterpart? Or any other > particular country? I guess you could expect your employees to all learn > and understand every countries format (again, only for those that have > them).
Actually I've lived in countries without these magic numbers. Se above and substitute [Employees] with [Foreign_employees] etc. Obviously diffrent formats of identifying attributes are modelled diffrently, probably often in diffrent tables. [quoted text, click to view] >> The point is, in the real world threre is a way to identify one >> instance of whatever it is your modelling, otherwise you're obviously >> in trouble anyway. I don't see what benefit an artificial key, >> specific for your domain, could bring in the model. > > > Right, there is, but in the real world it can quite often been different > data for different rows and requires human intervention to distinguish > it from other "similar" looking data. This is where having a generic key > comes in handy.
But what does this artificial key represent in the reality????? Call some external company and ask about #242 :) [quoted text, click to view] > Another real world example, I've developed databases for recruiting > companies. The goal of a recruiting company is to have as many > candidates in the database as possible. In addition, the database is > also supposed to house imported data from legacy systems. In this > database you may have a dozen Michael Smiths. How do you distinguish > them? You can't get their SSN until they're an employee. You may or may > not have address for all of them. You may or may not have phone numbers > for all of them. What you do is have a generic key that is used to make > the rows unique. Then, the user, when searching for michael smith can > use what data is there to visually determine which particular "Michael > Smith" they want. Usually this means checking the phone number, address, > work history, resume, etc. Any number of fields may need to be looked at > to finally determine which exact Michael Smith you want.
I've worked for recruiting companies as well, and if you don't have any means of identifying candidates (email, phone etc) they're not of any value... I've been trying to fix disasters with imported candidates from bought up companies etc where it turns out we suddenly have multiple accounts for the same people. After a while everything breaks apart, just because the DBA couldn't do the research and instead implemented
[quoted text, click to view] Zach Wells wrote: >> Not at all. But I am assuming that something that you model can go >> through diffrent stages or events. Here is a simple example without >> DDL: If you are an employer, you will want to have people you need to >> stay in touch with in a [Persons] table. If you later on hire this guy >> you put him in the [Employees] table with a FK to [Contacts]. >> Obviously Employees have a lot more attributes, like what office the use. > > > So you're suggesting that you have a Persons, Employees and Contacts > table that you shuffle people through depending on their status? How is > that proper design? A person is a person is a person. They may have a > different status at different points in a process, but does that really > warrant a totally separate table? What does it mean when a data entry > error puts the same person in two tables? How does it get resolved? > Furthermore, how does any of this solve the "you must under all > circumstances have a natural primary key" mantra that Joe forces on us?
Sorry, [Contacts] was a typo. Nobody gets shuffled around. I cannot make this clearer I think. Person x belongs in the persons table, in the event this the same person gets hired he will belong in both Persons and Employees. Employees has a foreign key to Persons. This is a just a stupid example of sub-typing with separate tables and re-use of attributes which similar objects share. Surely this is easier to maintain and understand than wide tables with tonnes of nullable attributes? Rickard
Yep - thanks for providing your perspective Joe - an interesting read. Alan [quoted text, click to view] "Joe Celko" <jcelko212@earthlink.net> wrote in message news:%23uGobCBLEHA.1264@TK2MSFTNGP12.phx.gbl... > >> I'm interested in what you're implying here - care to elaborate? << > > The regulars are cringing now :) > > A table is a set of things or relationships of the same kind. A key is > a subset of attributes in the entities being modeled which uniquely > identify each element of the set of entities. It is part of > > An auto-numbering is based on a PHYSICAL location or state inside a > machine. It has absolutely nothing to do with the data model or the > reality of the data. > > A key should be verifiable within itself. That means that when I see a > particular kind of identifier, I ought to know if it is syntactically > correct. For example, I know that ISBN 0-486-60028-9 has the correct > number of digits and that the check digit is correct for a proper > International Standard Book Number. Later on I can find out that it > identifies the Dover Books edition of AN INVESTIGATION OF THE LAWS OF > THOUGHT by George Boole. > > An identifier should have repeatable verification against the reality > that you are trying to capture in your data model. If I put the same > data into another databse, do I get the same auto-increment number? > Nope! > > Exactly what verification means can be a bit fuzzy. At one extreme, > prison inmates are moved by taking their fingerprints at control points > and courts want DNA evidence for convictions. At the other end of the > spectrum, retail stores will accept your check on the assumption that > you look like your driver's license photograph. > > What you are doing is faking a network database using IDENTITY for > pointers, instead of creating an RDBMS. > > Let me go ahead one more step and play Q&A with the direction I think > you are going: > > Q: Couldn't a compound key become very long? > > A1: So what? This is the 2000's century and we have much better > computers than we did in the 1950's when key size was a real physical > issue. What is funny to me is the number of idiots who replace a > natural two or three integer compound key with a huge GUID that no human > being or other system can possibly understand because they think it will > be faster and easy to program. > > A2: This is an implementation problem that the SQL engine can handle. > For example, Teradata is an SQL designed for VLDB apps that uses hashing > instead of B-tree or other indexes. They guarantee that no search > requires more than two probes, no matter how large the database. A tree > index requires more and more probes as the size of the database > increases. > > A3: A long key is not always a bad thing fro performance. For example, > if I use (city, state) as my key, I get a free index on just (city). I > can also add extra columns to the key to make it a super-key when such a > super-key gives me a covering index (i.e. an index which contains all of > the columns required for a query, so that the base table does not have > to be accessed at all). > > >> Do you then advocate never using an Identity attribute? Or is it > acceptable (in the relational model) to have an Identity attribute to > use as a handle to the row, and for attributes in other tables to use as > the target for a foreign key? << > > A handle to the row? Oh, you mean faking a sequential file's positional > record number, so I can reference the physical storage location? Sure, > if I want to lose all the advantages of an abstract data model, SQL set > oriented programming, carry extra data and destroy the portability of > code! > > More and more programmers who have absolutely no database training are > being told to design a database. They are using GUIDs, IDENTITY, ROWID > and other proprietary auto-numbering "features" in SQL products to > imitate either a record number (sequential file system mindset) or OID > (OO mindset) since they don't know anything else. > > Experienced database designers tend toward intelligent keys they find in > industry standard codes, such as UPC, VIN, GTIN, ISBN, etc. They know > that they need to verify the data against the reality they are modeling. > A trusted external source is a good thing to have. > > The IDENTITY column is a holdover from the early programming languages > which were very close to the hardware. For example, the fields (not > columns; big difference) in a COBOL or FORTRAN program were assumed to > be physically located in main storage in the order they were declared in > the program. The languages have constructs using that model -- logical > and physical implementations are practically one! The data has meaning > BECAUSE of the program reading it (i.e. the same bits could be a > character in one program and be an integer in another). > > The early SQLs were based on existing file systems. The data was kept > in physically contiguous disk pages, in physically contiguous rows, made > up of physically contiguous columns. In short, just like a deck of > punch cards or a magnetic tape. Most programmer still carry that mental > model, which is why I keep doing that rant about file vs. table, row vs. > record and column vs. field. > > But physically contiguous storage is only one way of building a > relational database and it is not the best one. The basic idea of a > relational database is that user is not supposed to know *how* or > *where* things are stored at all, much less write code that depends on > the particular physical representation in a particular release of a > particular product on particular hardware at a particular time. > > One of the biggest errors is the IDENTITY column (actually property, not > a column at all) in the Sybase/SQL Server family. People actually > program with this "feature" and even use it as the primary key for the > table! Now, let's go into painful details as to why this thing is bad. > > The first practical consideration is that IDENTITY is proprietary and > non-portable, so you know that you will have maintenance problems when > you change releases or port your system to other products. Newbies > actually think they will never port code! Perhaps they only work for > companies that are failing and will be gone. Perhaps their code is such > crap nobody else want their application. > > But let's look at the logical problems. First try to create a table > with two columns and try to make them both IDENTITY. If you cannot > declare more than one column to be of a certain data type, then that > thing is not a datatype at all, by definition. It is a property which > belongs to the PHYSICAL table, not the LOGICAL data in the table. > > Next, create a table with one column and make it an IDENTITY. Now try
[quoted text, click to view] Alan Howard wrote: > > Do you guys consider these sorts of things? > > Alan > >
Personally, I don't like the idea of having a identity key as a standard simply to be consistent across tables. I *try* to use the data for a primary key but when I find that the table requires more than about 3 fields to indicate a unique row, I start thinking about using a surrogate key. However, the truth is, in my real world experience, I find that I have to use a surrogate key more often than not. I'll admit though, sometimes I do it because it is simply much easier given the data, sometimes I honestly have no other alternative and recently it has been become a lot of my work is creating fact tables for OLAP cubes which basically throws normalization out the window anyway. ;) Zach
[quoted text, click to view] "Zach Wells" <no_zwells_spam@ain1.com> wrote in message news:%232X%23XpHLEHA.3052@TK2MSFTNGP12.phx.gbl... > Aaron Bertrand [MVP] wrote: > > > My POV is documented here, FWIW. > > http://www.aspfaq.com/2504 > > > > Good read and not out of line with my thinking. ;) While it's not supposed to be a contest, I'm going to side with Zach and Aaron on this. Perhaps I'm just a pragmatist at heart but while Joe's arguments are compelling in theory, and just a wee bit reminiscent of University texts, time and time again I find it easier/more appropriate/more consistent to use IDENTITY cols for artificial keys in my database. As Aaron writes in his article, I think it's more important to weigh up the pros and cons of each approach/philosophy and then to make an informed decision. If there was only the one best way of doing things then this systems development lark would get a bit boring :) As an aside, I'm currently working for a client that is implementing an enterprise HRIS. Over 1,000 tables and no relationships, indexes or constraints defined at all (the database includes a dictionary that the application uses to derive the table structure). No primary keys defined at the database level and because we've had a few migration issues you often find yourself in a position of not being able to address an individual row for update or delete. Artificial and natural values are used for the 'keys' in various tables, and unique values like EMP_NO are defined by the application and appear in many, many tables right throughout the structure. This particular 'key' is artificial and forms part of the compound key of another 255 tables making maintenance a major hassle. It's interesting to work on a project like this because, for me, it provides a widely different perspective. Alan
To talk implementation for a second, has anyone come up with an in-house best-practice policy governing the definition of key columns? If I decided that every entity-derived table in our model should have an artificial key column (IDENTITY), should purely associative tables (those implementing m:n relationships) just use a compound key defined over the two contributing foreign keys, or should this table also have it's own artificial key, with a unique index defined over the two foreign key columns? I think that there's a certain consistency in creating ALL tables with an artificial key, regardless of their use, although I suppose there would be a performance penalty (and a bit of a beating in here). If, in the future, the domain was extended and you wanted to add attribute columns to the associative table, it would still match the design of other entity tables in your database, with minimal changes to your schema (and other bits). Do you guys consider these sorts of things? Alan [quoted text, click to view] "Zach Wells" <no_zwells_spam@ain1.com> wrote in message news:%232X%23XpHLEHA.3052@TK2MSFTNGP12.phx.gbl... > Aaron Bertrand [MVP] wrote: > > > My POV is documented here, FWIW. > > http://www.aspfaq.com/2504 > > > > Good read and not out of line with my thinking. ;) > > Zach
Zach, [quoted text, click to view] >> An email address is a horrible candidate as a primary key. <<
There are no universal rules regarding how a specific attribute fits as a primary key at the logical level. It exclusively depends on the business model (conceptual schema) which is being represented in a database. The only formal & general considerations for a primary key in relational databases are stability, simplicity, familiarity & minimality. -- Anith
[quoted text, click to view] Rickard Axne wrote: > Zach Wells wrote: > >> I understand what you're saying now, but I still don't see how this >> solves the problem that we're discussing. What would your natural >> primary key be in the persons table given all the parameters >> previously mentioned in the preceding threads? > > > [Person] is identified by email address? Depends ofcourse. You might > have other identifying facts about your contacts in *your* business. > > Rickard
An email address is a horrible candidate as a primary key.
[quoted text, click to view] Rickard Axne wrote: >>> [Person] is identified by email address? Depends ofcourse. You might >>> have other identifying facts about your contacts in *your* business. >> >> >> An email address is a horrible candidate as a primary key. > > > ..Then use some other identifying fact about your contact :) > > Also, please explain why its always, regardless of any aspect in the > universe, is horrible candidate key? And please don't come dragging with > "ints vs strings" or any other boring excuse like "join performance". > Lets keep this on a higher level and deal with implementation details > later?
Keep in mind, we're discussing what a good primary key is for a table that is used to house people. a) It can change for people quite often b) People have can multiple email addresses c) email addresses can be reissued by isps d) prone to data entry errors e) Not everyone has one (in fact, i'd guess that well over half of the world's population doesn't have one) There's the first few off the top of my head.
[quoted text, click to view] Anith Sen wrote: > Zach, > > >>>An email address is a horrible candidate as a primary key. << > > > There are no universal rules regarding how a specific attribute fits as a > primary key at the logical level. It exclusively depends on the business > model (conceptual schema) which is being represented in a database. > > The only formal & general considerations for a primary key in relational > databases are stability, simplicity, familiarity & minimality. >
You're missing the point I'm trying to make. The whole thread started over the age-old discussion of using natural primary keys vs using generic keys. Joe (and I assume others) seem to take the stance that there is never a reason in a "properly" designed database to need a generic key. I say there is and as an example I use a VERY common type of data, a Person. I'm asking the question: What is a good, valid, natural key for people? Email is not one. It may be in some very specific cases but it definitely is not a good key in general. Neither is SSN. I'm claiming that there really isn't a good natural key that you can use as a primary key when dealing with people and that using a generic key is not only acceptable, but required.
[quoted text, click to view] > Keep in mind, we're discussing what a good primary key is for a table that > is used to house people. > > a) It can change for people quite often > b) People have can multiple email addresses > c) email addresses can be reissued by isps > d) prone to data entry errors > e) Not everyone has one (in fact, i'd guess that well over half of the > world's population doesn't have one)
f) many e-mail addresses are shared by multiple people, e.g. a household (Every natural key is affected by d), by the way.)
[quoted text, click to view] > "ints vs strings" or any other boring excuse like "join performance". Lets > keep this on a higher level and deal with implementation details later?
If you're building a table with 500 users, sure you can ignore those things. If you're dealing with millions of rows, data warehousing, etc., you're making a very big mistake not considering them. -- Aaron Bertrand SQL Server MVP http://www.aspfaq.com/
[quoted text, click to view] >> I say there is and as an example I use a VERY common type of data, a
Person. << That is where you faulted. Simply proclaiming an entity name without the pertinent business model will not gather much meat for you. An entity type namely "Person" in different business contexts impart different meanings, associate different rules and have different external predicates. Thus, declaring the invalidity of specific attribute like email address as a key without contextual details, as this example, is questionable. [quoted text, click to view] >> I'm asking the question: What is a good, valid, natural key for people?
<< A better one would be: What is a good key for an entity type? The answer: It depends on the business model where the entity type exists. [quoted text, click to view] >> I'm claiming that there really isn't a good natural key that you can use
as a primary key when dealing with people and that using a generic key is not only acceptable, but required. << Even with the commonly understood meaning of "natural" keys, none of your arguments go beyond the key selection criteria in my last post. The point is, any attribute is "natural" or "artificial" only within the contexts where it is being discoursed & so, being obsessed about attribute being "natural" or otherwise is mostly meaningless at the logical level. Based on the business model & data requireents, a competent designer can admit another attribute as a primary key or dismiss an existing one based on the criteria of stability, simplicity, familiarity & minimality. -- Anith
[quoted text, click to view] "Zach Wells" <no_zwells_spam@ain1.com> wrote in message news:#iFhbOULEHA.3292@TK2MSFTNGP11.phx.gbl... > Anith Sen wrote: > > >>>I say there is and as an example I use a VERY common type of data, a > > > > Person. << > > > > That is where you faulted. Simply proclaiming an entity name without the > > pertinent business model will not gather much meat for you. An entity type > > namely "Person" in different business contexts impart different meanings, > > associate different rules and have different external predicates. Thus, > > declaring the invalidity of specific attribute like email address as a key > > without contextual details, as this example, is questionable. > > That all sounds nice and pretty but it doesn't change what I've said. I > don't care what the business model is, in the vast majority of the cases > out there, a table that is used to house primary people data (name, dob, > etc), there is nothing inherent to the data that is a good candidate for > a primary key. I'm still waiting for someone to refute me on that. All > I'm hearing is more *theory*.
just use their Ashcroft # ...
[quoted text, click to view] Anith Sen wrote: >>>I say there is and as an example I use a VERY common type of data, a > > Person. << > > That is where you faulted. Simply proclaiming an entity name without the > pertinent business model will not gather much meat for you. An entity type > namely "Person" in different business contexts impart different meanings, > associate different rules and have different external predicates. Thus, > declaring the invalidity of specific attribute like email address as a key > without contextual details, as this example, is questionable.
That all sounds nice and pretty but it doesn't change what I've said. I don't care what the business model is, in the vast majority of the cases out there, a table that is used to house primary people data (name, dob, etc), there is nothing inherent to the data that is a good candidate for a primary key. I'm still waiting for someone to refute me on that. All I'm hearing is more *theory*.
hmm, I find myself agreeing with everyone on this one ... email qua email is horrible pk period. [for the reasons Zach posted] email when used as user id is a fine pk - however, it is no longer simply an email address. and if the model (data or business) uses email as a primary identifier. [quoted text, click to view] "Rickard Axne" <raxne@hotmail.com> wrote in message news:uF%23Qt5SLEHA.3704@TK2MSFTNGP11.phx.gbl... > >> [Person] is identified by email address? Depends ofcourse. You might > >> have other identifying facts about your contacts in *your* business. > > > > An email address is a horrible candidate as a primary key. > > ..Then use some other identifying fact about your contact :) > > Also, please explain why its always, regardless of any aspect in the > universe, is horrible candidate key? And please don't come dragging with > "ints vs strings" or any other boring excuse like "join performance". > Lets keep this on a higher level and deal with implementation details
later?
stupid ctrl+enter and ffs... hmm, I find myself agreeing a little with everyone on this one ... certain concepts are very difficult to model generically - like a person - because the attributes about that concept can't always be fit into a good pk without factoring in the context. e.g., for a person, generically, -- email qua email is horrible pk period. [for the reasons Zach and Aaron posted it wouldn't be a normalized attribute - i haven't seen a model yet that can show that email qua email is dependent on a person's other attributes or vice versa] -- email qua user id [like in some website guestlists] is a fine pk - however, it is no longer simply an email address. and if the model (data or business) uses email as a primary identifier, is that correct? [e.g., the website guestlist - probably a good pk] however, i have found that attempting to model concepts generically can help identify potentially incorrect or inefficient business processes/models. if a business process/model is deemed off, then it's best to fix it prior to modeling it in data. if there's doubt, then at least a generic/flexible model can be put into place until further determination is made. i have also found that, unfortunately, modeling based solely on context for some common concepts, like a person, can lead to root canals when that business model changes [although root canals are billable <s>]. [quoted text, click to view] > "Rickard Axne" <raxne@hotmail.com> wrote in message > news:uF%23Qt5SLEHA.3704@TK2MSFTNGP11.phx.gbl... > > >> [Person] is identified by email address? Depends ofcourse. You might > > >> have other identifying facts about your contacts in *your* business. > > > > > > An email address is a horrible candidate as a primary key. > > > > ..Then use some other identifying fact about your contact :) > > > > Also, please explain why its always, regardless of any aspect in the > > universe, is horrible candidate key? And please don't come dragging with > > "ints vs strings" or any other boring excuse like "join performance". > > Lets keep this on a higher level and deal with implementation details > later? > >
[quoted text, click to view] Rickard Axne wrote: > Zach Wells wrote: > >> I'm still waiting for someone to refute me on that. All I'm hearing is >> more *theory*. > > > We are refuting "email is a terrible candidate key (period)"
Oh you are? I'm sorry, I must have missed the post where you've given the first good reason to have an email address be a primary key (you keep changing it to "candidate key", which isn't what this discussion is about btw). Could you repost where you refuted that an email address is a bad primary key? I'd like to see why you think it is a *good* primary key as well. More importantly, all this is really a tangent to the original point. I honestly think I've made my point regarding the natural key vs generic key topic so unless something worth while is posted I'm going to j |