sql server programming:
I have two tables, Table A and Table B. Table B references the identities of records in Table A. I'd like to duplicate the data in Table A and Table B, such that the new data in Table B points to the new data in Table A. In pseudo-code ... INSERT TableA SELECT TableA.EverythingExceptIdentities FROM TableA INSERT TableB SELECT TableA NewIdentities, TableB.OtherColumns FROM ??? I can do this easily with cursors, but I always like to avoid cursors (more challenging) but I don't see a way to do it without one. Any ideas? Thanks! Michael Carr
You are correct. Unless you have another column or a set of columns, other than the identity column, which can uniquely identify the rows inserted, you cannot accomplish this without looping through one row at a time. -- -- Anith ( Please reply to newsgroups only )
Michael, Why don't you show us your cursor solution? I'm having trouble understanding what it is you want help with. Andrés Taylor [quoted text, click to view] "Michael Carr" <mcarr@umich.edu> wrote in message news:udMywDGlDHA.1084@tk2msftngp13.phx.gbl... > I have two tables, Table A and Table B. Table B references the identities of > records in Table A. I'd like to duplicate the data in Table A and Table B, > such that the new data in Table B points to the new data in Table A. > > In pseudo-code ... > > INSERT TableA > SELECT TableA.EverythingExceptIdentities FROM TableA > > INSERT TableB > SELECT TableA NewIdentities, TableB.OtherColumns FROM ??? > > I can do this easily with cursors, but I always like to avoid cursors (more > challenging) but I don't see a way to do it without one. Any ideas? > > Thanks! > Michael Carr > >
Michael, It's not clear exactly what you want to do, but is a trigger something to consider? If you give a script with a specific example, maybe you'll get some ideas that are more efficient than a cursor. Using scope_identity() seems like it might be an alternative, also, followed by selecting identity values from A >= a certain value. Also, couldn't cursors and scope_identity() both fail if there is concurrency? How will you iterate through the values only you inserted? SK [quoted text, click to view] Michael Carr wrote: >I have two tables, Table A and Table B. Table B references the identities of >records in Table A. I'd like to duplicate the data in Table A and Table B, >such that the new data in Table B points to the new data in Table A. > >In pseudo-code ... > >INSERT TableA > SELECT TableA.EverythingExceptIdentities FROM TableA > >INSERT TableB > SELECT TableA NewIdentities, TableB.OtherColumns FROM ??? > >I can do this easily with cursors, but I always like to avoid cursors (more >challenging) but I don't see a way to do it without one. Any ideas? > > Thanks! > Michael Carr > > > >
Martin, Great approach! Thanks! Michael Carr [quoted text, click to view] "Martin Lingl" <martinlingl@chello.at> wrote in message news:OGv$32HlDHA.2272@tk2msftngp13.phx.gbl... > (inspired by Anith's post) Let the new table A have one more column than the > original and use this column to store the original identity value. Then you > have a cross reference between old and new id's in order to replace the > foreign key values when you copy the data of table B. Afterwards you can > drop this column.
Here is my proposed cursor solution... CREATE TABLE Foo ( FooID IDENTITY PRIMARY KEY, SomeData INT ) CREATE TABLE Bar ( FooID PRIMARY KEY REFERENCES Foo(FooID), SomeData INT) < ... fill with data here ... > -- Duplicate data in Foo one row at a time DECLARE Foo_Cursor CURSOR FOR SELECT * FROM Foo OPEN Foo_Cursor FETCH NEXT FROM Foo_Cursor WHILE @@FETCH_STATUS = 0 BEGIN INSERT Foo (SomeData) VALUES Foo_Cursor.SomeData INSERT Bar (FooID, SomeData) VALUES (SCOPE_IDENTITY(), (SELECT SomeData FROM Bar WHERE FooID = Foo_Cursor.FooID)) FETCH NEXT FROM Foo_Cursor END CLOSE Foo_Cursor DEALLOCATE Foo_Cursor It seems this solution will be very inefficient because I am executing a SELECT for every row of Foo that I am copying. All this talk about FooID is making me hungry, so time for breakfast! (sorry for that horrible joke but I couldn't resist). Thanks, Michael Carr [quoted text, click to view] "Andres Taylor" <andres@rotselleri.com> wrote in message news:OMuyF7GlDHA.1800@TK2MSFTNGP10.phx.gbl... > Michael, > > Why don't you show us your cursor solution? I'm having trouble understanding > what it is you want help with.
Did it ever occur to you that your real problem is using IDENTITY instead of a real key? One non-relational "solution" cascades into another, and eventually, you have reduced an SQL database back into a bad 1950's sequential file implementation. --CELKO-- =========================== Please post DDL, so that people do not have to guess what the keys, constraints, Declarative Referential Integrity, datatypes, etc. in your schema are. *** Sent via Developersdex http://www.developersdex.com ***
(inspired by Anith's post) Let the new table A have one more column than the original and use this column to store the original identity value. Then you have a cross reference between old and new id's in order to replace the foreign key values when you copy the data of table B. Afterwards you can drop this column. Martin [quoted text, click to view] "Anith Sen" <anith@bizdatasolutions.com> wrote in message news:niLjb.28$Uz6.20@newsread1.news.atl.earthlink.net... > You are correct. Unless you have another column or a set of columns, other > than the identity column, which can uniquely identify the rows inserted, you > cannot accomplish this without looping through one row at a time. > > -- > -- Anith > ( Please reply to newsgroups only ) > >
Anith's solution is certainly the simplest, but it requires modifying the schema. Here is a solution using only a table variable (or temp table). -- Create tables & data CREATE TABLE TableA (RecordID int identity(1,1) primary key, a int) CREATE TABLE TableB (RecordID int references TableA (RecordID), b varchar(10)) INSERT TableA SELECT 5 UNION SELECT 7 INSERT TableB SELECT 1, 'Five' UNION ALL SELECT 2, 'Seven' UNION ALL SELECT 1, 'Five' SELECT * FROM TableA SELECT * FROM TableB BEGIN TRANSACTION -- Number all the rows in TableA DECLARE @xref table (RowNum int identity(1,1), RecordID int) INSERT @xref (RecordID) SELECT RecordID FROM TableA ORDER BY RecordID -- Populate TableA with duplicates DECLARE @seed int SET @seed = IDENT_CURRENT('TableA') INSERT TableA (a) SELECT a FROM TableA ORDER BY RecordID -- Populate TableB with duplicates pointing to new rows in TableA INSERT TableB (RecordID, b) SELECT (@seed + RowNum) as RecordID, TableB.b FROM TableB INNER JOIN @xref as x ON x.RecordID = TableB.RecordID COMMIT TRANSACTION SELECT * FROM TableA SELECT * FROM TableB DROP TABLE TableB DROP TABLE TableA [quoted text, click to view] "Anith Sen" <anith@bizdatasolutions.com> wrote in message news:niLjb.28$Uz6.20@newsread1.news.atl.earthlink.net... > You are correct. Unless you have another column or a set of columns, other > than the identity column, which can uniquely identify the rows inserted, you > cannot accomplish this without looping through one row at a time. > > -- > -- Anith > ( Please reply to newsgroups only ) > >
Joe -- I have been wondering about IDENTITY lately along the lines of what you are saying. In some test implementations, I came up with a question regarding using non-IDENTITY keys: Say for example that I need four columns to uniquely identify a row of a given table. I therefore decide to use these four columns as my primary key. When I create rows in other tables that need to reference this one, it seems I would need to duplicate those four columns in order to uniquely identify which row in the main table I want to refer to. On the other hand, if I use IDENTITY to identify rows in the main table, I need to carry only one integer value to uniquely identify a given row. You see my point? For example, I have a table that lists reservation requests for a lecture hall. In order to uniquely identify a row of that table, I need to specify: 1. The date and time of the requested reservation, 2. The username of the person requesting it, 3. The organization of the person requesting it (there are multiple organizations in the same database), 4. The room number, 5. The building name. So, if I were to reference a given reservation in another table it seems I would have to duplicate five pieces of data...? Do you have any good web references which discuss how to get away from IDENTITY primary keys? Thank you, Michael Carr [quoted text, click to view] "Joe Celko" <joe.celko@northface.edu> wrote in message news:epRnxhMlDHA.1408@TK2MSFTNGP11.phx.gbl... > Did it ever occur to you that your real problem is using IDENTITY > instead of a real key? One non-relational "solution" cascades into > another, and eventually, you have reduced an SQL database back into a > bad 1950's sequential file implementation.
[quoted text, click to view] >> Say for example that I need four columns to uniquely identify a row
of a given table. I therefore decide to use these four columns as my primary key. << And even if you did not make them the PRIMARY KEY, you must-- repeat must -- make them UNIQUE to enforce that business rule. Adding another key does not remove that requirement. Adding another key that cannot be verified can screw up the data integrity. [quoted text, click to view] >> For example, I have a table that lists reservation requests for a
lecture hall. In order to uniquely identify a row of that table, I need to specify: 1. The date and time of the requested reservation, 2. The username of the person requesting it, 3. The organization of the person requesting it (there are multiple organizations in the same database), 4. The room number, 5. The building name. << Bad example; an event is bound by time and place while the people belong to an organization. [quoted text, click to view] >> So, if I were to reference a given reservation in another table it
seems I would have to duplicate five pieces of data...? << No, just the time and place, which would be unique -- unless you really wanted to book the Miltant Vegans and the Texas Cattleman's association together :) Another choice is to set up an Events calendar table and give each event a unique, then let the other tables reference this events calendar. [quoted text, click to view] >> Do you have any good web references which discuss how to get away
from IDENTITY primary keys [sic]? << Anything written on data modeling and RDBMS. The IDENTITY property is a proprietary nightmare, not the normal behavior of a DB modeler in this situation. An IDENTITY by definition cannot ever be a key, since it is not an attribute in the reality being modeled. --CELKO-- =========================== Please post DDL, so that people do not have to guess what the keys, constraints, Declarative Referential Integrity, datatypes, etc. in your schema are. *** Sent via Developersdex http://www.developersdex.com ***
Why in the world can't an arbitrary, but unique, number (or character string) be used as a key? In most situations using the natural key as the primary key makes sense and a surrogate key merely adds another column with no benefit. However, in situations where the natural key is composed of multiple columns, it might make perfect sense to use an IDENTITY column as the primary key to avoid having to propagate the entire composite natural key to any and all "child tables." Granted, the natural key needs to be defined and enforced as such, but in these cases it becomes simpler and more efficient to join tables based on a single surrogate key. In fact, since it can be argued that any key, or identifier, is somewhat arbitrary and unnatural, the definition of a "natural key" and a "surrogate key" comes down to timing. For example, an employee table might not even have a valid key that is guaranteed unique without some kind of artificially assigned Employee#. A natural key is one that was defined prior to the database existing, a surrogate key is defined after, or as part of the database system. Neither can be said to be more or less desirable across the board. For you to say that an IDENTITY column can never function properly as a key, is an error. Now your objection to IDENTITY columns in regards to it not being standard SQL is another matter, which is also debatable in my opinion, but has no bearing on the logical design and assignment of keys per se. [quoted text, click to view] "Joe Celko" <joe.celko@northface.edu> wrote in message news:OfedFjzlDHA.1728@TK2MSFTNGP11.phx.gbl... > >> Say for example that I need four columns to uniquely identify a row > of a given table. I therefore decide to use these four columns as my > primary key. << > > And even if you did not make them the PRIMARY KEY, you must-- repeat > must -- make them UNIQUE to enforce that business rule. Adding another > key does not remove that requirement. > > Adding another key that cannot be verified can screw up the data > integrity. > > >> For example, I have a table that lists reservation requests for a > lecture hall. In order to uniquely identify a row of that table, I need > to specify: 1. The date and time of the requested reservation, 2. The > username of the person requesting it, 3. The organization of the person > requesting it (there are multiple organizations in the same database), > 4. The room number, 5. The building name. << > > Bad example; an event is bound by time and place while the people belong > to an organization. > > >> So, if I were to reference a given reservation in another table it > seems I would have to duplicate five pieces of data...? << > > No, just the time and place, which would be unique -- unless you really > wanted to book the Miltant Vegans and the Texas Cattleman's association > together :) > > Another choice is to set up an Events calendar table and give each event > a unique, then let the other tables reference this events calendar. > > >> Do you have any good web references which discuss how to get away > from IDENTITY primary keys [sic]? << > > Anything written on data modeling and RDBMS. The IDENTITY property is a > proprietary nightmare, not the normal behavior of a DB modeler in this > situation. An IDENTITY by definition cannot ever be a key, since it is > not an attribute in the reality being modeled. > > --CELKO-- > =========================== > Please post DDL, so that people do not have to guess what the keys, > constraints, Declarative Referential Integrity, datatypes, etc. in your > schema are. > > *** Sent via Developersdex http://www.developersdex.com *** > Don't just participate in USENET...get rewarded for it!
Don, Are you familiar with the notion of Physical Data Independence with regard to relational databases? -- - Anith ( Please reply to newsgroups only )
Are we here again :) -- ---------------------------------------------------------------------------- ----------- Louis Davidson (drsql@hotmail.com) Compass Technology Management Pro SQL Server 2000 Database Design http://www.apress.com/book/bookDisplay.html?bID=266 Note: Please reply to the newsgroups only unless you are interested in consulting services. All other replies will be ignored :) [quoted text, click to view] "Anith Sen" <anith@bizdatasolutions.com> wrote in message news:ebhZ$g1lDHA.2528@TK2MSFTNGP10.phx.gbl... > Don, > > Are you familiar with the notion of Physical Data Independence with regard > to relational databases? > > -- > - Anith > ( Please reply to newsgroups only ) > >
:-) Of course, we are ! "Row, row, row your boat Gently down the stream. Merrily, merrily, merrily, merrily, Life is but a dream." -- - Anith ( Please reply to newsgroups only )
I am familiar with the principle of data independence but fail to see what this has to do with it. The assignment of "surrogate keys" is a logical operation which is completely separate from the SQL Server IDENTITY function. The whole idea of data independence revolves around sharing data with multiple applications and insulating the application(s) from many changes at the physical level. It has nothing to do with DBMS platform independence. I am also familiar with Celko's instance that anything but standard SQL is bad, and disagree with it. Your data must reside in some implementation of DBMS, as such there will inevitably be some non standard functionality implemented. What is wrong with taking advantage of that functionality when appropriate? We use SQL Server, DB2, and Oracle and use each one based on their respective merits and vendor specifications. Using only those standard SQL functions that all DBMS's have in common is also known as the lowest common denominator at which your choices become extremely limited. How often do you move a database from one platform to another? I have done it very rarely and in my experience IDENTITY columns are the least of your worries. Besides, why would you even bother to move between platforms if it weren't to take advantage of some additional, maybe even unique, functionality that your old platform lacked? One can argue that the problem is that no DBMS vendor FULLY supports the SQL Standard and that is probably true. However, it can also be argued that the SQL standard itself is a standard that is not worth fully supporting since it is an incomplete and flawed implementation of the relational model of data. Ironically, it is the area of data independence where the SQL standard falls down the hardest. [quoted text, click to view] "Anith Sen" <anith@bizdatasolutions.com> wrote in message news:ebhZ$g1lDHA.2528@TK2MSFTNGP10.phx.gbl... > Don, > > Are you familiar with the notion of Physical Data Independence with regard > to relational databases? > > -- > - Anith > ( Please reply to newsgroups only ) > >
True, SQL Standards have little or nothing to do with the basic principle of Physical Data Independence; however that is not what I was trying to put across. [quoted text, click to view] >>The assignment of "surrogate keys" is a logical operation which is
completely separate from the SQL Server IDENTITY function. << No disagreement there. [quoted text, click to view] >>How often do you move a database from one platform to another? I have
done it very rarely and in my experience IDENTITY columns are the least of your worries. << Though rarely do I port across DBMS platforms, quite often I move the data across the servers, across databases and across tables. And my experience with identity column is, on the contrary, one of the major worries. FWIW, I know, generally using personal experiences to back up biases for or against using a widely used product-specific feature has little credibility since it can be highly subjective. [quoted text, click to view] >>Besides, why would you even bother to move between platforms if it weren't
to take advantage of some additional, maybe even unique, functionality that your old platform lacked? << There are many who simply do not consider functionality like identity column that significant. Many among us have better yardsticks other than popularity & performance heuristics to decide if a DBMS feature is advantageous or not. This is not about whether to use of surrogate keys or not, but whether an identity column value can qualify as a valid surrogate or not. The point is, valid logical values in a relational database are independent of the physical implementation concerns like device/ OS/ hardware on which the database is implemented. An identity column value in SQL Server is strictly a tuple-identifier. It is not a logical value & is a direct expose` of the physical model of a database. This is based on the fact that the same identity column values cannot be generated for a given dataset under all conditions. The identity column value generation is dependant on user sessions, transaction boundaries, etc as well as in some cases, existing indexes and access paths. Most importantly, identity columns are non-updateable, primarily due to the above reason. Similar shortcomings like device/ OS dependence etc. are characteristics of GUID values as well. Physical data independence dictates the insulation of the logical model of a database from its physical implementation aspects. As such values generated by identity columns, GUIDs etc are tied to the physical model, thereby do not qualify themselves as logical values. And, by definition, surrogates key values are logical, and thus identity column and GUID values are not valid surrogates, contrary to popular misconception. ( For more details on the distinction between tuple identifiers & surrogate keys, please refer to Intro to Database Systems by C J Date ). [quoted text, click to view] >> One can argue that the problem is that no DBMS vendor FULLY supports the
SQL Standard and that is probably true. However, it can also be argued that the SQL standard itself is a standard that is not worth fully supporting since it is an incomplete and flawed implementation of the relational model of data. << Well said. I couldn't agree more. -- - Anith ( Please reply to newsgroups only )
Thanks as always for your thoughtful response. [quoted text, click to view] > There are many who simply do not consider functionality like identity column > that significant. Many among us have better yardsticks other than popularity > & performance heuristics to decide if a DBMS feature is advantageous or
not. I'm not suggesting that anyone run out and buy SQL Server to get the IDENTITY functionality. However, there have been several instances, in my experience, where IDENTITY has provided more robustness and better performance to our systems. [quoted text, click to view] > > This is not about whether to use of surrogate keys or not, but whether an > identity column value can qualify as a valid surrogate or not. > > The point is, valid logical values in a relational database are independent > of the physical implementation concerns like device/ OS/ hardware on which > the database is implemented. An identity column value in SQL Server is > strictly a tuple-identifier. It is not a logical value & is a direct expose` > of the physical model of a database. This is based on the fact that the same > identity column values cannot be generated for a given dataset under all > conditions. The identity column value generation is dependant on user > sessions, transaction boundaries, etc as well as in some cases, existing > indexes and access paths. Most importantly, identity columns are > non-updateable, primarily due to the above reason. Similar shortcomings like > device/ OS dependence etc. are characteristics of GUID values as well.
I have to disagree with you on this point. An IDENTITY column does not expose the physical storage internals to the user, it is merely a mechanism to assign a system-generated integer value to an attribute of a tuple. C.J. Date states that "Ideally, surrogate key valies would be system-generated." IDENTITY is not however a tuple ID since it does not "point to the physical location" of a given tuple. Oracle's RowID is a tuple ID. In Oracle to approach IDENTITY functionality you have to use a SEQUENCE. Another important distinction is that a tuple ID cannot be stored as an attrubute of a tuple since it does point to the physical storage of the tuple, which can change. Tuple ID's are thus not generally exposed to the users. In the case of a surrogate key the values can and sometimes are exposed to the user in the form of Employee# for example. Microsoft's implementation of the IDENTITY function in 7.0 was certainly lacking and suffered from many of the drawbacks you state, but in 2000 you have much better control over whether the value is affected by transaction boundaries etc... But even in 7.0 you could effectively use IDENTITY to assign a unique value for the table in question. To be certain, an IDENTITY column can be misused to impart a wierd kind of meaningful ordering to the rows in a table. This should, of course, be avoided, but aside from this, I still don't see any problems with IDENTITY as a primary key. [quoted text, click to view] > Physical data independence dictates the insulation of the logical model of a > database from its physical implementation aspects. As such values generated > by identity columns, GUIDs etc are tied to the physical model, thereby do > not qualify themselves as logical values. And, by definition, surrogates key > values are logical, and thus identity column and GUID values are not valid > surrogates, contrary to popular misconception. ( For more details on the > distinction between tuple identifiers & surrogate keys, please refer to > Intro to Database Systems by C J Date ).
Again, these values are not tied to the physical model in any way because they are not physical location pointers. They are just system-generated values for a given attribute. It is easy to take the values generated by either of these functions and move them to another platform. The only challenge is to replicate the IDENTITY functionality on Oracle, DB2 etc... And as I said, in my experience, it isn't a major issue. Your mileage may vary! As for Intro to Database Systems, it is THE reference book for anyone who is serious about being a data management professional rather than being merely a knob-turner and button-pusher for a given platform. I wish more people would read and understand it! And by the way, I hope I didn't come across as a Celko basher. I respect much of his work and have enjoyed several of his books, but he isn't perfect and I think he is wrong in this instance.
The reference to identity column as a tuple-id in strict relational sense may a bit confusing in the sense that tuple-ids are "internal", however the distinction I would like to make here is that tuple-id identify the tuples while surrogate keys identify entities. There is a distinction between the reference to surrogates in Codd's RM v2 and Date Intro book. Codd (and this is Joe Celko's stance too, I believe) suggests surrogates are not exposed to the user while Date says tuple-ids are not exposed to the user. In a RDBMS providing sufficient logical data independence, clearly this can be achieved using views, derived relations etc. so exposing a tuple-id to the user is a non-issue. [quoted text, click to view] >> Date states that "Ideally, surrogate key valies would be
system-generated." << You stopped there! He goes on to explain, whether they are system- or user- generated has nothing to do with the basic idea of surrogate keys as such. Date in fact, says, surrogates are a logical concept, and conforms to the Information Principle. In other words, in a logical model based on relational theory, a key is a logical value, surrogate or not. This is based on the explanation on Page 444 on Intro book (7th Edition). The popular DBMS products do not necessarily provide sufficient insulation between the logical model and physical implementation aspects. Due to this it is generally hard to identify whether a value represented in a SQL table is a valid attribute value or an artifact of the physical model. For simplicity, I make the distinction between them quite simply by applying the information principle and rule of set-based manipulations, whereby logical values are updateable through relational operations. While using SQL Server, one can come across difficulties in using tables with identity column keys as well (which are generally trivialized by workarounds). One typical example is the OPs problem. A trigger or a cursor could be workaround here. But why use it in the first place, if you can get it done, otherwise? This is not to disprove the use of identity columns altogether, but I find little benefit in using an identity column in a table as opposed to a valid key attribute. Generally in these newsgroups I have seen too many posts with arguments encouraging the use of identity columns which is equated to improved performance, good database design etc which is nonsense, IMO. And your post is a rare exception & of course informative, I should say. [quoted text, click to view] >> And by the way, I hope I didn't come across as a Celko basher. <<
Don't worry, he has a thicker skin than anyone would imagine, I guess :-) -- - Anith ( Please reply to newsgroups only )
Thanks again, I think that it stands to reason that a surrogate key (regardless of the mechanism used to generate it) MUST be exposed to the user (or at least potentially so.) Whereas a tuple ID cannot, and here I have assumed Date was talking about something like Oracle's RowID particularly because he mentions the case of derived tuples. The RowID would make no sense in the case of anything except a base table because it "points" to where the tuple is stored, obviously a derived tuple isn't stored anywhere. A surrogate key (again, by whatever means) still makes perfect sense when used in a derived tuple. The Information Principle basically states that all information content of the database is represented in tables and that the relationships between tables are represented by data VALUES rather than physical pointers between the pages where the rows are stored. We all know that at the physical level there are, and must be these physical pointers, but the Information Principle states that these must not be exposed to the user. Conversely, if all information in a database is represented in tables, it also follows that ONLY information should be represented by those tables. Now, you might make the case that since an arbitrary value doesn't convey any real information about the entity in question, it isn't a real attribute. But that argument applies to any surrogate key regardless of how it is generated. I guess I'm not sure whether Joe's and your objection is based on the idea of using surrogate keys, or whether it is the idea of using the IDENTITY function to generate them, or maybe both? Just in case I haven't been clear, I think that surrogate keys are acceptable under certain circumstances and that the IDENTITY function is an acceptable way to generate them, again, under appropriate circumstances. The IDENTITY function does not "know", or care about the physical storage aspects. It is just a convenient way of automatically generating unique values for a given table. Yes, there are some "issues" one needs to be aware of, but the same is true if you devise some kind of trigger or table driven mechanism to do the same thing. I am not sure what you mean by the "OPs problem", can you elaborate? Generally, in these newsgroups there is such an astonishingly poor understanding of good database design that I get depressed just reading them. I can't imagine what kind of data chaos results from some of the things I read. I am certainly not arguing that every table should have an IDENTITY column. It is just one of the tools available when the platform happens to be SQL Server, and I quite like it. It has been fun exchanging ideas with someone like yourself who obviously takes your profession seriously and has invested the time required to educate yourself to be able to speak intelligently on the subject. Thanks for your efforts and if you could expand on the OPs thing, I would appreciate it.
[quoted text, click to view] >> A surrogate key (again, by whatever means) still makes perfect sense when
used in a derived tuple. << By derived tuple, can I assume you referring to a tuple in a snapshot or a derived relation here? A surrogate key, by virtue of being a logical value, is an entity identifier, in other words, a surrogate once assigned to an entity becomes an attribute of the entity. But the problem I have is in proving an identity column value is a valid logical value in the first place :-) [quoted text, click to view] >> The Information Principle basically states that... But that argument
applies to any surrogate key regardless of how it is generated. << Exactly, that is the key here. I understand Information Principle and the rule of set-based manipulations apply to surrogate keys but not to identity columns. Here are some simple points : ***. Surrogates are logical values. IMO, Identity column values are not logical values, since the same value cannot be generated for a given row under all circumstances. These vary with several physical implementation factors, say for example, physical clustering : --#1 --create a table CREATE TABLE tbl (col CHAR(1) NOT NULL) INSERT tbl SELECT 'a' UNION SELECT 'b' UNION SELECT 'c' --add a clustered index CREATE CLUSTERED INDEX idx ON tbl(col) ALTER TABLE tbl ADD idcol INT IDENTITY -- check the data in the table SELECT * FROM tbl col idcol ---- ----------- a 1 b 2 c 3 --#2 DROP TABLE tbl CREATE TABLE tbl (col CHAR(1) NOT NULL) INSERT tbl SELECT 'a' UNION SELECT 'b' UNION SELECT 'c' CREATE CLUSTERED INDEX idx ON tbl(col DESC) -- change the cluster order ALTER TABLE tbl ADD idcol INT IDENTITY SELECT * FROM tbl col idcol ---- ----------- c 1 b 2 a 3 This illustrates the value of identity column relies on the physical clustering of the table. Now, ook at another mechanism to create a surrogate value : -- using a *logical* surrogate DROP TABLE tbl CREATE TABLE tbl (col CHAR(1) NOT NULL) INSERT tbl SELECT 'a' UNION SELECT 'b' UNION SELECT 'c' CREATE CLUSTERED INDEX idx ON tbl(col DESC) ALTER TABLE tbl ADD idcol INT UPDATE tbl SET idcol = (SELECT COUNT(*) FROM tbl t1 WHERE t1.col <= tbl.col) ; SELECT * FROM tbl col idcol ---- ----------- c 3 b 2 a 1 Irrespective how the table is physically clustered, the idcol values being logical, stays insulated. ***. Loosing identity values due to uncommitted inserts and/other anomalies DROP TABLE tbl CREATE TABLE tbl (idcol INT NOT NULL IDENTITY) INSERT tbl DEFAULT VALUES BEGIN TRANSACTION INSERT tbl DEFAULT VALUES ROLLBACK TRANSACTION INSERT tbl DEFAULT VALUES SELECT * FROM tbl idcol ----------- 1 3 You lose the value 2 due its dependence on the physical transaction boundaries. Now, if you use a logical mechanism like computed column, MAX+1 etc, this wouldn't happen. *** Identity column is a non-updateable column. A valid logical surrogate behaves exactly like any other logical value in a relational database. To me, the inability to update a value in an identity column can only be explained only if there is a direct "tie" to the physical implementation. This drawback itself discredits an identity column value from being a valid logical value. [quoted text, click to view] >> I am not sure what you mean by the "OPs problem", can you elaborate? <<
The OP's problem (serialization issue) is a common one, not related to the logical-physical discussion above, but is very common when one uses the identity column primary key. Assume we have two tables with identity column primary keys CREATE TABLE t1 ( idcol INT NOT NULL IDENTITY PRIMARY KEY, col CHAR(1) NOT NULL ); CREATE TABLE t2 ( idcol INT NOT NULL IDENTITY PRIMARY KEY, refcol INT NOT NULL REFERENCES t1(idcol)) ; Now assume, we have some data in a temp table #t with data which we need to load it into the table t1 and update the referencing table t2. SELECT * INTO #t FROM (SELECT 'D' UNION SELECT 'O' UNION SELECT 'N' UNION SELECT 'P') D The problem is to load the data in #t into t1 and t2 in one shot. This is a common scenario, esp. with huge datasets, in ETL, data staging etc. The only way you can load them is to use a cursor/loop and insert one at a time due to the serialization of inserts or have a trigger in the table t1 to insert the data into t2. -- - Anith ( Please reply to newsgroups only )
Just an addendum...By 'OP', I meant the original poster. I was not sure, if this was a source of confusion. -- - Anith ( Please reply to newsgroups only )
Anith, Next time, give the OP's social security number so there's no confusion about who you're referring to. ;) SK [quoted text, click to view] Anith Sen wrote: >Just an addendum...By 'OP', I meant the original poster. I was not sure, if >this was a source of confusion. > > >
[quoted text, click to view] > ***. Surrogates are logical values. IMO, Identity column values are not > logical values, since the same value cannot be generated for a given row > under all circumstances. These vary with several physical implementation > factors, say for example, physical clustering :...
Good examples, but given that a surrogate key is an arbitrary value anyway, does it really matter? Once the value is assigned it is no longer affected by the physical storage particulars, and it certainly does not reflect the physical storage aspect except in the most tenuous way. I can't think of a reason that this would really matter, although it does make an interesting argument. [quoted text, click to view] > ***. Loosing identity values due to uncommitted inserts and/other
anomalies I know that this really bothers some people, but I have seen only one situation where it really mattered. In sending wire transfers to the Fed, the transfers must be assigned a sequential integer and there are no gaps allowed. This obviously isn't a good case to use IDENTITY. From a more abstract viewpoint, it should not matter in the least that there are gaps in the values of an attribute since having a gap-free sequence of values implies meaningful ordering. [quoted text, click to view] > > *** Identity column is a non-updateable column. > > A valid logical surrogate behaves exactly like any other logical value in
a relational database. I've always thought that updateability is one of the reasons to use a surrogate key. Updating primary keys is always a bit different than updating other attributes. Using a surrogate key allows a bit more latitude in updating the natural key. If you do need to update your PK, an IDENTITY column isn't an option. The other example you gave is a valid one as well, and again, an IDENTITY column probably isn't a wise choice there. Thanks for clarifying what OP means. I racked my brain trying to think of what that could stand for and never would have hit on "original poster." Anyway, thanks for the interesting exchange. Even if we may not totally agree, I always learn something when I'm forced to think about how and why I do things...
Don't see what you're looking for? Try a search.
|