sql server programming:
In SS05 , is a col config'd as Identity datatype necessarily the PK ? If not , then for a table with an Identity col defined , how to set the PK in SS Mgmt Studio ?
BY DEFINITION an IDENTITY columns can NEVER, NEVER, NEVER, NEVER, NEVER a primary. Did you sleep thru RDBMS 101? Get the books you never read, and look up what a key is. It is to be a subset of the attributes of an entity -- this is by definition! What you are doing is confusing RDBMS with a 1950's magnetic tape file systems, where you located a record (NOT a row!! Totally different concept!) in a sequence. Just like a record number, so is the IDENTITY column. A totally non-relational PHYSICAL locator based on when a PHYSICAL record (which models but is not the samew as a row) is inserted into a table. [quoted text, click to view] >> how to set the PK in SS Mgmt Studio ? <<
Who cares?? You are not not a real SQL programmer!! You are a "mousey, mousey, click , click" non-programmer. (with a French accent) we spit on you, Video gamer! to be serious, real programmers use a text editor. They know the language they write in. Those stinking "video game tools"slow us down. And they lead us to ask questiosn like this in newsgroups where people liek me will maek fun of you.
Hi, The identity datatype column can never be primary key column. We have to define it if required ti be a primary key column by creating a primary key constraint. To Set a primary key. Select the table rigt click and choose modify. It would open the description of the table. select colum right click and select set to primary key option. This is how to do through sql server management studio. We can do this programmatically also. HTH Thanks & Regards Shaju [quoted text, click to view] "John A Grandy" wrote: > In SS05 , is a col config'd as Identity datatype necessarily the PK ? > > If not , then for a table with an Identity col defined , how to set the PK > in SS Mgmt Studio ? > >
Joe ... please remember: 40mg valium ever 4 hours, like the doc told you. I know it's real fun up in the rdbms theory ozone layer, but your cardiac health is far more important. "Mousey mousey click click" IDEs are taking over the world 'cause companies like mine incinerate any and all competition trying to write apps in pure code. If you don't like the new reality, go argue with Scott Gu , not me. So how would approach the need for a PK on the following table : USE [GUI] GO /****** Object: Table [dbo].[Tasks] Script Date: 02/14/2006 22:53:57 ******/ SET ANSI_NULLS ON GO SET QUOTED_IDENTIFIER ON GO SET ANSI_PADDING ON GO CREATE TABLE [dbo].[Tasks]( [TaskID] [int] IDENTITY(1,1) NOT NULL, [TaskDate] [datetime] NOT NULL, [TaskDescription] [varchar](2000) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL CONSTRAINT [DF_Tasks_TaskName] DEFAULT (''), [TaskTimeInHours] [decimal](3, 1) NOT NULL CONSTRAINT [DF_Tasks_TimeInHours] DEFAULT (0) ) ON [PRIMARY] GO SET ANSI_PADDING OFF And yes I do read Ken Henderson whenever I'm on the can, like a good lad. [quoted text, click to view] "--CELKO--" <jcelko212@earthlink.net> wrote in message news:1139983626.481357.192860@g43g2000cwa.googlegroups.com... > BY DEFINITION an IDENTITY columns can NEVER, NEVER, NEVER, NEVER, > NEVER a primary. Did you sleep thru RDBMS 101? > > Get the books you never read, and look up what a key is. It is to be a > subset of the attributes of an entity -- this is by definition! > > What you are doing is confusing RDBMS with a 1950's magnetic tape file > systems, where you located a record (NOT a row!! Totally different > concept!) in a sequence. Just like a record number, so is the IDENTITY > column. A totally non-relational PHYSICAL locator based on when a > PHYSICAL record (which models but is not the samew as a row) is > inserted into a table. > >>> how to set the PK in SS Mgmt Studio ? << > > Who cares?? You are not not a real SQL programmer!! You are a > "mousey, mousey, click , click" non-programmer. (with a French accent) > we spit on you, Video gamer! > > to be serious, real programmers use a text editor. They know the > language they write in. Those stinking "video game tools"slow us down. > And they lead us to ask questiosn like this in newsgroups where people > liek me will maek fun of you. >
Yes, this is exactly what I attempted: I right-clicked on the Identity col and I see a list of its properties (including "Primary key") -- but all properties in the list are disabled : they're all greyed-out and can't be edited. [quoted text, click to view] "Shaju" <Shaju@discussions.microsoft.com> wrote in message news:72F8505F-73F5-4574-B894-88410CFD04B6@microsoft.com... > Hi, > > The identity datatype column can never be primary key column. We have to > define it if required ti be a primary key column by creating a primary key > constraint. > > To Set a primary key. Select the table rigt click and choose modify. It > would open the description of the table. select colum right click and > select > set to primary key option. This is how to do through sql server management > studio. > > We can do this programmatically also. > > HTH > > Thanks & Regards > Shaju > > "John A Grandy" wrote: > >> In SS05 , is a col config'd as Identity datatype necessarily the PK ? >> >> If not , then for a table with an Identity col defined , how to set the >> PK >> in SS Mgmt Studio ? >> >> >>
[quoted text, click to view] John A Grandy wrote: > "Mousey mousey click click" IDEs are taking over the world 'cause companies > like mine incinerate any and all competition trying to write apps in pure > code. If you don't like the new reality, go argue with Scott Gu , not me. >
Do not use the GUI to make changes in a production environment. Especially don't make changes to IDENTITY columns this way. The safe and professional way is to use scripts. [quoted text, click to view] > > So how would approach the need for a PK on the following table : >
How can we say since we know nothing of your data or business? If there is no other key but IDENTITY then you have done something seriously wrong. IDENTITY should never be the only key of a table. -- David Portas, SQL Server MVP Whenever possible please post enough code to reproduce your problem. Including CREATE TABLE and INSERT statements usually helps. State what version of SQL Server you are using and specify the content of any error messages. SQL Server Books Online: http://msdn2.microsoft.com/library/ms130214(en-US,SQL.90).aspx --
Ahh, then right click on the Identity col ... that did the trick. Thanks. [quoted text, click to view] "Uri Dimant" <urid@iscar.co.il> wrote in message news:%23CFIAHgMGHA.3728@tk2msftngp13.phx.gbl... > John > Right Click on the table and then Modify > > > > > > "John A Grandy" <johnagrandy-at-yahoo-dot-com> wrote in message > news:eys$b5fMGHA.3064@TK2MSFTNGP10.phx.gbl... >> Yes, this is exactly what I attempted: I right-clicked on the Identity >> col and I see a list of its properties (including "Primary key") -- but >> all properties in the list are disabled : they're all greyed-out and >> can't be edited. >> >> >> "Shaju" <Shaju@discussions.microsoft.com> wrote in message >> news:72F8505F-73F5-4574-B894-88410CFD04B6@microsoft.com... >>> Hi, >>> >>> The identity datatype column can never be primary key column. We have to >>> define it if required ti be a primary key column by creating a primary >>> key >>> constraint. >>> >>> To Set a primary key. Select the table rigt click and choose modify. It >>> would open the description of the table. select colum right click and >>> select >>> set to primary key option. This is how to do through sql server >>> management >>> studio. >>> >>> We can do this programmatically also. >>> >>> HTH >>> >>> Thanks & Regards >>> Shaju >>> >>> "John A Grandy" wrote: >>> >>>> In SS05 , is a col config'd as Identity datatype necessarily the PK ? >>>> >>>> If not , then for a table with an Identity col defined , how to set the >>>> PK >>>> in SS Mgmt Studio ? >>>> >>>> >>>> >> >> > >
[quoted text, click to view] David Portas wrote: >How can we say since we know nothing of your data or business? If there >is no other key but IDENTITY then you have done something seriously >wrong. IDENTITY should never be the only key of a table. > >
What about a conceptual table like a table of tasks in a project plan for instance? What would be other candidate keys for a "task"? (I'm not pushing any particular view, I'm just trying to think of situations where there is no "natural" key for a table.) -- *mike hodgson*
[quoted text, click to view] John A Grandy wrote: >Joe ... please remember: 40mg valium ever 4 hours, like the doc told you. I >know it's real fun up in the rdbms theory ozone layer, but your cardiac >health is far more important. > >
Bravo! Let him have it (I think Joe was harsher than normal in his post). But seriously, 40mg every 4 hours? Are you trying to kill him? I don't know anything about valium really but I suspect such a large, regular dose is likely to be worse for his cardiovascular system than just simply leaving his keyboard for a couple hours & going for a walk in the park. -- *mike hodgson*
Ease up a bit Joe. That was harsh even by your usual standards. And, from memory, the John Cleese quote from The Holy Grail was "I fart in your general direction!" but it has been a while since I last saw it so I could be mistaken. BTW, hard-core assembly programmers (not me - I hated assembly at Uni) might argue that people who don't program in machine code or assembly but use 3GLs, like C for instance, are not real programmers either. It's all a matter of perspective. -- *mike hodgson* http://sqlnerd.blogspot.com [quoted text, click to view] --CELKO-- wrote: >BY DEFINITION an IDENTITY columns can NEVER, NEVER, NEVER, NEVER, >NEVER a primary. Did you sleep thru RDBMS 101? > >Get the books you never read, and look up what a key is. It is to be a >subset of the attributes of an entity -- this is by definition! > >What you are doing is confusing RDBMS with a 1950's magnetic tape file >systems, where you located a record (NOT a row!! Totally different >concept!) in a sequence. Just like a record number, so is the IDENTITY >column. A totally non-relational PHYSICAL locator based on when a >PHYSICAL record (which models but is not the samew as a row) is >inserted into a table. > > > >>>how to set the PK in SS Mgmt Studio ? << >>> >>> > >Who cares?? You are not not a real SQL programmer!! You are a >"mousey, mousey, click , click" non-programmer. (with a French accent) >we spit on you, Video gamer! > >to be serious, real programmers use a text editor. They know the >language they write in. Those stinking "video game tools"slow us down. > And they lead us to ask questiosn like this in newsgroups where people >liek me will maek fun of you. > >
Hi David, What about a message board? -- Tony Rogerson SQL Server MVP http://sqlserverfaq.com - free video tutorials [quoted text, click to view] "David Portas" <REMOVE_BEFORE_REPLYING_dportas@acm.org> wrote in message news:1139989273.917960.17920@g14g2000cwa.googlegroups.com... > John A Grandy wrote: >> "Mousey mousey click click" IDEs are taking over the world 'cause >> companies >> like mine incinerate any and all competition trying to write apps in pure >> code. If you don't like the new reality, go argue with Scott Gu , not >> me. >> > > Do not use the GUI to make changes in a production environment. > Especially don't make changes to IDENTITY columns this way. The safe > and professional way is to use scripts. > >> >> So how would approach the need for a PK on the following table : >> > > How can we say since we know nothing of your data or business? If there > is no other key but IDENTITY then you have done something seriously > wrong. IDENTITY should never be the only key of a table. > > -- > David Portas, SQL Server MVP > > Whenever possible please post enough code to reproduce your problem. > Including CREATE TABLE and INSERT statements usually helps. > State what version of SQL Server you are using and specify the content > of any error messages. > > SQL Server Books Online: > http://msdn2.microsoft.com/library/ms130214(en-US,SQL.90).aspx > -- >
John Right Click on the table and then Modify [quoted text, click to view] "John A Grandy" <johnagrandy-at-yahoo-dot-com> wrote in message news:eys$b5fMGHA.3064@TK2MSFTNGP10.phx.gbl... > Yes, this is exactly what I attempted: I right-clicked on the Identity col > and I see a list of its properties (including "Primary key") -- but all > properties in the list are disabled : they're all greyed-out and can't be > edited. > > > "Shaju" <Shaju@discussions.microsoft.com> wrote in message > news:72F8505F-73F5-4574-B894-88410CFD04B6@microsoft.com... >> Hi, >> >> The identity datatype column can never be primary key column. We have to >> define it if required ti be a primary key column by creating a primary >> key >> constraint. >> >> To Set a primary key. Select the table rigt click and choose modify. It >> would open the description of the table. select colum right click and >> select >> set to primary key option. This is how to do through sql server >> management >> studio. >> >> We can do this programmatically also. >> >> HTH >> >> Thanks & Regards >> Shaju >> >> "John A Grandy" wrote: >> >>> In SS05 , is a col config'd as Identity datatype necessarily the PK ? >>> >>> If not , then for a table with an Identity col defined , how to set the >>> PK >>> in SS Mgmt Studio ? >>> >>> >>> > >
John A Grandy (johnagrandy-at-yahoo-dot-com) writes: [quoted text, click to view] > "Mousey mousey click click" IDEs are taking over the world 'cause > companies like mine incinerate any and all competition trying to write > apps in pure code. If you don't like the new reality, go argue with > Scott Gu , not me.
I echo what David said, about not using the point-and-click GUI for changing databases in a production environment. There are *serious* shortcomings in the Modify Table function in Mgmt Studio and Enterprise Manager, and you are putting your database at stake if you make some changes from the GUI and press save. You can generate a script, carefully review it and make some required changes. -- Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se Books Online for SQL Server 2005 at http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx Books Online for SQL Server 2000 at
[quoted text, click to view] > I don't really buy this example. Why wouldn't you model multiple edges > between vertices as: > > ... number_of_edges INT NOT NULL (CHECK number_of_edges >0), > UNIQUE (InitialVertex, TerminalVertex)) > > ?
Since an edge is an abstract concept, it is not known before hand why there are more than one edge between verticies. Consider connections between two network routers. One is a DS0 (56KB leased), and the other uses an analog POTS modem. They both provide a connection, but the DS0 line costs more than the POTS line, and the DS0 line provides more bandwidth than the analog line. You need a place to record things like cost, bandwidth, etc. It makes sense to have separate edges per entity in this case because there can be many unrelated differentiating criteria--all of which are concrete. [quoted text, click to view] > A Vertex presumably represents something other than just a surrogate > key otherwise what is this a graph of?
That's the point, these tables are surrogate generators, and can be used for any or all graphs in a database, enabling any application or even database constraints to use the same set of functions. How many times have you written code to make sure that a node wasn't already part of a tree? How many times have you written queries against heirarchies? I say: do it once, bullet-proof it, optimize it, and then use it any time you need to represent any type of graph. [quoted text, click to view] "David Portas" <REMOVE_BEFORE_REPLYING_dportas@acm.org> wrote in message news:1139997970.904859.242520@g44g2000cwa.googlegroups.com... > Brian Selzer wrote: >> There is only one exception to the rule that I follow when using IDENTITY >> or surrogates in general. Unless the table is completely abstract in the >> model, then there should be at least one candidate key in addition to the >> surrogate. If any column in a table can hold information that originates >> outside of the database, then that table isn't completely abstract. Any >> other use of IDENTITY embeds part of the application in the database. >> >> Here's an example of completely abstract tables: >> >> CREATE TABLE Vertexes >> ( >> Vertex INT IDENTITY(-2147483647, 1) >> PRIMARY KEY CLUSTERED >> ) >> CREATE TABLE Edges >> ( >> Edge INT IDENTITY(-2147483647, 1) >> PRIMARY KEY CLUSTERED, >> InitialVertex INT NOT NULL >> REFERENCES Vertexes (Vertex), >> TerminalVertex INT NOT NULL >> REFERENCES Vertexes (Vertex) >> ) >> > > Hi Brian, > > I don't really buy this example. Why wouldn't you model multiple edges > between vertices as: > > ... number_of_edges INT NOT NULL (CHECK number_of_edges >0), > UNIQUE (InitialVertex, TerminalVertex)) > > ? > > A Vertex presumably represents something other than just a surrogate > key otherwise what is this a graph of? > > -- > David Portas, SQL Server MVP > > Whenever possible please post enough code to reproduce your problem. > Including CREATE TABLE and INSERT statements usually helps. > State what version of SQL Server you are using and specify the content > of any error messages. > > SQL Server Books Online: > http://msdn2.microsoft.com/library/ms130214(en-US,SQL.90).aspx > -- >
I have no other business-process relevant columns I can use for a PK. The DateTime table col translates in the real world to date only, no time. The hours worked nor the description of the job performed also can not be used to construct a PK. Only a non-BP relevant entity can be constructed to serve as the PK. For a PK, I can either Identity the entire table, or combine an increment-by-1 for each new date. What else is there? [quoted text, click to view] "David Portas" <REMOVE_BEFORE_REPLYING_dportas@acm.org> wrote in message news:1139989273.917960.17920@g14g2000cwa.googlegroups.com... > John A Grandy wrote: >> "Mousey mousey click click" IDEs are taking over the world 'cause >> companies >> like mine incinerate any and all competition trying to write apps in pure >> code. If you don't like the new reality, go argue with Scott Gu , not >> me. >> > > Do not use the GUI to make changes in a production environment. > Especially don't make changes to IDENTITY columns this way. The safe > and professional way is to use scripts. > >> >> So how would approach the need for a PK on the following table : >> > > How can we say since we know nothing of your data or business? If there > is no other key but IDENTITY then you have done something seriously > wrong. IDENTITY should never be the only key of a table. > > -- > David Portas, SQL Server MVP > > Whenever possible please post enough code to reproduce your problem. > Including CREATE TABLE and INSERT statements usually helps. > State what version of SQL Server you are using and specify the content > of any error messages. > > SQL Server Books Online: > http://msdn2.microsoft.com/library/ms130214(en-US,SQL.90).aspx > -- >
Nah, 10mg an hour at worst will just make him a sleepy dope. No real damage done. Joe, maybe you should have a friendly word with Bill and Steve before they toss a couple billion at yet another mousey-mousey click-click oui-oui ... ? [quoted text, click to view] "Mike Hodgson" <e1minst3r@gmail.com> wrote in message news:eqSnZWgMGHA.1312@TK2MSFTNGP09.phx.gbl... John A Grandy wrote:
Joe ... please remember: 40mg valium ever 4 hours, like the doc told you. I know it's real fun up in the rdbms theory ozone layer, but your cardiac health is far more important. Bravo! Let him have it (I think Joe was harsher than normal in his post). But seriously, 40mg every 4 hours? Are you trying to kill him? I don't know anything about valium really but I suspect such a large, regular dose is likely to be worse for his cardiovascular system than just simply leaving his keyboard for a couple hours & going for a walk in the park. -- mike hodgson http://sqlnerd.blogspot.com
We use sql change-scripts tested againt multiple dev and test envs before applying the ddl to prod dbs. [quoted text, click to view] "Erland Sommarskog" <esquel@sommarskog.se> wrote in message news:Xns976B6414DB5E8Yazorman@127.0.0.1... > John A Grandy (johnagrandy-at-yahoo-dot-com) writes: >> "Mousey mousey click click" IDEs are taking over the world 'cause >> companies like mine incinerate any and all competition trying to write >> apps in pure code. If you don't like the new reality, go argue with >> Scott Gu , not me. > > I echo what David said, about not using the point-and-click GUI for > changing > databases in a production environment. There are *serious* shortcomings > in the Modify Table function in Mgmt Studio and Enterprise Manager, and > you are putting your database at stake if you make some changes from the > GUI and press save. You can generate a script, carefully review it and > make > some required changes. > > > -- > Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se > > Books Online for SQL Server 2005 at > http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx > Books Online for SQL Server 2000 at > http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx
[quoted text, click to view] Mike Hodgson wrote: > David Portas wrote: > > >How can we say since we know nothing of your data or business? If there > >is no other key but IDENTITY then you have done something seriously > >wrong. IDENTITY should never be the only key of a table. > > > > > What about a conceptual table like a table of tasks in a project plan > for instance? What would be other candidate keys for a "task"? (I'm > not pushing any particular view, I'm just trying to think of situations > where there is no "natural" key for a table.) > > -- > *mike hodgson* > http://sqlnerd.blogspot.com > Depends what information you want to record for a task. I designed a database for a PM application where the key was something like (parent_task_no, task_type). Task_name would presumably be another potential candidate. -- David Portas, SQL Server MVP Whenever possible please post enough code to reproduce your problem. Including CREATE TABLE and INSERT statements usually helps. State what version of SQL Server you are using and specify the content of any error messages. SQL Server Books Online: http://msdn2.microsoft.com/library/ms130214(en-US,SQL.90).aspx --
[quoted text, click to view] Brian Selzer wrote: > There is only one exception to the rule that I follow when using IDENTITY or surrogates in general. Unless the table is completely abstract in the model, then there should be at least one candidate key in addition to the surrogate. If any column in a table can hold information that originates outside of the database, then that table isn't completely abstract. Any other use of IDENTITY embeds part of the application in the database. > > Here's an example of completely abstract tables: > > CREATE TABLE Vertexes > ( > Vertex INT IDENTITY(-2147483647, 1) > PRIMARY KEY CLUSTERED > ) > CREATE TABLE Edges > ( > Edge INT IDENTITY(-2147483647, 1) > PRIMARY KEY CLUSTERED, > InitialVertex INT NOT NULL > REFERENCES Vertexes (Vertex), > TerminalVertex INT NOT NULL > REFERENCES Vertexes (Vertex) > ) >
Hi Brian, I don't really buy this example. Why wouldn't you model multiple edges between vertices as: .... number_of_edges INT NOT NULL (CHECK number_of_edges >0), UNIQUE (InitialVertex, TerminalVertex)) ? A Vertex presumably represents something other than just a surrogate key otherwise what is this a graph of? -- David Portas, SQL Server MVP Whenever possible please post enough code to reproduce your problem. Including CREATE TABLE and INSERT statements usually helps. State what version of SQL Server you are using and specify the content of any error messages. SQL Server Books Online: http://msdn2.microsoft.com/library/ms130214(en-US,SQL.90).aspx --
[quoted text, click to view] John A Grandy wrote: > I have no other business-process relevant columns I can use for a PK. The > DateTime table col translates in the real world to date only, no time. The > hours worked nor the description of the job performed also can not be used > to construct a PK. > > Only a non-BP relevant entity can be constructed to serve as the PK. For a > PK, I can either Identity the entire table, or combine an increment-by-1 for > each new date. > > What else is there?
Wrong question. Why would you want to store duplicate (=redundant) data in this table? A more fundamental question than "what is the key?" is "what information are you trying to record about this entity?". -- David Portas, SQL Server MVP Whenever possible please post enough code to reproduce your problem. Including CREATE TABLE and INSERT statements usually helps. State what version of SQL Server you are using and specify the content of any error messages. SQL Server Books Online: http://msdn2.microsoft.com/library/ms130214(en-US,SQL.90).aspx --
[quoted text, click to view] Tony Rogerson wrote: > Hi David, > > What about a message board? > > -- > Tony Rogerson > SQL Server MVP > http://sqlserverfaq.com - free video tutorials > Good question. Perhaps (thread_id, user_id, post_datetime)? From a purist point-of-view you might want to add message_body to that key but chances are your maximum message size will break SQL Server's 900 byte key size limit. In practice you obviously want a key that SQL Server can enforce so some compound key including a DATETIME seems like the obvious candidate. In 2005 we can now easily generate hashes in the database which means we can work around the product limitation whereby uniqueness constraints cannot be enforced against all datatypes. -- David Portas, SQL Server MVP Whenever possible please post enough code to reproduce your problem. Including CREATE TABLE and INSERT statements usually helps. State what version of SQL Server you are using and specify the content of any error messages. SQL Server Books Online: http://msdn2.microsoft.com/library/ms130214(en-US,SQL.90).aspx --
There is only one exception to the rule that I follow when using = IDENTITY or surrogates in general. Unless the table is completely = abstract in the model, then there should be at least one candidate key = in addition to the surrogate. If any column in a table can hold = information that originates outside of the database, then that table = isn't completely abstract. Any other use of IDENTITY embeds part of the = application in the database. =20 Here's an example of completely abstract tables: CREATE TABLE Vertexes=20 ( Vertex INT IDENTITY(-2147483647, 1)=20 PRIMARY KEY CLUSTERED ) CREATE TABLE Edges ( Edge INT IDENTITY(-2147483647, 1)=20 PRIMARY KEY CLUSTERED, InitialVertex INT NOT NULL REFERENCES Vertexes (Vertex), TerminalVertex INT NOT NULL REFERENCES Vertexes (Vertex) ) These tables can be used as surrogate generators for concrete tables = that represent information that can be modeled as any type of = graph--that is, a tree, heirarchy, network, etc. A unique constraint on = Edges(InitialVertex, TerminalVertex) is contraindicated because a = multigraph allows for multiple edges between verticies. Now you can = write a single set of table-valued and scalar functions to find the = degree of a vertex; the shortest path between two verticies; left, right = or undirected components; etc. These can then be applied to simplify = queries against concrete tables, such as finding the minimum cost = process for producing a part, or finding the minimum time process for = producing the same part. In addition, you may also be able to pin the = tables in memory to speed up those types of queries. [quoted text, click to view] "Mike Hodgson" <e1minst3r@gmail.com> wrote in message = news:ueolyTgMGHA.2628@TK2MSFTNGP15.phx.gbl...
David Portas wrote:=20 How can we say since we know nothing of your data or business? If there is no other key but IDENTITY then you have done something seriously wrong. IDENTITY should never be the only key of a table. What about a conceptual table like a table of tasks in a project plan = for instance? What would be other candidate keys for a "task"? (I'm = not pushing any particular view, I'm just trying to think of situations = where there is no "natural" key for a table.) -- mike hodgson
[quoted text, click to view] Tony Rogerson wrote: > Using the message body is unpractical, even post subject is probably > impractical too. > > Consider these forums - user_id is email address (though not unique, many > users may post under the same annonymous account), thread is > microsoft.public.sqlserver.programming (but is it? thats just the posting > server, you can post to the same message on mulitple servers - demon, > compuserve etc...) > > Consider a new release of a service pack that causes problem, its > conceiveable though unlikely you will get two different posters at exactly > the same time with the same subject. > > Its more practical to use a generated primary key, for nntp its a guid that > includes the posting domain server name. > > I think my point is - in real world, once you come to implement a design you > have to be practical; using NEWID() or the IDENTITY property in this case is > more than justifiable and makes complete sense. > > You can use CHECKSUM on the body to create a number, but thats not unique. > > -- > Tony Rogerson > SQL Server MVP > http://sqlserverfaq.com - free video tutorials > > Yes of course you can receive duplicate data on any logical key you care to choose. The question is what does the customer demand from the database in that eventuality? Even assuming you want to display all the duplicate discussion posts in your app there is absolutely no reason to store redundant rows in the database. Your example problem therefore isn't really answerable without more information about the business requirements regarding duplicate posts. (An example of why newgroups aren't a good place for design discussions - it's unfeasibly hard to define the problem domain up front.) Redundancy gives you nothing but it costs you storage, bandwidth and integrity. Those are the immediate practical costs of leaving out real keys. With SQL Server's IDENTITY column implementation there are other special considerations. Some problems just don't have feasible set-based solutions unless you have enforced a natural key in addition to the IDENTITY key. A classic example is how to populate a table that has a self-referencing foreign key. Another example is how to integrate data from two data sources in tables with foreign keys. It seems to me that's a big price to pay for leaving out a few UNIQUE constraints. [quoted text, click to view] > You can use CHECKSUM on the body to create a number, but thats not unique.
Use HashBytes() instead. -- David Portas, SQL Server MVP Whenever possible please post enough code to reproduce your problem. Including CREATE TABLE and INSERT statements usually helps. State what version of SQL Server you are using and specify the content of any error messages. SQL Server Books Online: http://msdn2.microsoft.com/library/ms130214(en-US,SQL.90).aspx --
Each day, each worker has a number of tasks they are required to enter description and hours to complete. It's a timesheet except that it's irrelevant what time they started//finished any given task. Think of billing a client .... the client only cares that you're charging them 6 hours to do X and what day was the work performed on. They don't care what time you started, finished, how many breaks you took, how long, etc. [quoted text, click to view] "David Portas" <REMOVE_BEFORE_REPLYING_dportas@acm.org> wrote in message news:1139998144.981652.267720@o13g2000cwo.googlegroups.com... > John A Grandy wrote: >> I have no other business-process relevant columns I can use for a PK. >> The >> DateTime table col translates in the real world to date only, no time. >> The >> hours worked nor the description of the job performed also can not be >> used >> to construct a PK. >> >> Only a non-BP relevant entity can be constructed to serve as the PK. For >> a >> PK, I can either Identity the entire table, or combine an increment-by-1 >> for >> each new date. >> >> What else is there? > > Wrong question. Why would you want to store duplicate (=redundant) data > in this table? A more fundamental question than "what is the key?" is > "what information are you trying to record about this entity?". > > -- > David Portas, SQL Server MVP > > Whenever possible please post enough code to reproduce your problem. > Including CREATE TABLE and INSERT statements usually helps. > State what version of SQL Server you are using and specify the content > of any error messages. > > SQL Server Books Online: > http://msdn2.microsoft.com/library/ms130214(en-US,SQL.90).aspx > -- >
[quoted text, click to view] John A Grandy wrote: > Each day, each worker has a number of tasks they are required to enter > description and hours to complete. It's a timesheet except that it's > irrelevant what time they started//finished any given task. > > Think of billing a client .... the client only cares that you're charging > them 6 hours to do X and what day was the work performed on. They don't care > what time you started, finished, how many breaks you took, how long, etc. > >
So it sounds like (taskdate, taskdescription) should be the key. Or should client be in there as well? You didn't post the rest of the schema but if you are recording the client for each task then surely you must have another table also keyed on taskid because the client doesn't appear at all in this table? So that may explain why you missed the key - the schema is somehow denormalized. The timesheet I use has a pre-defined set of task descriptions to choose from. I would be inclined to add a key for description and put that in its own table so that you can report on common tasks across all clients. I'll bet the client doesn't read a 2KB description for every task either. Surely a more compact descirption or code for each task would make sense. -- David Portas, SQL Server MVP Whenever possible please post enough code to reproduce your problem. Including CREATE TABLE and INSERT statements usually helps. State what version of SQL Server you are using and specify the content of any error messages. SQL Server Books Online: http://msdn2.microsoft.com/library/ms130214(en-US,SQL.90).aspx --
Tasks are typically assigned or entered by a specific person at a = specific time. So EmployeeID + EntryDate would be a natural key. [quoted text, click to view] "Mike Hodgson" <e1minst3r@gmail.com> wrote in message = news:ueolyTgMGHA.2628@TK2MSFTNGP15.phx.gbl...
David Portas wrote:=20 How can we say since we know nothing of your data or business? If there is no other key but IDENTITY then you have done something seriously wrong. IDENTITY should never be the only key of a table. What about a conceptual table like a table of tasks in a project plan = for instance? What would be other candidate keys for a "task"? (I'm = not pushing any particular view, I'm just trying to think of situations = where there is no "natural" key for a table.) -- mike hodgson
[quoted text, click to view] Tony Rogerson wrote: > Use this news group as a test case, all posts get in regardless; so they > need to be globally keyed across all possible replicated servers. >
Logical keys don't have to prevent duplicate posts. If it's a requirement to show duplicate posts from separate servers then presumably you have a one-to-many relationship between the Posts table (WITH it's logical key) and the server identifier (GUID or whatever). No logical reason to denormalize the GUID into the Posts table. All the logical keys do is eliminate *redundancy*, which is just fine because redundancy gives you nothing extra. [quoted text, click to view] > > Use HashBytes() instead. > > All you are doing is encrypting the data, that does not help you generate a > logical key that is unique.
That's exactly what it does do actually. Hashing isn't the same as encryption or checksums. The point of the secure hashes is that they can be safely assumed to be unique for a given domain of inputs. Unless you hash obscenely large amounts of arbitrary binary data you won't get hash colisions within the lifetime of the system. Not with VARCHAR/NVARCHAR data anyway. This is an efficient and totally reliable way to compare large data sets and even whole documents. [quoted text, click to view] > So my question is - do you not think this is one of many situations where no > practical logical key exists, remember - i'm talking about unique key, which > by definition is what a key is afterall.
The only examples I've come across where I'll agree with you are staging or logging tables. If you have some external feed over which you don't have full control then you may want to stage the data using only generated keys. You could view this as an ETL problem of course and say that it's the job of your data integration tier to normalize the data before it hits the database. For many good business reasons however, we often tackle this as ELT and so we temporarily have to work without keys. Typically we throw away the staging data afterwards and work with the good data model that we've derived from it. -- David Portas, SQL Server MVP Whenever possible please post enough code to reproduce your problem. Including CREATE TABLE and INSERT statements usually helps. State what version of SQL Server you are using and specify the content of any error messages. SQL Server Books Online: http://msdn2.microsoft.com/library/ms130214(en-US,SQL.90).aspx --
-- [quoted text, click to view] "Tony Rogerson" wrote: > DECLARE @HashThis1 nvarchar; > SELECT @HashThis1 = CONVERT(nvarchar,'tony rogerson'); > SELECT HashBytes('MD5', @HashThis1); > > DECLARE @HashThis2 nvarchar; > SELECT @HashThis2 = CONVERT(nvarchar,'tony rogerson reply'); > SELECT HashBytes('MD5', @HashThis2); > > if @HashThis1 = @HashThis2 > print 'same' > > Perhaps I am missing something, the above, although different inputs give > the same hash. >
Tony, they are the same because of your variable declaration of nvarchar. Since you didn't specify a length, it will default to 1 character. Your @HashThis1 and @HashThis2 variables both start with 't', so they will have the same hash value. To distinct strings producing the same hash value (called a collision) are very, very rare, but they can happen.
There are a few examples of MD5 collisions and you can Google for them but they rely on specially contrived inputs and lots of computing power. The number of possible outputs is 128 bits which is an astronomically large number. There is a function to calculate the probability of collisions in a domain of given size but I don't know it off hand. Perhaps Steve Kass will jump in with it. For most practical purposes we can ignore those problems but SHA1 is anyway generally believed to be more reliable. [quoted text, click to view] > You can't get away from the fact that a message board, unless you restrict a > users input, does not have a natural usable key, you have to create your > own.
You can't get away from the fact that redundancy is redundant! CREATE TABLE MessageBoard (username VARCHAR(10), message VARCHAR(100), /*...etc, etc */ number_of_times_posted INTEGER NOT NULL CHECK (number_of_times_posted>0), PRIMARY KEY (username,message)) -- David Portas, SQL Server MVP Whenever possible please post enough code to reproduce your problem. Including CREATE TABLE and INSERT statements usually helps. State what version of SQL Server you are using and specify the content of any error messages. SQL Server Books Online: http://msdn2.microsoft.com/library/ms130214(en-US,SQL.90).aspx --
[quoted text, click to view] > With HashBytes you are going to get duplicates, by the very nature of > reducing data down to a hash will give you inputs on a large enough data > source. > > Do you have figures for the number of 4K rows that would cause a duplicate > to occur? >
From the Handbook of Applied Cryptography, MD5 has a collision probability of 2^64; and that figure reflects someone's chances of they are trying to produce a collision based on knowledge of how MD5 works. The odds that two
[quoted text, click to view] >> You can't get away from the fact that a message board, unless you >> restrict a users input, does not have a natural usable key, you have to >> create your own.
I think you are confusing an entity's representation in the database with its existence in reality. There is no such thing as absolute duplicates in reality. If they are *absolute duplicates* you wouldn't be able to distinguish them in reality. Whether it is a newsgroup post or a star in the sky, if there is true duplication how would you even know it? It is self-contradictory. A logical representation captures certain attributes of an entity in a database. There the question is whether a given representation captures all essential attributes such that one entity is distinguishable from another. If they do not, you'll end up with duplicates in a table that can only be distinguished by physical means. -- Anith
[quoted text, click to view] Tony Rogerson wrote: > So you limit the mesage to 100 bytes, or 800 bytes? That isn't practical, > consider the message which is now probably 4KB. > > If you use a hash, you are generating your own surrogate key. > > You cannot use a primary key of username, message out in the real world. > > -- > Tony Rogerson > SQL Server MVP > http://sqlserverfaq.com - free video tutorials > You are absolutely right that a hash is a surrogate key. I've nothing against surrogate keys. The hash is a handy workaround for the SQL Server limitation that prevents us creating unique constraints on large columns. It is just a method of enforcing entity integrity. On the other hand an IDENTITY column does NOT enforce entity integrity at all. An IDENTITY key does NOT prevent duplicates - it just enumerates them in a table. I've nowhere said that you shouldn't use IDENTITY as a key. You are right that it is surely going to be desirable to do so in this case. By all means use IDENTITY as a *surrogate* key. But never as the ONLY key of a table - that way lies disaster. I hope you agree with me after all. -- David Portas, SQL Server MVP Whenever possible please post enough code to reproduce your problem. Including CREATE TABLE and INSERT statements usually helps. State what version of SQL Server you are using and specify the content of any error messages. SQL Server Books Online: http://msdn2.microsoft.com/library/ms130214(en-US,SQL.90).aspx --
Using the message body is unpractical, even post subject is probably impractical too. Consider these forums - user_id is email address (though not unique, many users may post under the same annonymous account), thread is microsoft.public.sqlserver.programming (but is it? thats just the posting server, you can post to the same message on mulitple servers - demon, compuserve etc...) Consider a new release of a service pack that causes problem, its conceiveable though unlikely you will get two different posters at exactly the same time with the same subject. Its more practical to use a generated primary key, for nntp its a guid that includes the posting domain server name. I think my point is - in real world, once you come to implement a design you have to be practical; using NEWID() or the IDENTITY property in this case is more than justifiable and makes complete sense. You can use CHECKSUM on the body to create a number, but thats not unique. -- Tony Rogerson SQL Server MVP http://sqlserverfaq.com - free video tutorials [quoted text, click to view] "David Portas" <REMOVE_BEFORE_REPLYING_dportas@acm.org> wrote in message news:1140000512.576448.276060@g43g2000cwa.googlegroups.com... > Tony Rogerson wrote: >> Hi David, >> >> What about a message board? >> >> -- >> Tony Rogerson >> SQL Server MVP >> http://sqlserverfaq.com - free video tutorials >> > > Good question. Perhaps (thread_id, user_id, post_datetime)? From a > purist point-of-view you might want to add message_body to that key but > chances are your maximum message size will break SQL Server's 900 byte > key size limit. In practice you obviously want a key that SQL Server > can enforce so some compound key including a DATETIME seems like the > obvious candidate. > > In 2005 we can now easily generate hashes in the database which means > we can work around the product limitation whereby uniqueness > constraints cannot be enforced against all datatypes. > > -- > David Portas, SQL Server MVP > > Whenever possible please post enough code to reproduce your problem. > Including CREATE TABLE and INSERT statements usually helps. > State what version of SQL Server you are using and specify the content > of any error messages. > > SQL Server Books Online: > http://msdn2.microsoft.com/library/ms130214(en-US,SQL.90).aspx > -- >
[quoted text, click to view] > Yes of course you can receive duplicate data on any logical key you > care to choose. The question is what does the customer demand from the > database in that eventuality? Even assuming you want to display all the > duplicate discussion posts in your app there is absolutely no reason to > store redundant rows in the database. Your example problem therefore > isn't really answerable without more information about the business > requirements regarding duplicate posts. (An example of why newgroups > aren't a good place for design discussions - it's unfeasibly hard to > define the problem domain up front.)
Use this news group as a test case, all posts get in regardless; so they need to be globally keyed across all possible replicated servers. They only way to do that is to have a global unique identifier. If you where to allow duplicate logical keys (which you would have to) then the message forum would not work - you can not truely identify which post a user is replying to if duplicates exists. By creating a system generated identifier using NEWID() aka GUID then you don't have that problem, the post goes to the proper thread and parent post. [quoted text, click to view] > Use HashBytes() instead.
All you are doing is encrypting the data, that does not help you generate a logical key that is unique. So my question is - do you not think this is one of many situations where no practical logical key exists, remember - i'm talking about unique key, which by definition is what a key is afterall. Tony. -- Tony Rogerson SQL Server MVP http://sqlserverfaq.com - free video tutorials [quoted text, click to view] "David Portas" <REMOVE_BEFORE_REPLYING_dportas@acm.org> wrote in message news:1140015043.390556.37050@g14g2000cwa.googlegroups.com... > Tony Rogerson wrote: >> Using the message body is unpractical, even post subject is probably >> impractical too. >> >> Consider these forums - user_id is email address (though not unique, many >> users may post under the same annonymous account), thread is >> microsoft.public.sqlserver.programming (but is it? thats just the posting >> server, you can post to the same message on mulitple servers - demon, >> compuserve etc...) >> >> Consider a new release of a service pack that causes problem, its >> conceiveable though unlikely you will get two different posters at >> exactly >> the same time with the same subject. >> >> Its more practical to use a generated primary key, for nntp its a guid >> that >> includes the posting domain server name. >> >> I think my point is - in real world, once you come to implement a design >> you >> have to be practical; using NEWID() or the IDENTITY property in this case >> is >> more than justifiable and makes complete sense. >> >> You can use CHECKSUM on the body to create a number, but thats not >> unique. >> >> -- >> Tony Rogerson >> SQL Server MVP >> http://sqlserverfaq.com - free video tutorials >> >> > > Yes of course you can receive duplicate data on any logical key you > care to choose. The question is what does the customer demand from the > database in that eventuality? Even assuming you want to display all the > duplicate discussion posts in your app there is absolutely no reason to > store redundant rows in the database. Your example problem therefore > isn't really answerable without more information about the business > requirements regarding duplicate posts. (An example of why newgroups > aren't a good place for design discussions - it's unfeasibly hard to > define the problem domain up front.) > > Redundancy gives you nothing but it costs you storage, bandwidth and > integrity. Those are the immediate practical costs of leaving out real > keys. > > With SQL Server's IDENTITY column implementation there are other > special considerations. Some problems just don't have feasible > set-based solutions unless you have enforced a natural key in addition > to the IDENTITY key. A classic example is how to populate a table that > has a self-referencing foreign key. Another example is how to integrate > data from two data sources in tables with foreign keys. It seems to me > that's a big price to pay for leaving out a few UNIQUE constraints. > >> You can use CHECKSUM on the body to create a number, but thats not >> unique. > > Use HashBytes() instead. > > -- > David Portas, SQL Server MVP > > Whenever possible please post enough code to reproduce your problem. > Including CREATE TABLE and INSERT statements usually helps. > State what version of SQL Server you are using and specify the content > of any error messages. > > SQL Server Books Online: > http://msdn2.microsoft.com/library/ms130214(en-US,SQL.90).aspx > -- >
Gosh, I will go forth and use NOTEPAD.EXE forever more. I just hope that my clients will pay me double because the extra time it takes to do anything. No longer can I simply look at a graphical representation of a query plan I have go out to the DOS prompt and get the text version and work out what all this is about. Your time machine seems to be malfunctioning, this is 2006 and not 1980! Go and get a job as a junior programmer and get some very needed industry and current toolset experience. -- Tony Rogerson SQL Server MVP http://sqlserverfaq.com - free video tutorials [quoted text, click to view] "--CELKO--" <jcelko212@earthlink.net> wrote in message news:1139983626.481357.192860@g43g2000cwa.googlegroups.com... > BY DEFINITION an IDENTITY columns can NEVER, NEVER, NEVER, NEVER, > NEVER a primary. Did you sleep thru RDBMS 101? > > Get the books you never read, and look up what a key is. It is to be a > subset of the attributes of an entity -- this is by definition! > > What you are doing is confusing RDBMS with a 1950's magnetic tape file > systems, where you located a record (NOT a row!! Totally different > concept!) in a sequence. Just like a record number, so is the IDENTITY > column. A totally non-relational PHYSICAL locator based on when a > PHYSICAL record (which models but is not the samew as a row) is > inserted into a table. > >>> how to set the PK in SS Mgmt Studio ? << > > Who cares?? You are not not a real SQL programmer!! You are a > "mousey, mousey, click , click" non-programmer. (with a French accent) > we spit on you, Video gamer! > > to be serious, real programmers use a text editor. They know the > language they write in. Those stinking "video game tools"slow us down. > And they lead us to ask questiosn like this in newsgroups where people > liek me will maek fun of you. >
DECLARE @HashThis1 nvarchar; SELECT @HashThis1 = CONVERT(nvarchar,'tony rogerson'); SELECT HashBytes('MD5', @HashThis1); DECLARE @HashThis2 nvarchar; SELECT @HashThis2 = CONVERT(nvarchar,'tony rogerson reply'); SELECT HashBytes('MD5', @HashThis2); if @HashThis1 = @HashThis2 print 'same' Perhaps I am missing something, the above, although different inputs give the same hash. I'm not talking duplicate posts from seperate servers, I'm talking about the same server. You can't get away from the fact that a message board, unless you restrict a users input, does not have a natural usable key, you have to create your own. -- Tony Rogerson SQL Server MVP http://sqlserverfaq.com - free video tutorials [quoted text, click to view] "David Portas" <REMOVE_BEFORE_REPLYING_dportas@acm.org> wrote in message news:1140025686.549135.13730@g43g2000cwa.googlegroups.com... > Tony Rogerson wrote: >> Use this news group as a test case, all posts get in regardless; so they >> need to be globally keyed across all possible replicated servers. >> > > Logical keys don't have to prevent duplicate posts. If it's a > requirement to show duplicate posts from separate servers then > presumably you have a one-to-many relationship between the Posts table > (WITH it's logical key) and the server identifier (GUID or whatever). > No logical reason to denormalize the GUID into the Posts table. All the > logical keys do is eliminate *redundancy*, which is just fine because > redundancy gives you nothing extra. > >> > Use HashBytes() instead. >> >> All you are doing is encrypting the data, that does not help you generate >> a >> logical key that is unique. > > That's exactly what it does do actually. Hashing isn't the same as > encryption or checksums. The point of the secure hashes is that they > can be safely assumed to be unique for a given domain of inputs. Unless > you hash obscenely large amounts of arbitrary binary data you won't get > hash colisions within the lifetime of the system. Not with > VARCHAR/NVARCHAR data anyway. This is an efficient and totally reliable > way to compare large data sets and even whole documents. > >> So my question is - do you not think this is one of many situations where >> no >> practical logical key exists, remember - i'm talking about unique key, >> which >> by definition is what a key is afterall. > > The only examples I've come across where I'll agree with you are > staging or logging tables. If you have some external feed over which > you don't have full control then you may want to stage the data using > only generated keys. You could view this as an ETL problem of course > and say that it's the job of your data integration tier to normalize > the data before it hits the database. For many good business reasons > however, we often tackle this as ELT and so we temporarily have to work > without keys. Typically we throw away the staging data afterwards and > work with the good data model that we've derived from it. > > -- > David Portas, SQL Server MVP > > Whenever possible please post enough code to reproduce your problem. > Including CREATE TABLE and INSERT statements usually helps. > State what version of SQL Server you are using and specify the content > of any error messages. > > SQL Server Books Online: > http://msdn2.microsoft.com/library/ms130214(en-US,SQL.90).aspx > -- >
The example I gave (copied from BOL) is wrong - its wrong in BOL, it should use nvarchar(max) and not nvarchar; thats what happens when you trust the examples! Should be... DECLARE @HashThis1 nvarchar(max); SELECT @HashThis1 = CONVERT(nvarchar(max),'tony rogerson'); SELECT HashBytes('MD5', @HashThis1); DECLARE @HashThis2 nvarchar(max); SELECT @HashThis2 = CONVERT(nvarchar(max),'tony rogerson reply'); SELECT HashBytes('MD5', @HashThis2); if @HashThis1 = @HashThis2 print 'same' Which works. With HashBytes you are going to get duplicates, by the very nature of reducing data down to a hash will give you inputs on a large enough data source. Do you have figures for the number of 4K rows that would cause a duplicate to occur? By creating the hash in this way you are creating a surrogate key I suppose - but the fact that duplicates can still occur i just do not like. Again, it still a scenerio where there is no reliable natural key present that could give you gaurenteed uniqueness. -- Tony Rogerson SQL Server MVP http://sqlserverfaq.com - free video tutorials [quoted text, click to view] "Tony Rogerson" <tonyrogerson@sqlserverfaq.com> wrote in message news:urGW6jlMGHA.2704@TK2MSFTNGP15.phx.gbl... > DECLARE @HashThis1 nvarchar; > SELECT @HashThis1 = CONVERT(nvarchar,'tony rogerson'); > SELECT HashBytes('MD5', @HashThis1); > > DECLARE @HashThis2 nvarchar; > SELECT @HashThis2 = CONVERT(nvarchar,'tony rogerson reply'); > SELECT HashBytes('MD5', @HashThis2); > > if @HashThis1 = @HashThis2 > print 'same' > > Perhaps I am missing something, the above, although different inputs give > the same hash. > > I'm not talking duplicate posts from seperate servers, I'm talking about > the same server. > > You can't get away from the fact that a message board, unless you restrict > a users input, does not have a natural usable key, you have to create your > own. > > -- > Tony Rogerson > SQL Server MVP > http://sqlserverfaq.com - free video tutorials > > > "David Portas" <REMOVE_BEFORE_REPLYING_dportas@acm.org> wrote in message > news:1140025686.549135.13730@g43g2000cwa.googlegroups.com... >> Tony Rogerson wrote: >>> Use this news group as a test case, all posts get in regardless; so they >>> need to be globally keyed across all possible replicated servers. >>> >> >> Logical keys don't have to prevent duplicate posts. If it's a >> requirement to show duplicate posts from separate servers then >> presumably you have a one-to-many relationship between the Posts table >> (WITH it's logical key) and the server identifier (GUID or whatever). >> No logical reason to denormalize the GUID into the Posts table. All the >> logical keys do is eliminate *redundancy*, which is just fine because >> redundancy gives you nothing extra. >> >>> > Use HashBytes() instead. >>> >>> All you are doing is encrypting the data, that does not help you >>> generate a >>> logical key that is unique. >> >> That's exactly what it does do actually. Hashing isn't the same as >> encryption or checksums. The point of the secure hashes is that they >> can be safely assumed to be unique for a given domain of inputs. Unless >> you hash obscenely large amounts of arbitrary binary data you won't get >> hash colisions within the lifetime of the system. Not with >> VARCHAR/NVARCHAR data anyway. This is an efficient and totally reliable >> way to compare large data sets and even whole documents. >> >>> So my question is - do you not think this is one of many situations >>> where no >>> practical logical key exists, remember - i'm talking about unique key, >>> which >>> by definition is what a key is afterall. >> >> The only examples I've come across where I'll agree with you are >> staging or logging tables. If you have some external feed over which >> you don't have full control then you may want to stage the data using >> only generated keys. You could view this as an ETL problem of course >> and say that it's the job of your data integration tier to normalize >> the data before it hits the database. For many good business reasons >> however, we often tackle this as ELT and so we temporarily have to work >> without keys. Typically we throw away the staging data afterwards and >> work with the good data model that we've derived from it. >> >> -- >> David Portas, SQL Server MVP >> >> Whenever possible please post enough code to reproduce your problem. >> Including CREATE TABLE and INSERT statements usually helps. >> State what version of SQL Server you are using and specify the content >> of any error messages. >> >> SQL Server Books Online: >> http://msdn2.microsoft.com/library/ms130214(en-US,SQL.90).aspx >> -- >> > >
Nah, I don't think that people who use 3GLs aren't real = programmers--almost all of the work I do now is in higher-level = languages: I think that people who can't read assembly code aren't real = programmers. Sometimes you have to step through code in order to find = bugs, and that's pretty difficult if you don't have a clue about how = native code executes, especially if the problem only appears in the = release version where all of the debugging information has been stripped = out. You can never take anything for granted. Over the years I've = found several bugs in commercial compilers, and it's nice to be able to = prove to your boss that the reason it took you two days to fix a problem = is that there's a bug in the compiler. [quoted text, click to view] "Mike Hodgson" <e1minst3r@gmail.com> wrote in message = news:eKsLHZgMGHA.3800@TK2MSFTNGP10.phx.gbl...
Ease up a bit Joe. That was harsh even by your usual standards. And, = from memory, the John Cleese quote from The Holy Grail was "I fart in = your general direction!" but it has been a while since I last saw it so = I could be mistaken. BTW, hard-core assembly programmers (not me - I hated assembly at Uni) = might argue that people who don't program in machine code or assembly = but use 3GLs, like C for instance, are not real programmers either. = It's all a matter of perspective. -- mike hodgson http://sqlnerd.blogspot.com=20 --CELKO-- wrote:=20 BY DEFINITION an IDENTITY columns can NEVER, NEVER, NEVER, NEVER, NEVER a primary. Did you sleep thru RDBMS 101? Get the books you never read, and look up what a key is. It is to be a subset of the attributes of an entity -- this is by definition! What you are doing is confusing RDBMS with a 1950's magnetic tape file systems, where you located a record (NOT a row!! Totally different concept!) in a sequence. Just like a record number, so is the IDENTITY column. A totally non-relational PHYSICAL locator based on when a PHYSICAL record (which models but is not the samew as a row) is inserted into a table. how to set the PK in SS Mgmt Studio ? << =20 Who cares?? You are not not a real SQL programmer!! You are a "mousey, mousey, click , click" non-programmer. (with a French accent) we spit on you, Video gamer! to be serious, real programmers use a text editor. They know the language they write in. Those stinking "video game tools"slow us down. And they lead us to ask questiosn like this in newsgroups where people
So you limit the mesage to 100 bytes, or 800 bytes? That isn't practical, consider the message which is now probably 4KB. If you use a hash, you are generating your own surrogate key. You cannot use a primary key of username, message out in the real world. -- Tony Rogerson SQL Server MVP http://sqlserverfaq.com - free video tutorials [quoted text, click to view] "David Portas" <REMOVE_BEFORE_REPLYING_dportas@acm.org> wrote in message news:1140027777.737033.155160@g43g2000cwa.googlegroups.com... > There are a few examples of MD5 collisions and you can Google for them > but they rely on specially contrived inputs and lots of computing > power. The number of possible outputs is 128 bits which is an > astronomically large number. There is a function to calculate the > probability of collisions in a domain of given size but I don't know it > off hand. Perhaps Steve Kass will jump in with it. For most practical > purposes we can ignore those problems but SHA1 is anyway generally > believed to be more reliable. > >> You can't get away from the fact that a message board, unless you >> restrict a >> users input, does not have a natural usable key, you have to create your >> own. > > You can't get away from the fact that redundancy is redundant! > > CREATE TABLE MessageBoard (username VARCHAR(10), message VARCHAR(100), > /*...etc, etc */ number_of_times_posted INTEGER NOT NULL CHECK > (number_of_times_posted>0), PRIMARY KEY (username,message)) > > -- > David Portas, SQL Server MVP > > Whenever possible please post enough code to reproduce your problem. > Including CREATE TABLE and INSERT statements usually helps. > State what version of SQL Server you are using and specify the content > of any error messages. > > SQL Server Books Online: > http://msdn2.microsoft.com/library/ms130214(en-US,SQL.90).aspx > -- >
Anith, I am talking about the implementation of in this instance the message board as a table. It needs a primary key, David tells us to use username, message which just isn't practical because that limits you to 900 bytes which wouldn't be a very good message board. You need to generate your own primary key - simple as that and no matter how its dressed up that is the reality. In NNTP they use a GUID which includes the domain name. -- Tony Rogerson SQL Server MVP http://sqlserverfaq.com - free video tutorials [quoted text, click to view] "Anith Sen" <anith@bizdatasolutions.com> wrote in message news:eyVBjolMGHA.3432@tk2msftngp13.phx.gbl... >>> You can't get away from the fact that a message board, unless you >>> restrict a users input, does not have a natural usable key, you have to >>> create your own. > > I think you are confusing an entity's representation in the database with > its existence in reality. There is no such thing as absolute duplicates in > reality. > > If they are *absolute duplicates* you wouldn't be able to distinguish them > in reality. Whether it is a newsgroup post or a star in the sky, if there > is true duplication how would you even know it? It is self-contradictory. > > A logical representation captures certain attributes of an entity in a > database. There the question is whether a given representation captures > all essential attributes such that one entity is distinguishable from > another. If they do not, you'll end up with duplicates in a table that can > only be distinguished by physical means. > > -- > Anith >
Nope, still don't agree. I agree that IDENTITY property does not guarentee uniquess, the UNIQUE constraint or PRIMARY KEY serves that purpose. The IDENTITY property serves as a method to create a unique number, just like the HashBytes serves to create a number (note, I refrain from saying unique). If you are going to use a key based on the IDENTITY property then you will put a UNIQUE constraint, or if you are using it because there is no natural key present a PRIMARY KEY. I don't see the point of hashing, it just causes a lot of unecessary CPU cycles when you have something you can use already, the hash is not humanly useable but a simply 4 byte number is - we've all been there when developing, looking up ID's etc... And, although its a high probability there is still the chance that with SHA1 you will still get a duplicate, for instance - if two people post a blank message, which happens from time to time (i've just checked my SQL copy of these news groups). So, going back to the CREATE TABLE; do you therefore advocate no PRIMARY KEY on the message board table, just a surrogate based on the SHA1 hash? Last point, what happens if the message is changed? In that case the hash will change and you are stuffed - you run into the concurrency problem with using the natural key everywhere as the foreign key instead of using a surrogate. -- Tony Rogerson SQL Server MVP http://sqlserverfaq.com - free video tutorials [quoted text, click to view] "David Portas" <REMOVE_BEFORE_REPLYING_dportas@acm.org> wrote in message news:1140037327.751757.137050@o13g2000cwo.googlegroups.com... > Tony Rogerson wrote: >> So you limit the mesage to 100 bytes, or 800 bytes? That isn't practical, >> consider the message which is now probably 4KB. >> >> If you use a hash, you are generating your own surrogate key. >> >> You cannot use a primary key of username, message out in the real world. >> >> -- >> Tony Rogerson >> SQL Server MVP >> http://sqlserverfaq.com - free video tutorials >> > > You are absolutely right that a hash is a surrogate key. I've nothing > against surrogate keys. The hash is a handy workaround for the SQL > Server limitation that prevents us creating unique constraints on large > columns. It is just a method of enforcing entity integrity. On the > other hand an IDENTITY column does NOT enforce entity integrity at all. > An IDENTITY key does NOT prevent duplicates - it just enumerates them > in a table. > > I've nowhere said that you shouldn't use IDENTITY as a key. You are > right that it is surely going to be desirable to do so in this case. By > all means use IDENTITY as a *surrogate* key. But never as the ONLY key > of a table - that way lies disaster. I hope you agree with me after > all. > > -- > David Portas, SQL Server MVP > > Whenever possible please post enough code to reproduce your problem. > Including CREATE TABLE and INSERT statements usually helps. > State what version of SQL Server you are using and specify the content > of any error messages. > > SQL Server Books Online: > http://msdn2.microsoft.com/library/ms130214(en-US,SQL.90).aspx > -- >
[quoted text, click to view] Tony Rogerson wrote: > Nope, still don't agree. > > I agree that IDENTITY property does not guarentee uniquess, the UNIQUE > constraint or PRIMARY KEY serves that purpose. > > The IDENTITY property serves as a method to create a unique number, just > like the HashBytes serves to create a number (note, I refrain from saying > unique). > > If you are going to use a key based on the IDENTITY property then you will > put a UNIQUE constraint, or if you are using it because there is no > natural key present a PRIMARY KEY. >
I didn't make myself very clear. When I said: "An IDENTITY key doesn't prevent duplicates" what I meant was: "A column with the IDENTITY property and with a UNIQUE or PRIMARY KEY constraint does not prevent duplicates" A table is a set of facts. If your table has no key (= UNIQUE or PRIMARY KEY) except on an IDENTITY column then you have potential redundancy and almost certainly actual redundancy because you repeat the same fact(s) on multiple rows. The redundancy serves no useful purpose and has the drawbacks I already outlined: storage cost, bandwidth and logical integrity and complexity problems. You may believe that there is "information" in the IDENTITY column which makes it useful as an entity integrity constraint. I say there isn't or there shouldn't be. One of the fundamental reasons we use IDENTITY columns at all is because they are assumed to be immutable BECAUSE they are meaningless. Immutability is a desirable property for a primary key because it is referenced many times by foreign keys. Usually we don't expose IDENTITY values to end users at all because we know that if we do so they will acquire business meaning and therefore they may be forced to change. We also know that if we migrate data between systems we may want to change the surrogate keys on INSERT as part of the integration process and we can only safely do that if IDENTITY columns don't have any meaning that is exposed to the user. Finaly, if we did expose meaning in an IDENTITY column we could have big problems because we can't even UPDATE an IDENTITY column in SQL Server. So given that it would be unsafe and undesirable to expose any meaning in an IDENTITY column we always have and should enforce another key. For example we don't want to see this: CREATE TABLE T1 (x INT IDENTITY PRIMARY KEY, z VARCHAR(10) NOT NULL /* No key! */) INSERT INTO T1 (z) VALUES ('ABC') INSERT INTO T1 (z) VALUES ('ABC') -- !! duplicate INSERT INTO T1 (z) VALUES ('ABC') -- !! duplicate This would be a disaster if you have a foreign key referencing the IDENTITY column. The user can't be sure which entities he is updating because he only sees the business key that has meaning in the real world. Probably he'll erroneously add referencing rows more than once because they get split between the different parent rows. Then someone changes the z value on one row and some of the related data is left behind still attached to the same z value on another row. Remember, as far as the end user is concerned, z is the ONLY key. It gets worse. Let's add another table and more sample data: CREATE TABLE T2 (x INT NOT NULL REFERENCES T1 (x), z1 VARCHAR(10) NOT NULL, PRIMARY KEY (x,z1)) INSERT INTO T1 (z) VALUES ('DEF') /* I've had to assume the parent keys that I'm referencing below: */ INSERT INTO T2 (x,z1) VALUES (2,'P') INSERT INTO T2 (x,z1) VALUES (3,'Q') INSERT INTO T2 (x,z1) VALUES (4,'R') As an exercise, try migrating this data into an identical pair of tables T3 and T4: CREATE TABLE T3 (x INT IDENTITY PRIMARY KEY, z VARCHAR(10) NOT NULL) CREATE TABLE T4 (x INT NOT NULL REFERENCES T1 (x), z1 VARCHAR(10) NOT NULL, PRIMARY KEY (x,z1)) Your goal is obviously to preserve the relationship between the rows in T1 and T2. The first part seems to be easy: INSERT INTO T3 (z) SELECT z FROM T1; But how do you insert the rows from T2? Assume there was already data in T3 so you are now working with different IDENTITY keys. If you'd had an alternate key your task would be simply accomplished with two insert statements. But you don't have a real key at all here. Your options are limited. If the system is idle you could use an intermediate table to generate surrogate keys then INSERT them using IDENITY_INSERT. That's messy and slow and not easy if the system is in use. You could always use a cursor I suppose. OR... you could eliminate the redundancy that should not have been there in the first place. Now you may feel you can live with wasted storage and with slow and complex queries but why would you want to pay that price? [quoted text, click to view] > And, although its a high probability there is still the chance that with > SHA1 you will still get a duplicate, for instance - if two people post a > blank message, which happens from time to time (i've just checked my SQL > copy of these news groups).
That's why I added the column called number_of_times_posted. Add as many duplicate messages as you like and just incerement the count for each one. That was precisely the point I was trying to make with my CREATE TABLE statement but I didn't make it very well. [quoted text, click to view] > > So, going back to the CREATE TABLE; do you therefore advocate no PRIMARY > KEY on the message board table, just a surrogate based on the SHA1 hash?
No. Absolutely not. I put that example together very quickly and carelessly. If there were referencing foreign keys then I'd use an IDENTITY PRIMARY KEY but WITH a UNIQUE constraint on the hash. Working partial examples like this isn't so very instructive however. The proper way to approach the problem is from design principles: IDENTITY PRIMARY KEY is a good surrogate key for enforcing referential integrity against. IDENTITY PRIMARY KEY is transparently worse than useless for the purposes of entity integrity - for that you need constraints on natural key columns. Usually entity integriry constraints take the form of a UNIQUE constraint on some of those columns. In the special case where the key is too wide you can use a surrogate hash key with a unique constraint, which achieves the same effect. I believe the cases where the hash solution is appropriate and necessary are rare but I accept that they do exist and the hash seems to be an elegant solution. -- David Portas, SQL Server MVP Whenever possible please post enough code to reproduce your problem. Including CREATE TABLE and INSERT statements usually helps. State what version of SQL Server you are using and specify the content of any error messages. SQL Server Books Online: http://msdn2.microsoft.com/library/ms130214(en-US,SQL.90).aspx --
[quoted text, click to view] David Portas wrote: >I didn't make myself very clear. When I said: > > "An IDENTITY key doesn't prevent duplicates" > >what I meant was: > > "A column with the IDENTITY property and with a UNIQUE or PRIMARY KEY >constraint does not prevent duplicates" > >
Um...doesn't a UNIQUE or PRIMARY KEY (which, by definition, implies UNIQUE) constraint guarantee the absence of duplicates. That fact that that column cannot be duplicated without violating that constraint means the whole row will be unique and therefore not a duplicate. I don't understand what you're getting at here. [quoted text, click to view] >A table is a set of facts. If your table has no key (= UNIQUE or PRIMARY >KEY) except on an IDENTITY column then you have potential redundancy and >almost certainly actual redundancy because you repeat the same fact(s) on >multiple rows. The redundancy serves no useful purpose and has the drawbacks >I already outlined: storage cost, bandwidth and logical integrity and >complexity problems. > >
I don't believe the inclusion of an IDENTITY column is supposed to solve the redundant natural data problem. I'm not really sure how to unequivocally solve that one unless you create a unique index that includes every column in the table perhaps - maybe that's a problem for a different layer of the model. Perhaps there's just not enough data about the entity being captured (as Anith pointed out, in the real world there's always some attribute that distinguishes instances of an entity) and if captured it would, maybe in combination with the other facts about the entity, provide a means of uniquely identifying the row. <shrug> The way I see it is that in this scenario the IDENTITY column provides a convenient way of referencing a row, which, with the exception of the identity column, may contain the exact same data as another row in the same table (maybe because not enough facts about the entity are captured). [quoted text, click to view] >You may believe that there is "information" in the IDENTITY column which >makes it useful as an entity integrity constraint. I say there isn't or >there shouldn't be. One of the fundamental reasons we use IDENTITY columns >at all is because they are assumed to be immutable BECAUSE they are >meaningless. Immutability is a desirable property for a primary key because >it is referenced many times by foreign keys. Usually we don't expose >IDENTITY values to end users at all because we know that if we do so they >will acquire business meaning and therefore they may be forced to change. We >also know that if we migrate data between systems we may want to change the >surrogate keys on INSERT as part of the integration process and we can only >safely do that if IDENTITY columns don't have any meaning that is exposed to >the user. Finaly, if we did expose meaning in an IDENTITY column we could >have big problems because we can't even UPDATE an IDENTITY column in SQL >Server. > >
I agree that an identity column should be meaningless and should only be used as a way of uniquely referencing a row that would otherwise be very difficult or inconvenient to reference (but that's normally a good reason for them). [quoted text, click to view] >So given that it would be unsafe and undesirable to expose any meaning in an >IDENTITY column we always have and should enforce another key. For example >we don't want to see this: > >CREATE TABLE T1 (x INT IDENTITY PRIMARY KEY, z VARCHAR(10) NOT NULL /* No >key! */) > >INSERT INTO T1 (z) VALUES ('ABC') >INSERT INTO T1 (z) VALUES ('ABC') -- !! duplicate >INSERT INTO T1 (z) VALUES ('ABC') -- !! duplicate > >This would be a disaster if you have a foreign key referencing the IDENTITY >column. The user can't be sure which entities he is updating because he only >sees the business key that has meaning in the real world. Probably he'll >erroneously add referencing rows more than once because they get split >between the different parent rows. Then someone changes the z value on one >row and some of the related data is left behind still attached to the same z >value on another row. Remember, as far as the end user is concerned, z is >the ONLY key. > >
Agreed, but in this case I think there's just not enough data captured in that table. If you included an entry "timestamp" or some other fact about an instance of the 'ABC' data then the user could pick out which instance he wants. Alternately just put a UNIQUE constraint on the z column to ensure you don't get in that situation (but that's your point isn't it? I'm a little lost with your reasoning). [quoted text, click to view] >It gets worse. Let's add another table and more sample data: > >CREATE TABLE T2 (x INT NOT NULL REFERENCES T1 (x), z1 VARCHAR(10) NOT NULL, >PRIMARY KEY (x,z1)) > >INSERT INTO T1 (z) VALUES ('DEF') > >/* I've had to assume the parent keys that I'm referencing below: */ >INSERT INTO T2 (x,z1) VALUES (2,'P') >INSERT INTO T2 (x,z1) VALUES (3,'Q') >INSERT INTO T2 (x,z1) VALUES (4,'R') > >As an exercise, try migrating this data into an identical pair of tables T3 >and T4: > >CREATE TABLE T3 (x INT IDENTITY PRIMARY KEY, z VARCHAR(10) NOT NULL) >CREATE TABLE T4 (x INT NOT NULL REFERENCES T1 (x), z1 VARCHAR(10) NOT NULL, >PRIMARY KEY |