all groups > sql server programming > february 2006 >
You're in the

sql server programming

group:

SS05: Identity col always PK ?



SS05: Identity col always PK ? John A Grandy
2/14/2006 8:47:14 PM
sql server programming: In SS05 , is a col config'd as Identity datatype necessarily the PK ?

If not , then for a table with an Identity col defined , how to set the PK
in SS Mgmt Studio ?

Re: SS05: Identity col always PK ? --CELKO--
2/14/2006 10:07:06 PM
BY DEFINITION an IDENTITY columns can NEVER, NEVER, NEVER, NEVER,
NEVER a primary. Did you sleep thru RDBMS 101?

Get the books you never read, and look up what a key is. It is to be a
subset of the attributes of an entity -- this is by definition!

What you are doing is confusing RDBMS with a 1950's magnetic tape file
systems, where you located a record (NOT a row!! Totally different
concept!) in a sequence. Just like a record number, so is the IDENTITY
column. A totally non-relational PHYSICAL locator based on when a
PHYSICAL record (which models but is not the samew as a row) is
inserted into a table.

[quoted text, click to view]

Who cares?? You are not not a real SQL programmer!! You are a
"mousey, mousey, click , click" non-programmer. (with a French accent)
we spit on you, Video gamer!

to be serious, real programmers use a text editor. They know the
language they write in. Those stinking "video game tools"slow us down.
And they lead us to ask questiosn like this in newsgroups where people
liek me will maek fun of you.
RE: SS05: Identity col always PK ? Shaju
2/14/2006 10:18:26 PM
Hi,

The identity datatype column can never be primary key column. We have to
define it if required ti be a primary key column by creating a primary key
constraint.

To Set a primary key. Select the table rigt click and choose modify. It
would open the description of the table. select colum right click and select
set to primary key option. This is how to do through sql server management
studio.

We can do this programmatically also.

HTH

Thanks & Regards
Shaju

[quoted text, click to view]
Re: SS05: Identity col always PK ? John A Grandy
2/14/2006 11:00:39 PM
Joe ... please remember: 40mg valium ever 4 hours, like the doc told you. I
know it's real fun up in the rdbms theory ozone layer, but your cardiac
health is far more important.

"Mousey mousey click click" IDEs are taking over the world 'cause companies
like mine incinerate any and all competition trying to write apps in pure
code. If you don't like the new reality, go argue with Scott Gu , not me.


So how would approach the need for a PK on the following table :

USE [GUI]
GO
/****** Object: Table [dbo].[Tasks] Script Date: 02/14/2006 22:53:57
******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
SET ANSI_PADDING ON
GO
CREATE TABLE [dbo].[Tasks](
[TaskID] [int] IDENTITY(1,1) NOT NULL,
[TaskDate] [datetime] NOT NULL,
[TaskDescription] [varchar](2000) COLLATE
SQL_Latin1_General_CP1_CI_AS NOT NULL CONSTRAINT [DF_Tasks_TaskName]
DEFAULT (''),
[TaskTimeInHours] [decimal](3, 1) NOT NULL CONSTRAINT
[DF_Tasks_TimeInHours] DEFAULT (0)
) ON [PRIMARY]

GO
SET ANSI_PADDING OFF


And yes I do read Ken Henderson whenever I'm on the can, like a good lad.



[quoted text, click to view]

Re: SS05: Identity col always PK ? John A Grandy
2/14/2006 11:06:29 PM
Yes, this is exactly what I attempted: I right-clicked on the Identity col
and I see a list of its properties (including "Primary key") -- but all
properties in the list are disabled : they're all greyed-out and can't be
edited.


[quoted text, click to view]

Re: SS05: Identity col always PK ? David Portas
2/14/2006 11:41:13 PM
[quoted text, click to view]

Do not use the GUI to make changes in a production environment.
Especially don't make changes to IDENTITY columns this way. The safe
and professional way is to use scripts.

[quoted text, click to view]

How can we say since we know nothing of your data or business? If there
is no other key but IDENTITY then you have done something seriously
wrong. IDENTITY should never be the only key of a table.

--
David Portas, SQL Server MVP

Whenever possible please post enough code to reproduce your problem.
Including CREATE TABLE and INSERT statements usually helps.
State what version of SQL Server you are using and specify the content
of any error messages.

SQL Server Books Online:
http://msdn2.microsoft.com/library/ms130214(en-US,SQL.90).aspx
--
Re: SS05: Identity col always PK ? John A Grandy
2/14/2006 11:44:12 PM
Ahh, then right click on the Identity col ... that did the trick. Thanks.


[quoted text, click to view]

Re: SS05: Identity col always PK ? Mike Hodgson
2/15/2006 12:00:00 AM
[quoted text, click to view]
What about a conceptual table like a table of tasks in a project plan
for instance? What would be other candidate keys for a "task"? (I'm
not pushing any particular view, I'm just trying to think of situations
where there is no "natural" key for a table.)

--
*mike hodgson*
Re: SS05: Identity col always PK ? Mike Hodgson
2/15/2006 12:00:00 AM
[quoted text, click to view]
Bravo! Let him have it (I think Joe was harsher than normal in his
post). But seriously, 40mg every 4 hours? Are you trying to kill him?
I don't know anything about valium really but I suspect such a large,
regular dose is likely to be worse for his cardiovascular system than
just simply leaving his keyboard for a couple hours & going for a walk
in the park.

--
*mike hodgson*
Re: SS05: Identity col always PK ? Mike Hodgson
2/15/2006 12:00:00 AM
Ease up a bit Joe. That was harsh even by your usual standards. And,
from memory, the John Cleese quote from The Holy Grail was "I fart in
your general direction!" but it has been a while since I last saw it so
I could be mistaken.

BTW, hard-core assembly programmers (not me - I hated assembly at Uni)
might argue that people who don't program in machine code or assembly
but use 3GLs, like C for instance, are not real programmers either.
It's all a matter of perspective.

--
*mike hodgson*
http://sqlnerd.blogspot.com



[quoted text, click to view]
Re: SS05: Identity col always PK ? Tony Rogerson
2/15/2006 12:00:00 AM
Hi David,

What about a message board?

--
Tony Rogerson
SQL Server MVP
http://sqlserverfaq.com - free video tutorials


[quoted text, click to view]

Re: SS05: Identity col always PK ? Uri Dimant
2/15/2006 12:00:00 AM
John
Right Click on the table and then Modify





[quoted text, click to view]

Re: SS05: Identity col always PK ? Erland Sommarskog
2/15/2006 12:00:00 AM
John A Grandy (johnagrandy-at-yahoo-dot-com) writes:
[quoted text, click to view]

I echo what David said, about not using the point-and-click GUI for changing
databases in a production environment. There are *serious* shortcomings
in the Modify Table function in Mgmt Studio and Enterprise Manager, and
you are putting your database at stake if you make some changes from the
GUI and press save. You can generate a script, carefully review it and make
some required changes.


--
Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx
Books Online for SQL Server 2000 at
Re: SS05: Identity col always PK ? Brian Selzer
2/15/2006 12:00:00 AM
[quoted text, click to view]

Since an edge is an abstract concept, it is not known before hand why there
are more than one edge between verticies. Consider connections between two
network routers. One is a DS0 (56KB leased), and the other uses an analog
POTS modem. They both provide a connection, but the DS0 line costs more
than the POTS line, and the DS0 line provides more bandwidth than the analog
line. You need a place to record things like cost, bandwidth, etc. It
makes sense to have separate edges per entity in this case because there can
be many unrelated differentiating criteria--all of which are concrete.

[quoted text, click to view]

That's the point, these tables are surrogate generators, and can be used for
any or all graphs in a database, enabling any application or even database
constraints to use the same set of functions. How many times have you
written code to make sure that a node wasn't already part of a tree? How
many times have you written queries against heirarchies? I say: do it once,
bullet-proof it, optimize it, and then use it any time you need to represent
any type of graph.

[quoted text, click to view]

Re: SS05: Identity col always PK ? John A Grandy
2/15/2006 12:45:36 AM
I have no other business-process relevant columns I can use for a PK. The
DateTime table col translates in the real world to date only, no time. The
hours worked nor the description of the job performed also can not be used
to construct a PK.

Only a non-BP relevant entity can be constructed to serve as the PK. For a
PK, I can either Identity the entire table, or combine an increment-by-1 for
each new date.

What else is there?


[quoted text, click to view]

Re: SS05: Identity col always PK ? John A Grandy
2/15/2006 1:01:06 AM
Nah, 10mg an hour at worst will just make him a sleepy dope. No real damage
done.

Joe, maybe you should have a friendly word with Bill and Steve before they
toss a couple billion at yet another mousey-mousey click-click oui-oui ... ?


[quoted text, click to view]
Joe ... please remember: 40mg valium ever 4 hours, like the doc told you. I
know it's real fun up in the rdbms theory ozone layer, but your cardiac
health is far more important.

Bravo! Let him have it (I think Joe was harsher than normal in his post).
But seriously, 40mg every 4 hours? Are you trying to kill him? I don't
know anything about valium really but I suspect such a large, regular dose
is likely to be worse for his cardiovascular system than just simply leaving
his keyboard for a couple hours & going for a walk in the park.

--
mike hodgson
http://sqlnerd.blogspot.com

Re: SS05: Identity col always PK ? John A Grandy
2/15/2006 1:09:06 AM
We use sql change-scripts tested againt multiple dev and test envs before
applying the ddl to prod dbs.


[quoted text, click to view]

Re: SS05: Identity col always PK ? David Portas
2/15/2006 1:56:21 AM
[quoted text, click to view]

Depends what information you want to record for a task. I designed a
database for a PM application where the key was something like
(parent_task_no, task_type). Task_name would presumably be another
potential candidate.

--
David Portas, SQL Server MVP

Whenever possible please post enough code to reproduce your problem.
Including CREATE TABLE and INSERT statements usually helps.
State what version of SQL Server you are using and specify the content
of any error messages.

SQL Server Books Online:
http://msdn2.microsoft.com/library/ms130214(en-US,SQL.90).aspx
--
Re: SS05: Identity col always PK ? David Portas
2/15/2006 2:06:10 AM
[quoted text, click to view]

Hi Brian,

I don't really buy this example. Why wouldn't you model multiple edges
between vertices as:

.... number_of_edges INT NOT NULL (CHECK number_of_edges >0),
UNIQUE (InitialVertex, TerminalVertex))

?

A Vertex presumably represents something other than just a surrogate
key otherwise what is this a graph of?

--
David Portas, SQL Server MVP

Whenever possible please post enough code to reproduce your problem.
Including CREATE TABLE and INSERT statements usually helps.
State what version of SQL Server you are using and specify the content
of any error messages.

SQL Server Books Online:
http://msdn2.microsoft.com/library/ms130214(en-US,SQL.90).aspx
--
Re: SS05: Identity col always PK ? David Portas
2/15/2006 2:09:05 AM
[quoted text, click to view]

Wrong question. Why would you want to store duplicate (=redundant) data
in this table? A more fundamental question than "what is the key?" is
"what information are you trying to record about this entity?".

--
David Portas, SQL Server MVP

Whenever possible please post enough code to reproduce your problem.
Including CREATE TABLE and INSERT statements usually helps.
State what version of SQL Server you are using and specify the content
of any error messages.

SQL Server Books Online:
http://msdn2.microsoft.com/library/ms130214(en-US,SQL.90).aspx
--
Re: SS05: Identity col always PK ? David Portas
2/15/2006 2:48:32 AM
[quoted text, click to view]

Good question. Perhaps (thread_id, user_id, post_datetime)? From a
purist point-of-view you might want to add message_body to that key but
chances are your maximum message size will break SQL Server's 900 byte
key size limit. In practice you obviously want a key that SQL Server
can enforce so some compound key including a DATETIME seems like the
obvious candidate.

In 2005 we can now easily generate hashes in the database which means
we can work around the product limitation whereby uniqueness
constraints cannot be enforced against all datatypes.

--
David Portas, SQL Server MVP

Whenever possible please post enough code to reproduce your problem.
Including CREATE TABLE and INSERT statements usually helps.
State what version of SQL Server you are using and specify the content
of any error messages.

SQL Server Books Online:
http://msdn2.microsoft.com/library/ms130214(en-US,SQL.90).aspx
--
Re: SS05: Identity col always PK ? Brian Selzer
2/15/2006 4:36:02 AM
There is only one exception to the rule that I follow when using =
IDENTITY or surrogates in general. Unless the table is completely =
abstract in the model, then there should be at least one candidate key =
in addition to the surrogate. If any column in a table can hold =
information that originates outside of the database, then that table =
isn't completely abstract. Any other use of IDENTITY embeds part of the =
application in the database. =20

Here's an example of completely abstract tables:

CREATE TABLE Vertexes=20
(
Vertex INT IDENTITY(-2147483647, 1)=20
PRIMARY KEY CLUSTERED
)
CREATE TABLE Edges
(
Edge INT IDENTITY(-2147483647, 1)=20
PRIMARY KEY CLUSTERED,
InitialVertex INT NOT NULL
REFERENCES Vertexes (Vertex),
TerminalVertex INT NOT NULL
REFERENCES Vertexes (Vertex)
)

These tables can be used as surrogate generators for concrete tables =
that represent information that can be modeled as any type of =
graph--that is, a tree, heirarchy, network, etc. A unique constraint on =
Edges(InitialVertex, TerminalVertex) is contraindicated because a =
multigraph allows for multiple edges between verticies. Now you can =
write a single set of table-valued and scalar functions to find the =
degree of a vertex; the shortest path between two verticies; left, right =
or undirected components; etc. These can then be applied to simplify =
queries against concrete tables, such as finding the minimum cost =
process for producing a part, or finding the minimum time process for =
producing the same part. In addition, you may also be able to pin the =
tables in memory to speed up those types of queries.




[quoted text, click to view]
David Portas wrote:=20
How can we say since we know nothing of your data or business? If there
is no other key but IDENTITY then you have done something seriously
wrong. IDENTITY should never be the only key of a table.
What about a conceptual table like a table of tasks in a project plan =
for instance? What would be other candidate keys for a "task"? (I'm =
not pushing any particular view, I'm just trying to think of situations =
where there is no "natural" key for a table.)

--
mike hodgson
Re: SS05: Identity col always PK ? David Portas
2/15/2006 6:50:43 AM
[quoted text, click to view]

Yes of course you can receive duplicate data on any logical key you
care to choose. The question is what does the customer demand from the
database in that eventuality? Even assuming you want to display all the
duplicate discussion posts in your app there is absolutely no reason to
store redundant rows in the database. Your example problem therefore
isn't really answerable without more information about the business
requirements regarding duplicate posts. (An example of why newgroups
aren't a good place for design discussions - it's unfeasibly hard to
define the problem domain up front.)

Redundancy gives you nothing but it costs you storage, bandwidth and
integrity. Those are the immediate practical costs of leaving out real
keys.

With SQL Server's IDENTITY column implementation there are other
special considerations. Some problems just don't have feasible
set-based solutions unless you have enforced a natural key in addition
to the IDENTITY key. A classic example is how to populate a table that
has a self-referencing foreign key. Another example is how to integrate
data from two data sources in tables with foreign keys. It seems to me
that's a big price to pay for leaving out a few UNIQUE constraints.

[quoted text, click to view]

Use HashBytes() instead.

--
David Portas, SQL Server MVP

Whenever possible please post enough code to reproduce your problem.
Including CREATE TABLE and INSERT statements usually helps.
State what version of SQL Server you are using and specify the content
of any error messages.

SQL Server Books Online:
http://msdn2.microsoft.com/library/ms130214(en-US,SQL.90).aspx
--
Re: SS05: Identity col always PK ? John A Grandy
2/15/2006 7:37:25 AM
Each day, each worker has a number of tasks they are required to enter
description and hours to complete. It's a timesheet except that it's
irrelevant what time they started//finished any given task.

Think of billing a client .... the client only cares that you're charging
them 6 hours to do X and what day was the work performed on. They don't care
what time you started, finished, how many breaks you took, how long, etc.


[quoted text, click to view]

Re: SS05: Identity col always PK ? David Portas
2/15/2006 7:59:51 AM
[quoted text, click to view]

So it sounds like (taskdate, taskdescription) should be the key. Or
should client be in there as well? You didn't post the rest of the
schema but if you are recording the client for each task then surely
you must have another table also keyed on taskid because the client
doesn't appear at all in this table? So that may explain why you missed
the key - the schema is somehow denormalized.

The timesheet I use has a pre-defined set of task descriptions to
choose from. I would be inclined to add a key for description and put
that in its own table so that you can report on common tasks across all
clients. I'll bet the client doesn't read a 2KB description for every
task either. Surely a more compact descirption or code for each task
would make sense.

--
David Portas, SQL Server MVP

Whenever possible please post enough code to reproduce your problem.
Including CREATE TABLE and INSERT statements usually helps.
State what version of SQL Server you are using and specify the content
of any error messages.

SQL Server Books Online:
http://msdn2.microsoft.com/library/ms130214(en-US,SQL.90).aspx
--
Re: SS05: Identity col always PK ? JT
2/15/2006 9:15:36 AM
Tasks are typically assigned or entered by a specific person at a =
specific time. So EmployeeID + EntryDate would be a natural key.
[quoted text, click to view]
David Portas wrote:=20
How can we say since we know nothing of your data or business? If there
is no other key but IDENTITY then you have done something seriously
wrong. IDENTITY should never be the only key of a table.
What about a conceptual table like a table of tasks in a project plan =
for instance? What would be other candidate keys for a "task"? (I'm =
not pushing any particular view, I'm just trying to think of situations =
where there is no "natural" key for a table.)

--
mike hodgson
Re: SS05: Identity col always PK ? David Portas
2/15/2006 9:48:06 AM
[quoted text, click to view]

Logical keys don't have to prevent duplicate posts. If it's a
requirement to show duplicate posts from separate servers then
presumably you have a one-to-many relationship between the Posts table
(WITH it's logical key) and the server identifier (GUID or whatever).
No logical reason to denormalize the GUID into the Posts table. All the
logical keys do is eliminate *redundancy*, which is just fine because
redundancy gives you nothing extra.

[quoted text, click to view]

That's exactly what it does do actually. Hashing isn't the same as
encryption or checksums. The point of the secure hashes is that they
can be safely assumed to be unique for a given domain of inputs. Unless
you hash obscenely large amounts of arbitrary binary data you won't get
hash colisions within the lifetime of the system. Not with
VARCHAR/NVARCHAR data anyway. This is an efficient and totally reliable
way to compare large data sets and even whole documents.

[quoted text, click to view]

The only examples I've come across where I'll agree with you are
staging or logging tables. If you have some external feed over which
you don't have full control then you may want to stage the data using
only generated keys. You could view this as an ETL problem of course
and say that it's the job of your data integration tier to normalize
the data before it hits the database. For many good business reasons
however, we often tackle this as ELT and so we temporarily have to work
without keys. Typically we throw away the staging data afterwards and
work with the good data model that we've derived from it.

--
David Portas, SQL Server MVP

Whenever possible please post enough code to reproduce your problem.
Including CREATE TABLE and INSERT statements usually helps.
State what version of SQL Server you are using and specify the content
of any error messages.

SQL Server Books Online:
http://msdn2.microsoft.com/library/ms130214(en-US,SQL.90).aspx
--
Re: SS05: Identity col always PK ? Mark Williams
2/15/2006 10:19:31 AM

--


[quoted text, click to view]

Tony, they are the same because of your variable declaration of nvarchar.
Since you didn't specify a length, it will default to 1 character. Your
@HashThis1 and @HashThis2 variables both start with 't', so they will have
the same hash value.

To distinct strings producing the same hash value (called a collision) are
very, very rare, but they can happen.

Re: SS05: Identity col always PK ? David Portas
2/15/2006 10:22:57 AM
There are a few examples of MD5 collisions and you can Google for them
but they rely on specially contrived inputs and lots of computing
power. The number of possible outputs is 128 bits which is an
astronomically large number. There is a function to calculate the
probability of collisions in a domain of given size but I don't know it
off hand. Perhaps Steve Kass will jump in with it. For most practical
purposes we can ignore those problems but SHA1 is anyway generally
believed to be more reliable.

[quoted text, click to view]

You can't get away from the fact that redundancy is redundant!

CREATE TABLE MessageBoard (username VARCHAR(10), message VARCHAR(100),
/*...etc, etc */ number_of_times_posted INTEGER NOT NULL CHECK
(number_of_times_posted>0), PRIMARY KEY (username,message))

--
David Portas, SQL Server MVP

Whenever possible please post enough code to reproduce your problem.
Including CREATE TABLE and INSERT statements usually helps.
State what version of SQL Server you are using and specify the content
of any error messages.

SQL Server Books Online:
http://msdn2.microsoft.com/library/ms130214(en-US,SQL.90).aspx
--
Re: SS05: Identity col always PK ? Mark Williams
2/15/2006 10:31:29 AM
[quoted text, click to view]

From the Handbook of Applied Cryptography, MD5 has a collision probability
of 2^64; and that figure reflects someone's chances of they are trying to
produce a collision based on knowledge of how MD5 works. The odds that two
Re: SS05: Identity col always PK ? Anith Sen
2/15/2006 12:03:01 PM
[quoted text, click to view]

I think you are confusing an entity's representation in the database with
its existence in reality. There is no such thing as absolute duplicates in
reality.

If they are *absolute duplicates* you wouldn't be able to distinguish them
in reality. Whether it is a newsgroup post or a star in the sky, if there is
true duplication how would you even know it? It is self-contradictory.

A logical representation captures certain attributes of an entity in a
database. There the question is whether a given representation captures all
essential attributes such that one entity is distinguishable from another.
If they do not, you'll end up with duplicates in a table that can only be
distinguished by physical means.

--
Anith

Re: SS05: Identity col always PK ? David Portas
2/15/2006 1:02:07 PM
[quoted text, click to view]

You are absolutely right that a hash is a surrogate key. I've nothing
against surrogate keys. The hash is a handy workaround for the SQL
Server limitation that prevents us creating unique constraints on large
columns. It is just a method of enforcing entity integrity. On the
other hand an IDENTITY column does NOT enforce entity integrity at all.
An IDENTITY key does NOT prevent duplicates - it just enumerates them
in a table.

I've nowhere said that you shouldn't use IDENTITY as a key. You are
right that it is surely going to be desirable to do so in this case. By
all means use IDENTITY as a *surrogate* key. But never as the ONLY key
of a table - that way lies disaster. I hope you agree with me after
all.

--
David Portas, SQL Server MVP

Whenever possible please post enough code to reproduce your problem.
Including CREATE TABLE and INSERT statements usually helps.
State what version of SQL Server you are using and specify the content
of any error messages.

SQL Server Books Online:
http://msdn2.microsoft.com/library/ms130214(en-US,SQL.90).aspx
--
Re: SS05: Identity col always PK ? Tony Rogerson
2/15/2006 2:05:15 PM
Using the message body is unpractical, even post subject is probably
impractical too.

Consider these forums - user_id is email address (though not unique, many
users may post under the same annonymous account), thread is
microsoft.public.sqlserver.programming (but is it? thats just the posting
server, you can post to the same message on mulitple servers - demon,
compuserve etc...)

Consider a new release of a service pack that causes problem, its
conceiveable though unlikely you will get two different posters at exactly
the same time with the same subject.

Its more practical to use a generated primary key, for nntp its a guid that
includes the posting domain server name.

I think my point is - in real world, once you come to implement a design you
have to be practical; using NEWID() or the IDENTITY property in this case is
more than justifiable and makes complete sense.

You can use CHECKSUM on the body to create a number, but thats not unique.

--
Tony Rogerson
SQL Server MVP
http://sqlserverfaq.com - free video tutorials


[quoted text, click to view]

Re: SS05: Identity col always PK ? Tony Rogerson
2/15/2006 4:52:49 PM
[quoted text, click to view]

Use this news group as a test case, all posts get in regardless; so they
need to be globally keyed across all possible replicated servers.

They only way to do that is to have a global unique identifier.

If you where to allow duplicate logical keys (which you would have to) then
the message forum would not work - you can not truely identify which post a
user is replying to if duplicates exists.

By creating a system generated identifier using NEWID() aka GUID then you
don't have that problem, the post goes to the proper thread and parent post.

[quoted text, click to view]

All you are doing is encrypting the data, that does not help you generate a
logical key that is unique.

So my question is - do you not think this is one of many situations where no
practical logical key exists, remember - i'm talking about unique key, which
by definition is what a key is afterall.

Tony.

--
Tony Rogerson
SQL Server MVP
http://sqlserverfaq.com - free video tutorials


[quoted text, click to view]

Re: SS05: Identity col always PK ? Tony Rogerson
2/15/2006 5:14:38 PM
Gosh, I will go forth and use NOTEPAD.EXE forever more.

I just hope that my clients will pay me double because the extra time it
takes to do anything.

No longer can I simply look at a graphical representation of a query plan I
have go out to the DOS prompt and get the text version and work out what all
this is about.

Your time machine seems to be malfunctioning, this is 2006 and not 1980! Go
and get a job as a junior programmer and get some very needed industry and
current toolset experience.

--
Tony Rogerson
SQL Server MVP
http://sqlserverfaq.com - free video tutorials


[quoted text, click to view]

Re: SS05: Identity col always PK ? Tony Rogerson
2/15/2006 5:55:35 PM
DECLARE @HashThis1 nvarchar;
SELECT @HashThis1 = CONVERT(nvarchar,'tony rogerson');
SELECT HashBytes('MD5', @HashThis1);

DECLARE @HashThis2 nvarchar;
SELECT @HashThis2 = CONVERT(nvarchar,'tony rogerson reply');
SELECT HashBytes('MD5', @HashThis2);

if @HashThis1 = @HashThis2
print 'same'

Perhaps I am missing something, the above, although different inputs give
the same hash.

I'm not talking duplicate posts from seperate servers, I'm talking about the
same server.

You can't get away from the fact that a message board, unless you restrict a
users input, does not have a natural usable key, you have to create your
own.

--
Tony Rogerson
SQL Server MVP
http://sqlserverfaq.com - free video tutorials


[quoted text, click to view]

Re: SS05: Identity col always PK ? Tony Rogerson
2/15/2006 6:09:21 PM
The example I gave (copied from BOL) is wrong - its wrong in BOL, it should
use nvarchar(max) and not nvarchar; thats what happens when you trust the
examples!

Should be...

DECLARE @HashThis1 nvarchar(max);
SELECT @HashThis1 = CONVERT(nvarchar(max),'tony rogerson');
SELECT HashBytes('MD5', @HashThis1);

DECLARE @HashThis2 nvarchar(max);
SELECT @HashThis2 = CONVERT(nvarchar(max),'tony rogerson reply');
SELECT HashBytes('MD5', @HashThis2);

if @HashThis1 = @HashThis2
print 'same'

Which works.

With HashBytes you are going to get duplicates, by the very nature of
reducing data down to a hash will give you inputs on a large enough data
source.

Do you have figures for the number of 4K rows that would cause a duplicate
to occur?

By creating the hash in this way you are creating a surrogate key I
suppose - but the fact that duplicates can still occur i just do not like.

Again, it still a scenerio where there is no reliable natural key present
that could give you gaurenteed uniqueness.

--
Tony Rogerson
SQL Server MVP
http://sqlserverfaq.com - free video tutorials


[quoted text, click to view]

Re: SS05: Identity col always PK ? Brian Selzer
2/15/2006 7:36:34 PM
Nah, I don't think that people who use 3GLs aren't real =
programmers--almost all of the work I do now is in higher-level =
languages: I think that people who can't read assembly code aren't real =
programmers. Sometimes you have to step through code in order to find =
bugs, and that's pretty difficult if you don't have a clue about how =
native code executes, especially if the problem only appears in the =
release version where all of the debugging information has been stripped =
out. You can never take anything for granted. Over the years I've =
found several bugs in commercial compilers, and it's nice to be able to =
prove to your boss that the reason it took you two days to fix a problem =
is that there's a bug in the compiler.
[quoted text, click to view]
Ease up a bit Joe. That was harsh even by your usual standards. And, =
from memory, the John Cleese quote from The Holy Grail was "I fart in =
your general direction!" but it has been a while since I last saw it so =
I could be mistaken.

BTW, hard-core assembly programmers (not me - I hated assembly at Uni) =
might argue that people who don't program in machine code or assembly =
but use 3GLs, like C for instance, are not real programmers either. =
It's all a matter of perspective.

--
mike hodgson
http://sqlnerd.blogspot.com=20



--CELKO-- wrote:=20
BY DEFINITION an IDENTITY columns can NEVER, NEVER, NEVER, NEVER,
NEVER a primary. Did you sleep thru RDBMS 101?

Get the books you never read, and look up what a key is. It is to be a
subset of the attributes of an entity -- this is by definition!

What you are doing is confusing RDBMS with a 1950's magnetic tape file
systems, where you located a record (NOT a row!! Totally different
concept!) in a sequence. Just like a record number, so is the IDENTITY
column. A totally non-relational PHYSICAL locator based on when a
PHYSICAL record (which models but is not the samew as a row) is
inserted into a table.

how to set the PK in SS Mgmt Studio ? <<
=20
Who cares?? You are not not a real SQL programmer!! You are a
"mousey, mousey, click , click" non-programmer. (with a French accent)
we spit on you, Video gamer!

to be serious, real programmers use a text editor. They know the
language they write in. Those stinking "video game tools"slow us down.
And they lead us to ask questiosn like this in newsgroups where people
Re: SS05: Identity col always PK ? Tony Rogerson
2/15/2006 8:35:10 PM
So you limit the mesage to 100 bytes, or 800 bytes? That isn't practical,
consider the message which is now probably 4KB.

If you use a hash, you are generating your own surrogate key.

You cannot use a primary key of username, message out in the real world.

--
Tony Rogerson
SQL Server MVP
http://sqlserverfaq.com - free video tutorials


[quoted text, click to view]

Re: SS05: Identity col always PK ? Tony Rogerson
2/15/2006 8:39:16 PM
Anith, I am talking about the implementation of in this instance the message
board as a table.

It needs a primary key, David tells us to use username, message which just
isn't practical because that limits you to 900 bytes which wouldn't be a
very good message board.

You need to generate your own primary key - simple as that and no matter how
its dressed up that is the reality.

In NNTP they use a GUID which includes the domain name.

--
Tony Rogerson
SQL Server MVP
http://sqlserverfaq.com - free video tutorials


[quoted text, click to view]

Re: SS05: Identity col always PK ? Tony Rogerson
2/15/2006 9:16:47 PM
Nope, still don't agree.

I agree that IDENTITY property does not guarentee uniquess, the UNIQUE
constraint or PRIMARY KEY serves that purpose.

The IDENTITY property serves as a method to create a unique number, just
like the HashBytes serves to create a number (note, I refrain from saying
unique).

If you are going to use a key based on the IDENTITY property then you will
put a UNIQUE constraint, or if you are using it because there is no natural
key present a PRIMARY KEY.

I don't see the point of hashing, it just causes a lot of unecessary CPU
cycles when you have something you can use already, the hash is not humanly
useable but a simply 4 byte number is - we've all been there when
developing, looking up ID's etc...

And, although its a high probability there is still the chance that with
SHA1 you will still get a duplicate, for instance - if two people post a
blank message, which happens from time to time (i've just checked my SQL
copy of these news groups).

So, going back to the CREATE TABLE; do you therefore advocate no PRIMARY KEY
on the message board table, just a surrogate based on the SHA1 hash?

Last point, what happens if the message is changed? In that case the hash
will change and you are stuffed - you run into the concurrency problem with
using the natural key everywhere as the foreign key instead of using a
surrogate.

--
Tony Rogerson
SQL Server MVP
http://sqlserverfaq.com - free video tutorials


[quoted text, click to view]

Re: SS05: Identity col always PK ? David Portas
2/15/2006 10:43:47 PM
[quoted text, click to view]

I didn't make myself very clear. When I said:

"An IDENTITY key doesn't prevent duplicates"

what I meant was:

"A column with the IDENTITY property and with a UNIQUE or PRIMARY KEY
constraint does not prevent duplicates"

A table is a set of facts. If your table has no key (= UNIQUE or PRIMARY
KEY) except on an IDENTITY column then you have potential redundancy and
almost certainly actual redundancy because you repeat the same fact(s) on
multiple rows. The redundancy serves no useful purpose and has the drawbacks
I already outlined: storage cost, bandwidth and logical integrity and
complexity problems.

You may believe that there is "information" in the IDENTITY column which
makes it useful as an entity integrity constraint. I say there isn't or
there shouldn't be. One of the fundamental reasons we use IDENTITY columns
at all is because they are assumed to be immutable BECAUSE they are
meaningless. Immutability is a desirable property for a primary key because
it is referenced many times by foreign keys. Usually we don't expose
IDENTITY values to end users at all because we know that if we do so they
will acquire business meaning and therefore they may be forced to change. We
also know that if we migrate data between systems we may want to change the
surrogate keys on INSERT as part of the integration process and we can only
safely do that if IDENTITY columns don't have any meaning that is exposed to
the user. Finaly, if we did expose meaning in an IDENTITY column we could
have big problems because we can't even UPDATE an IDENTITY column in SQL
Server.

So given that it would be unsafe and undesirable to expose any meaning in an
IDENTITY column we always have and should enforce another key. For example
we don't want to see this:

CREATE TABLE T1 (x INT IDENTITY PRIMARY KEY, z VARCHAR(10) NOT NULL /* No
key! */)

INSERT INTO T1 (z) VALUES ('ABC')
INSERT INTO T1 (z) VALUES ('ABC') -- !! duplicate
INSERT INTO T1 (z) VALUES ('ABC') -- !! duplicate

This would be a disaster if you have a foreign key referencing the IDENTITY
column. The user can't be sure which entities he is updating because he only
sees the business key that has meaning in the real world. Probably he'll
erroneously add referencing rows more than once because they get split
between the different parent rows. Then someone changes the z value on one
row and some of the related data is left behind still attached to the same z
value on another row. Remember, as far as the end user is concerned, z is
the ONLY key.

It gets worse. Let's add another table and more sample data:

CREATE TABLE T2 (x INT NOT NULL REFERENCES T1 (x), z1 VARCHAR(10) NOT NULL,
PRIMARY KEY (x,z1))

INSERT INTO T1 (z) VALUES ('DEF')

/* I've had to assume the parent keys that I'm referencing below: */
INSERT INTO T2 (x,z1) VALUES (2,'P')
INSERT INTO T2 (x,z1) VALUES (3,'Q')
INSERT INTO T2 (x,z1) VALUES (4,'R')

As an exercise, try migrating this data into an identical pair of tables T3
and T4:

CREATE TABLE T3 (x INT IDENTITY PRIMARY KEY, z VARCHAR(10) NOT NULL)
CREATE TABLE T4 (x INT NOT NULL REFERENCES T1 (x), z1 VARCHAR(10) NOT NULL,
PRIMARY KEY (x,z1))

Your goal is obviously to preserve the relationship between the rows in T1
and T2. The first part seems to be easy:

INSERT INTO T3 (z)
SELECT z
FROM T1;

But how do you insert the rows from T2? Assume there was already data in T3
so you are now working with different IDENTITY keys. If you'd had an
alternate key your task would be simply accomplished with two insert
statements. But you don't have a real key at all here. Your options are
limited. If the system is idle you could use an intermediate table to
generate surrogate keys then INSERT them using IDENITY_INSERT. That's messy
and slow and not easy if the system is in use. You could always use a cursor
I suppose. OR... you could eliminate the redundancy that should not have
been there in the first place.

Now you may feel you can live with wasted storage and with slow and complex
queries but why would you want to pay that price?


[quoted text, click to view]

That's why I added the column called number_of_times_posted. Add as many
duplicate messages as you like and just incerement the count for each one.
That was precisely the point I was trying to make with my CREATE TABLE
statement but I didn't make it very well.


[quoted text, click to view]

No. Absolutely not. I put that example together very quickly and carelessly.
If there were referencing foreign keys then I'd use an IDENTITY PRIMARY KEY
but WITH a UNIQUE constraint on the hash.

Working partial examples like this isn't so very instructive however. The
proper way to approach the problem is from design principles:

IDENTITY PRIMARY KEY is a good surrogate key for enforcing referential
integrity against.

IDENTITY PRIMARY KEY is transparently worse than useless for the purposes of
entity integrity - for that you need constraints on natural key columns.

Usually entity integriry constraints take the form of a UNIQUE constraint on
some of those columns. In the special case where the key is too wide you can
use a surrogate hash key with a unique constraint, which achieves the same
effect. I believe the cases where the hash solution is appropriate and
necessary are rare but I accept that they do exist and the hash seems to be
an elegant solution.

--
David Portas, SQL Server MVP

Whenever possible please post enough code to reproduce your problem.
Including CREATE TABLE and INSERT statements usually helps.
State what version of SQL Server you are using and specify the content
of any error messages.

SQL Server Books Online:
http://msdn2.microsoft.com/library/ms130214(en-US,SQL.90).aspx
--




Re: SS05: Identity col always PK ? Mike Hodgson
2/16/2006 12:00:00 AM
[quoted text, click to view]
Um...doesn't a UNIQUE or PRIMARY KEY (which, by definition, implies
UNIQUE) constraint guarantee the absence of duplicates. That fact that
that column cannot be duplicated without violating that constraint means
the whole row will be unique and therefore not a duplicate. I don't
understand what you're getting at here.

[quoted text, click to view]
I don't believe the inclusion of an IDENTITY column is supposed to solve
the redundant natural data problem. I'm not really sure how to
unequivocally solve that one unless you create a unique index that
includes every column in the table perhaps - maybe that's a problem for
a different layer of the model. Perhaps there's just not enough data
about the entity being captured (as Anith pointed out, in the real world
there's always some attribute that distinguishes instances of an entity)
and if captured it would, maybe in combination with the other facts
about the entity, provide a means of uniquely identifying the row.
<shrug> The way I see it is that in this scenario the IDENTITY column
provides a convenient way of referencing a row, which, with the
exception of the identity column, may contain the exact same data as
another row in the same table (maybe because not enough facts about the
entity are captured).

[quoted text, click to view]
I agree that an identity column should be meaningless and should only be
used as a way of uniquely referencing a row that would otherwise be very
difficult or inconvenient to reference (but that's normally a good
reason for them).

[quoted text, click to view]
Agreed, but in this case I think there's just not enough data captured
in that table. If you included an entry "timestamp" or some other fact
about an instance of the 'ABC' data then the user could pick out which
instance he wants. Alternately just put a UNIQUE constraint on the z
column to ensure you don't get in that situation (but that's your point
isn't it? I'm a little lost with your reasoning).

[quoted text, click to view]