Groups | Blog | Home
all groups > sql server (alternate) > march 2004 >

sql server (alternate) : database design


maellic2002 NO[at]SPAM yahoo.fr
3/14/2004 9:57:53 PM
Hi,


I'm fairly familiar with database normalisation rules but I can't
really figure out what I should do with the specific design I'm
dealing with.

My users will need to enter data for several facilities each year. So
I have a main table:
tblFacilityReportingYear
-----------------------
FRP_ID*
FacilityID
ReportingYear

Facilities have sources that are estimated using a certain
technique(EETCode) (and both sources and codes might change from year
to year) so I have :

tblFRP_Sources
--------------
FRPSources_ID*
FRP_ID (FK)
SourceName
SourceType
EETCode

Now the difficult part: Each of these sources has a different set of
attributes depending on the source type. For example, the facility
might have a certain number of bulldozers (DozerActivity) and graders
(GraderActivity, AverageSpeedOfGrader).

I could put it all in tblFRP_Sources:
Design 1
*********

tblFRP_Sources
--------------
FRPSources_ID*
FRP_ID (FK)
SourceName
SourceType
EETCode
DozerActivity
GraderActivity
AverageSpeedOfGrader

but that doesn't seem right. There would be a lot of empty fields.

Or I could have a table for each source type:
Design 2
*******

tblFRP_Sources
--------------
FRPSource_ID*
FRP_ID (FK)
SourceName
SourceType
EETCode

tblFRP_Bulldozers
----------------
FRPSource_ID*
DozerActivity

tblFRP_Graders
--------------
FRPSource_ID*
GraderActivity
AverageSpeedOfGrader

This seems cleaner but then it's a lot of one to one relationships
between tblFRP_Sources and each source type table. Still not perfect.

Another solution would be to have:
Design 3
********

tblSourceData
------------
DataItem_ID *
Description
SourceType (FK)


tblFRP_SourceData
-----------------
FRPSourceData_ID *
FRPSource_ID (FK)
DataItem_ID (FK)
Value

Seems much cleaner but I can see several problems with this approach:
1 - Harder to code I think (I'll have an ASP.NET front end). I don't
have the option of using dynamic forms (as in a form that would just
display a list of fields required for a specific source). I need to
display certain fields in a certain location, others in another...
2 - What about data items that contain other data items. For example I
also have a vehicles source which has a number of km travelled by
vehicle type. Doesn't fit in design 3 but easy to do with design 2
(just add another table with a one to many relationship).
3 - What about data items that are not all numeric values ?

So I'm a bit stuck. The last design example seems to be the more
normalised (but I may be wrong here and please tell me if I am) but
the design 2 seems to be the most practical.

I'll be going with design 2 but if anybody who has had the patience to
read this post and think about it (thanks a lot for that) disagrees,
I'll be keen to hear reasons why I should be doing different.

Thanks

Anith Sen
3/15/2004 6:29:07 AM
You may want to google for sub-type & super-type relationships among tables
which is a 0/1:1 (zero-or-one-to-one) referential constraint which is
generally recommended for such requirements.

In some cases, one may use nullable columns in SQL for such requirements,
esp. when the data designer wants to eliminate several joins across
different tables.

--
Anith

Hugo Kornelis
3/15/2004 12:43:21 PM
[quoted text, click to view]

I know you didn't ask, but I'd advise to remove FRP_ID and make the
combination of FacilityID and ReportingYear the primary key. As a rule
of thumb, an artificial key should generally not be introduced when a
natural key is available (and there are only very few situations with
no natural key available). Notable exceptions to this rule of thumb
occur when the natural key spans many columns or is a long (var)char
column. Neither is the case here.

From your description, I'm not entirely sure which columns make out
the natural primary key for tblFRP_Sources, but I think there's a high
probability that you don't need the FRPSources_ID columne either.

(snip until real question)

[quoted text, click to view]

(snip description of problems)

I'd strongly recommend against putting metadata in your tables. It can
be very complicated, very confusing and it tends to introduce more
problems than it may solve. Further, this is not a properly normalized
solution for your client's actual information need. (It is -the
beginning of- a properly normalized solution for storing metadata, but
that's not what your client wants).

[quoted text, click to view]

See above :-)

[quoted text, click to view]

Either design 1 or design 2 can be used. That's mostly a matter of
personal preference.

Design 1 (one table with many rows) takes more space, but as the
prices of harddisks have already fallen below the price of a sliced
bread, I can't imagine that to be an issue. Since all data is in one
table, you'd need less joins to produce reports. The queries in design
2 will be longer (unless you define a view to join all tables into a
big view holding all data).

If your application is time-critical and the amount of data is huge,
you might want to try both approaches and see which performs the best.
It's hardly possible to predict which will win. The extra joins of
design 2 will introduce some overhead, but the larger rowsize of
design 1 will result in more physical I/O to get the same amount of
data. You really need to experiment if performance is an issue.
Otherwise, just choose from design 1 and design 2 the one that suits
you best.

Best, Hugo
--

maellic2002 NO[at]SPAM yahoo.fr
3/15/2004 3:54:56 PM
Thanks for both responses.

Two points:

* I looked up subtype/supertype as was suggested by Anith and saw that
the two different approaches of keeping things in one big table or
splitting it up in multiple tables were both valid (as Hugo also
confirmed). Most of the time I will need to access just one of those
subtypes, example just the bulldozers, so I think it makes a little
bit more sense to have them in different tables. (also it's also quite
clearer to understand for the developer rather than a huge table).
However when I need to access all the data for my total calculations,
I might actually follow Hugo's suggestion to use a view that joins all
the tables.

I'm not very familiar with views and although I've researched it, I
still haven't really understood the difference between them and stored
procedures. What's the difference physically on SQL server, and also
logically ?

I designed a very similar database with Access and I know that when I
tried joining all the tables in a query, Access run out of memory
because they were too many tables in my join. Is that likely to happen
as well with SQL Server ?

* Second point is the natural key versus artificial key argument. I've
seen a lot of conflicting opinions on this. I can see that using
natural keys is going to avoid joins: as an e.g if I want to get all
sources from a facility in a certain year, if I use an artificial key
I will have to join tblFRPSources to tblFacilityReportingYear, however
if I use a natural key I can just query tblFRPSources. But then, why
do a lot of people swear by using only artificial keys ?

Anyway, thanks a lot for your input. I've designed a few of theses
databases and I always seem to go through the same dilemnas.. Only IT
person at work so there's not really anybody to discuss these things
with. It's quite nice to find out how other people would deal with the
same problems and why.


[quoted text, click to view]
Erland Sommarskog
3/15/2004 11:19:35 PM
Maellic (maellic2002@yahoo.fr) writes:
[quoted text, click to view]

The last issue could be address with sql_variant.

There are situations where this kind of solution is the only viable
way to go. As you say, the coding is more difficult, but only
initially. If new items comes each week, all you need to do is to
add some more data to the table. With the other solutions you have
run ALTER TABLE each week.

But if the structure is decently static, then this is a pain, so starting
with the second alternative is probably the best way to go, at least
as the list of objects to support is a closed set.




--
Erland Sommarskog, SQL Server MVP, sommar@algonet.se

Books Online for SQL Server SP3 at
louisducnguyen NO[at]SPAM hotmail.com
3/16/2004 10:31:48 AM
[quoted text, click to view]

Artificial/surrogate keys are generally recommended when the natural
key contains too many columns. e.g. if a table has 10 columns and the
natural key consists of all 10 cols. If you make the 10 cols the PK,
you effectively doubled the space requirements for the table. As the
clustered index is now almost the same size as the data. Of course,
this is a limitation of the DB software. Future releases of MS Sql
Server might make this a moot point. As surrogate vs. natural keys
Hugo Kornelis
3/16/2004 10:59:25 AM
[quoted text, click to view]

Beware though, that to get all data from the main table joined with
all subtables, you'd need to use either OUTER JOIN or UNION. If you
try it with INNER JOIN, the result set will be empty.

[quoted text, click to view]

A stored procedure is basically a (usually small) program that is
stored on the server and executed on demand. It may do anything: show
data, change data, access other servers, drop or create database
objects, commit or rollback transactions or even shut down your
server. Some stored procedures return data, others don't.

A view always consists of one select statement. The SQL for that
statement is stored in the server as the view definition. Now, the
view can be used anywhere you'd normally use a table: you can select
all or part of the data from the view and in many cases, you can
insert, update and delete data through the view. Since the view has
it's own authorization settings, it's a useful method to allow people
access to some but not all data in a table. A common example is the
table with personnel data. The manager of the sales department is
entitled to see data of "his" employees, but nothing else. And even
for "his" employees, he is not entitled to see salary and some other
data. This can be done with a view:

CREATE VIEW Sales_Personnel AS
SELECT EmpID, FirstName, LastName, Dept, Address, HireDate
FROM Employees
WHERE Dept = 'Sales'
GO

You can now remove all rights of the Sales manager to the Employees
table but give him access rights to this view instead. This would
allow him to see only the columns mentioned of only the employees in
his own department.

Neither a view nor a stored procedure result in extra actual user data
being stored somewhere. Both are executed when needed to produce the
results from the tables used in either the stored procedure's code or
in the view definition.

[quoted text, click to view]

You can push any database over it's limit if you try hard enough, but
SQL Server really needs a whole lot of pushing before it's limits are
reached.

[quoted text, click to view]

My main argument against unneeded artifical keys is not avoiding joins
(though that can be a useful side effect - as well as improving
performance on lots of other queries as well), but user friendliness.
Why would you introduce a new naming scheme on a user if (s)he's
perfectly happy with the current naming scheme?

Suppose the French mail delivery service introduced artificial keys as
a replacement for city names. How would you like it if you could not
simply jot "Marseille" or "Paris" on an envelope but would have to
look up these names in a long list and write "18" or "1" on the
envelope instead?

[quoted text, click to view]

<soapbox>

Maybe they don't think. Or they think in old-fashioned ways. Or they
think that something that is really useful (sometimes even necessary)
in some cases and doesn't hurt *them* in other cases might as well be
used in all cases. Or -most probably- they do it because everybody
else does it.

It's a bit like starting all table names with "tbl", like you do. Why
do you do it? Are you afraid that when you get back to a working query
later, you might be in doubt whether the name after "from" would be a
table, a column or a database name?

</soapbox>

Best, Hugo
--

maellic2002 NO[at]SPAM yahoo.fr
3/16/2004 5:46:18 PM
<snip>

[quoted text, click to view]

Can't you do the same with a stored procedure though ? Remove rights
from the table but give execute rights to the stored procedure?

If I understand you, there is no difference between a view and a
stored procedure that has a select statement - apart from maybe
different security settings ? Unless I'm getting confused: when I used
access, I could define queries. And a query could access another query
as if it was a table. I thought stored procedures on sql server were
just a more powerful kind of query. I thought you could actually
access a stored procedure as if it was a table, just like in access
you could access a query as if it was a table. Is that where I'm
getting confused ?

[quoted text, click to view]

I guess I'll find out, eh?

[quoted text, click to view]

Makes sense if you let the user see your keys but in my case, in all
databases I've developed, the keys were just an internal thing, the
user didn't need to see them.


[quoted text, click to view]

Good point. But there is a difference between me and these guys
though: I don't go telling people over newsgroups they SHOULD be using
names with "tbl". I know it's just a habit probably left over from uni
days and whenever I think about a table I think tbl*something. And
because it doesn't hurt, I'm not overly worried about using it or not
using it as long as I'm consistent. But I'm certainly not saying
that's it's a better way.


Erland Sommarskog
3/16/2004 10:49:01 PM
Maellic (maellic2002@yahoo.fr) writes:
[quoted text, click to view]

Well, it *could* happen, but SQL Server is an enterprise DBMS, and you
would have to be really mean with it for something like that to happen.
SQL Server can handle up to 256 tables in a query.

[quoted text, click to view]

There are natural keys and there are natural keys.

Say that you have a database about buying and seeling financial instruments.
Three concepts in such a database is customer, instrument and position.

How do we indentify a customer? Well, in many countries they have some
national registration number that all people have one of, could we use
that? But it happens that people change numbers, and you may foreigners
that don't have any. And some people feel that their integrity is at
stake with their number everywhere. So in the end, an articifical customer
number is better. Probably one that is generated so that it controls a
check digit, as this number is likely to be shown externally.

Instruments then? Well, all financial instruments are supposed to have
an ISINcode, a 12-character code with two letters (country code) and 10
digits. Perfect natural key? Nah, it happens that instruments changes
its code for some reason. More importantly, it turns out that not all
instruments have an ISIN-code. They might be temporary instruments as
part of a corporate action like a rights issue. But it could also be
any sort of goods that we for some reason like to register. So again,
we need an artificial key. But in this case, we can keep that code
internally, because instruments can be decently identified by the users
by their names. (In fact, in our system there has to be a unique name).

Finally a position. If customer 4711 buys 30 MSFT, that gives us a
position. What is the natural key for that position? CustomerNumber and
InstrumentID of course. That is, while artificial keys in itself,
they can still be natural keys when you model some composite object.
Having a PositionID as an artificial key would be bad design.

Occasionally, when the natural key has many columns, and you still
need to refer to this table, by yet a subtable, adding a shortcut
key may be a good idea. But in such case, don't forget to add a
UNIQUE constraint for the real key.

--
Erland Sommarskog, SQL Server MVP, sommar@algonet.se

Books Online for SQL Server SP3 at
Hugo Kornelis
3/17/2004 10:34:38 AM
[quoted text, click to view]

<snip>
[quoted text, click to view]

You can do that with a stored procedure, but you'll lack the
flexibility of a view.


[quoted text, click to view]

The stored procedure canm only be executed. The view can be used in
another query to select only rows meeting specified criteria from the
result set. And a view can (in most cases) be updated as if it were a
regular table.

[quoted text, click to view]

Yes. What Access calls a query is what every other database (some
would write "every database") calls a view. A stored procedure is
roughly equivalent to what Access calls a module.

Best, Hugo
--

AddThis Social Bookmark Button