>>We still need a URL to answer the perennial "what's the difference between columns and fields?" question. <<
That is one of my standard rants, which I have posted for years:
In 25 words or less, a column is active and a field is passive.
Columns have datatypes, defaults and constraints; they can be
materialized or virtual. Fields have meaning given to them by the
host program using them and must be materialized.
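A quick way to see the "active" part is to let the schema do some work. This is only a sketch using Python's built-in sqlite3 module; the Personnel table and its columns are made up for illustration, and other SQL products will differ in details.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table: the datatype, DEFAULT and CHECK live in the schema,
# not in any one application program.
conn.execute(
    """
    CREATE TABLE Personnel
    (emp_name VARCHAR(30) NOT NULL,
     salary DECIMAL(8,2) DEFAULT 0.00 NOT NULL
       CHECK (salary >= 0.00))
    """
)

# The column supplies its own default when the program says nothing.
conn.execute("INSERT INTO Personnel (emp_name) VALUES ('Alice')")
default_salary = conn.execute("SELECT salary FROM Personnel").fetchone()[0]

# The column rejects bad data no matter which program sends it.
try:
    conn.execute("INSERT INTO Personnel VALUES ('Bob', -100.00)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM Personnel").fetchone()[0]
```

A field in a file would have accepted the negative salary without complaint; it is up to every program that reads the file to notice.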
Here is the long version:
Like most new ideas, the hard part of understanding what the
relational model is comes in un-learning what you know about file
systems. As Artemus Ward (Charles Farrar Browne, 1834-1867) put it,
"It ain't so much the things we don't know that get us into trouble.
It's the things we know that just ain't so."
If you already have a background in data processing with traditional
file systems, the first things to un-learn are:
(0) Databases are not file sets.
(1) Tables are not files.
(2) Rows are not records.
(3) Columns are not fields.
Modern data processing began with punch cards. The influence of the
punch card lingered on long after the invention of magnetic tapes and
disk for data storage. This is why early video display terminals
were 80 columns across. Even today, files which were migrated from
cards to magnetic tape files or disk storage still use 80 column
records.
But the influence was not just on the physical side of data
processing. The methods for handling data from the prior media were
imitated in the new media.
Data processing first consisted of sorting and merging decks of punch
cards (later, sequential magnetic tape files) in a series of distinct
steps. The result of each step fed into the next step in the
process. This leads to temp tables and other tricks to mimic that kind
of processing.
Relational databases do not work that way. Each user connects to the
entire database all at once, not to one file at a time in a sequence of
steps. The users might not all have the same database access rights
once they are connected, however. Magnetic tapes could not be shared
among users at the same time, but shared data is the point of a
database.
Tables versus Files
A file is closely related to its physical storage media. A table may
or may not be a physical file. DB2 from IBM uses one file per table,
while Sybase puts several entire databases inside one file. A table
is a <i>set</i> of rows of the same kind of thing. A set has no
ordering and it makes no sense to ask for the first or last row.
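Here is a small sketch of the difference, again with Python's sqlite3 module. The Cards table and the two "decks" are hypothetical. The same cards in a different sequence are a different file, but they are the same table.

```python
import sqlite3

# Two "decks" holding the same cards in a different physical sequence.
deck_1 = [1, 2, 3]
deck_2 = [3, 1, 2]

# As sequential files they are different.
files_equal = deck_1 == deck_2


def load(deck):
    """Load a deck into a fresh, hypothetical Cards table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Cards (card_nbr INTEGER NOT NULL PRIMARY KEY)")
    conn.executemany("INSERT INTO Cards VALUES (?)", [(n,) for n in deck])
    return conn


conn_1 = load(deck_1)
conn_2 = load(deck_2)

# As tables they are the same set of rows, whatever the insertion order.
tables_equal = set(conn_1.execute("SELECT card_nbr FROM Cards")) == set(
    conn_2.execute("SELECT card_nbr FROM Cards")
)

# An ordering is something you ask for on the way out, not a property
# of the table itself.
descending = [
    row[0]
    for row in conn_1.execute("SELECT card_nbr FROM Cards ORDER BY card_nbr DESC")
]
```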
A deck of punch cards is sequential, and so are magnetic tape files.
Therefore, a <i>physical</i> file of ordered sequential records also
became the <i>mental</i> model for data processing and it is still hard
to shake. Anytime you look at data, it is in some physical ordering.
The various access methods for disk storage systems came later, but
even these access methods could not shake the mental model.
Another conceptual difference is that a file is usually data that
deals with a whole business process. A file has to have enough data
in itself to support applications for that business process. Files
tend to be "mixed" data which can be described by the name of the
business process, such as "The Payroll file" or something like that.
Tables can be either entities or relationships within a business
process. This means that the data which was held in one file is often
put into several tables. Tables tend to be "pure" data which can be
described by single words. The payroll would now have separate tables
for timecards, employees, projects and so forth.
Tables as Entities
An entity is a physical or conceptual "thing" which has meaning by
itself. A person, a sale or a product would be an example. In a
relational database, an entity is defined by its attributes, which are
shown as values in columns in rows in a table.
To remind users that tables are sets of entities, I like to use
collective or plural nouns that describe the function of the entities
within the system for the names of tables. Thus "Employee" is a bad
name because it is singular; "Employees" is a better name because it
is plural; "Personnel" is best because it is collective and does not
summon up a mental picture of individual persons.
If you have tables with exactly the same structure, then they are sets
of the same kind of elements. But you should have only one set for
each kind of data element! Files, on the other hand, were PHYSICALLY
separate units of storage which could be alike -- each tape or disk
file represents a step in the PROCEDURE, such as moving from raw
data, to edited data, and finally to archived data. In SQL, this
should be a status flag in a table.
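As a sketch of the status flag idea, assuming a made-up Timecards table and made-up status values ('raw', 'edited', 'archived'), the three tape files collapse into one table and a step in the procedure becomes an UPDATE:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One table for all timecards; the processing step is a column value,
# not a separate physical file. Names and status codes are hypothetical.
conn.execute(
    """
    CREATE TABLE Timecards
    (timecard_nbr INTEGER NOT NULL PRIMARY KEY,
     hours_worked INTEGER NOT NULL,
     timecard_status VARCHAR(8) DEFAULT 'raw' NOT NULL
       CHECK (timecard_status IN ('raw', 'edited', 'archived')))
    """
)
conn.executemany(
    "INSERT INTO Timecards (timecard_nbr, hours_worked) VALUES (?, ?)",
    [(1, 40), (2, 38)],
)

# "Moving a record to the edited file" is just a change of status.
conn.execute("UPDATE Timecards SET timecard_status = 'edited' WHERE timecard_nbr = 1")

counts = dict(
    conn.execute(
        "SELECT timecard_status, COUNT(*) FROM Timecards GROUP BY timecard_status"
    )
)
```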
Tables as Relationships
A relationship is shown in a table by columns which reference one or
more entity tables. Without the entities, the relationship has no
meaning, but the relationship can have attributes of its own. For
example, a show business contract might have an agent, an employer
and a talent. The method of payment is an attribute of the contract
itself, and not of any of the three parties.
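The contract example might be sketched like this. All table and column names are my own assumptions, and the foreign key PRAGMA is specific to SQLite, which leaves referential checks off by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific switch

conn.executescript(
    """
    -- Entity tables (hypothetical names).
    CREATE TABLE Agents (agent_name VARCHAR(30) NOT NULL PRIMARY KEY);
    CREATE TABLE Employers (employer_name VARCHAR(30) NOT NULL PRIMARY KEY);
    CREATE TABLE Talent (talent_name VARCHAR(30) NOT NULL PRIMARY KEY);

    -- Relationship table: it references the three entities and carries
    -- an attribute of its own, the method of payment.
    CREATE TABLE Contracts
    (agent_name VARCHAR(30) NOT NULL REFERENCES Agents (agent_name),
     employer_name VARCHAR(30) NOT NULL REFERENCES Employers (employer_name),
     talent_name VARCHAR(30) NOT NULL REFERENCES Talent (talent_name),
     payment_method VARCHAR(10) NOT NULL,
     PRIMARY KEY (agent_name, employer_name, talent_name));

    INSERT INTO Agents VALUES ('Swifty');
    INSERT INTO Employers VALUES ('Acme Studios');
    INSERT INTO Talent VALUES ('Jane Doe');
    INSERT INTO Contracts VALUES ('Swifty', 'Acme Studios', 'Jane Doe', 'check');
    """
)

# Without the entities, the relationship has no meaning: a contract that
# names an unknown agent is rejected.
try:
    conn.execute(
        "INSERT INTO Contracts VALUES ('Nobody', 'Acme Studios', 'Jane Doe', 'cash')"
    )
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

contract_count = conn.execute("SELECT COUNT(*) FROM Contracts").fetchone()[0]
```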
Rows versus Records
Rows are not records. A record is defined in the application program
which reads it; a row is defined in the database schema and not by a
program at all. The names of the fields are given in the READ or INPUT
statements of the application; the columns of a row are named in the
database schema. Likewise, the PHYSICAL order of the field names in the
READ statement is vital (READ a,b,c is not the same as READ c, a, b),
but SELECT a,b,c is the same data as SELECT c, a, b.
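The READ versus SELECT point can be sketched in Python, with the struct module standing in for a READ statement against a binary record. The record layout and the Foobar table are hypothetical.

```python
import sqlite3
import struct

# A record in a file: three 4-byte integers with no names attached.
record = struct.pack("<3i", 10, 20, 30)

# "READ a, b, c": the program assigns meaning by position.
a, b, c = struct.unpack("<3i", record)

# "READ c, a, b": same bytes, but now each name gets a different value.
c_2, a_2, b_2 = struct.unpack("<3i", record)

# A row in a table: the names come from the schema.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # lets us fetch values by column name
conn.execute(
    "CREATE TABLE Foobar (a INTEGER NOT NULL, b INTEGER NOT NULL, c INTEGER NOT NULL)"
)
conn.execute("INSERT INTO Foobar VALUES (10, 20, 30)")

row_abc = conn.execute("SELECT a, b, c FROM Foobar").fetchone()
row_cab = conn.execute("SELECT c, a, b FROM Foobar").fetchone()

# Whatever order the columns are listed in, each name has the same value.
same_data = all(row_abc[name] == row_cab[name] for name in ("a", "b", "c"))
```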
All empty files look alike; they are a directory entry in the
operating system with a name and a length of zero bytes of storage.
Empty tables still have columns, constraints, security privileges and
other structures, even though they have no rows.
This is in keeping with the set theoretical model, in which the empty
set is a perfectly good set. The difference between SQL's set model
and standard mathematical set theory is that set theory has only one
empty set, but in SQL each table has its own structure, so an empty
table cannot be used anywhere that a non-empty version of itself could
not be used.
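A rough demonstration of empty files versus empty tables, with made-up file and table names. PRAGMA table_info is SQLite's way of reading the schema; other products use their own catalog views.

```python
import os
import sqlite3
import tempfile

# An empty file is a name and zero bytes; nothing says what belongs in it.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "payroll.dat")  # hypothetical file name
    open(path, "wb").close()
    empty_file_size = os.path.getsize(path)

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE Personnel
    (emp_name VARCHAR(30) NOT NULL PRIMARY KEY,
     salary DECIMAL(8,2) NOT NULL CHECK (salary >= 0.00))
    """
)
conn.execute("CREATE TABLE Projects (project_name VARCHAR(30) NOT NULL PRIMARY KEY)")


def column_names(table):
    """Return the column names of a table from SQLite's schema pragma."""
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]


# Both tables are empty, yet each keeps its own structure.
personnel_columns = column_names("Personnel")
projects_columns = column_names("Projects")

# Two empty tables are not interchangeable: a UNION of tables with
# different shapes fails even though neither has any rows.
try:
    conn.execute("SELECT * FROM Personnel UNION SELECT * FROM Projects").fetchall()
    union_allowed = True
except sqlite3.Error:
    union_allowed = False
```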
Another characteristic of rows in a table is that they are all alike
in structure and they are all the "same kind of thing" in the model.
In a file system, records can vary in size, datatypes and structure by
having flags in the data stream that tell the program reading the data
how to interpret it. The most common examples are Pascal's variant
record, C's struct syntax and Cobol's OCCURS clause.
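The OCCURS clause in Cobol (in its DEPENDING ON form) has a number
which tells the program how many times a record structure is to be
repeated in the current record; the variant record in Pascal has a tag
field which tells the program which of several layouts the current
record uses.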
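To make the "flags in the data stream" idea concrete, here is a rough Python sketch of a program reading records whose layout depends on the data itself. The one-byte tag, the count byte and the two layouts are all invented for illustration; they are not Cobol's or Pascal's actual storage formats.

```python
import struct


def read_record(buf):
    """Decode one record: a 1-byte tag, then a layout chosen by that tag."""
    tag = buf[0:1]
    if tag == b"H":
        # Hourly layout: a count byte, then that many 2-byte daily hours.
        (count,) = struct.unpack_from("<B", buf, 1)
        hours = struct.unpack_from(f"<{count}h", buf, 2)
        return ("hourly", list(hours))
    if tag == b"S":
        # Salaried layout: a single 4-byte integer amount.
        (amount,) = struct.unpack_from("<i", buf, 1)
        return ("salaried", amount)
    raise ValueError(f"unknown record tag: {tag!r}")


# Two records of different size and structure in the same "file".
hourly_bytes = b"H" + struct.pack("<B3h", 3, 8, 7, 9)
salaried_bytes = b"S" + struct.pack("<i", 52000)

hourly_record = read_record(hourly_bytes)
salaried_record = read_record(salaried_bytes)
```

All of the knowledge about the layout lives in read_record(); the bytes themselves carry none of it. Rows in a table never need this kind of decoding because every row has the structure declared in the schema.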
Unions in 'C' are not variant records, but variant mappings for the
same physical memory. For example: