Is there a good approach to modelling many heterogeneous entity types with that have some attributes in common? Say I have entities "employees" which share some attibutes (e.g. firstname, lastname, dateofbirth) but some subsets of employees (e.g. physicians, janitors, nurses, ambulance drivers) may have additional attributes that do not apply to all employees. Physicians may have attributes specialty and date of board certification, ambulance drivers may have a drivers license id, janitors may have preferredbroomtype and so on. There are many employee subtypes and more can be dynamically added after the application is deployed so it's obviously no good to keep adding attributes to the employees table because most attributes will be NULL (since janitors are never doctors at the same time). The only solution I found for this is a generalization hiearchy where you have the employee table with all generic attributes and then you add tables for each new employee subtype as necessary. The subtype tables share the primary key of the employee table. The employee table has a "discriminator" field that allows you to figure out which subtype table to load for a particular entity. This solution does not seem to scale since for each value of "discriminator" I need to perform a join with a different table. What if I need to retrieve 1,000 employees at once? Is that possible to obtain a single ResultSet with one SQL statement SQL? Or do you I need to iterate look at the discriminator and then perform the appropriate join? If this kind of iteration is necessary then obviously this generalization hierarchy approach does not work in practice since it would be painfully slow. Is there a better approach to modelling these kind of heterogeneous entities with shared attributes that does not involve creating a table for each new employee type or having sparce tables (mostly filled with NULLS) I guess another approach would be to use name/value pairs but that would make reporting really ugly. Seems like a very common problem. Any ideas? Is this a fundamental limitation of SQL? Thanks!
[quoted text, click to view] "Robert Brown" <robertbrown1971@yahoo.com> wrote in message news:240a4d09.0405270856.4ce55c7d@posting.google.com... > Is there a good approach to modelling many heterogeneous entity types > with that have some attributes in common?
This is a frequently asked question. Unfortunately, the frequently given responses don't usually settle the matter. [quoted text, click to view] > There are many employee subtypes and more can be dynamically added > after the application is deployed so it's obviously no good to keep > adding attributes to the employees table because most attributes will > be NULL (since janitors are never doctors at the same time). > > The only solution I found for this is a generalization hiearchy where > you have the employee table with all generic attributes and then you > add tables for each new employee subtype as necessary. The subtype > tables share the primary key of the employee table. The employee table > has a "discriminator" field that allows you to figure out which > subtype table to load for a particular entity.
If you invent new subtypes on an adhoc basis, (e.g. a DBA is never either a janitor or a doctor) and you invent new tables for subtypes on an equally ad hoc basis, I'd suggest that you are altering your data model on an ad hoc basis. You can do that if you want, but don't expect the same power and simplicity that you get from a stable data model. [quoted text, click to view] > > This solution does not seem to scale since for each value of > "discriminator" I need to perform a join with a different table. What > if I need to retrieve 1,000 employees at once?
Why would you ever need the attributes of more than one subtype? If your query is about doctors, why would you need to join the table about janitors into the query? If your query is about employees, why would you need to know any of the subtype attributes? [quoted text, click to view] > > Or do you I need to iterate look at the discriminator and then > perform the appropriate join? If this kind of iteration is necessary > then obviously this generalization hierarchy approach does not work in > practice > since it would be painfully slow.
You need to ask two questions about the model, and about any imlpementation based on the model: first, is it correct and complete? Second, if yes to the first, is it fast enough? In general, different implementations of the same model should be transparent to users of the imlpementation who only look at the features visible in the model. [quoted text, click to view] > > Is there a better approach to modelling these kind of heterogeneous > entities with shared attributes that does not involve creating a table > for each new employee type or having sparce tables (mostly filled with > NULLS)
If you have new entities, you are going to have new relations. That either means having new tables, or fudging the relationship between tables and rleations. Do the second one at your own peril. [quoted text, click to view] > Seems like a very common problem. Any ideas? Is this a fundamental > limitation of SQL?
I would suggest it goes beyond SQl to the very heart of using foreign key/ primary key aossciations to establish linkages.
[quoted text, click to view] "Robert Brown" <robertbrown1971@yahoo.com> wrote in message news:240a4d09.0405270856.4ce55c7d@posting.google.com... > adding attributes to the employees table because most attributes will > be NULL (since janitors are never doctors at the same time).
By the way, after I wrote "a DBA is never a janitor or a doctor" it occurred to me that when I was a DBA there were days when I felt like I was both at the same time! ;)
[quoted text, click to view] "Laconic2" <laconic2@comcast.net> wrote in message news:<E8KdnXHBZNMsuSvdRVn-jg@comcast.com>... > "Robert Brown" <robertbrown1971@yahoo.com> wrote in message > news:240a4d09.0405270856.4ce55c7d@posting.google.com... > > Is there a good approach to modelling many heterogeneous entity types > > with that have some attributes in common? > > This is a frequently asked question. Unfortunately, the frequently given > responses don't usually settle the matter.
Could you please post links to any relevant past threads if you remember where to find them?I tried to search the news group for "generalization hierarchy" but did not find anything useful. Is there a better search term I can use? [quoted text, click to view] > > > There are many employee subtypes and more can be dynamically added > > after the application is deployed so it's obviously no good to keep > > adding attributes to the employees table because most attributes will > > be NULL (since janitors are never doctors at the same time). > > > > The only solution I found for this is a generalization hiearchy where > > you have the employee table with all generic attributes and then you > > add tables for each new employee subtype as necessary. The subtype > > tables share the primary key of the employee table. The employee table > > has a "discriminator" field that allows you to figure out which > > subtype table to load for a particular entity. > > If you invent new subtypes on an adhoc basis, (e.g. a DBA is never either a > janitor or a doctor) > and you invent new tables for subtypes on an equally ad hoc basis, I'd > suggest that you are altering your data model on an ad hoc basis. > > You can do that if you want, but don't expect the same power and simplicity > that you get from a stable data model. > > > > > > > This solution does not seem to scale since for each value of > > "discriminator" I need to perform a join with a different table. What > > if I need to retrieve 1,000 employees at once? > > Why would you ever need the attributes of more than one subtype? If your > query is about doctors, why would you need to join the table about janitors > into the query?
Because I need to display users and their heterogenious attributes in the UI as one list (e.g. in response to a search). [quoted text, click to view] > If your query is about employees, why would you need to know any of the > subtype attributes?
Same reason, our customer's HR department does not see the employees as separate entities. To them an employee is an employee and they want to see all the associated data while browsing employees. I guess you could show the generic attributes on the first pass and then require the end user to click to drill down and see the subtype attributes - but that's not what they want. [quoted text, click to view] > > > > Or do you I need to iterate look at the discriminator and then > > perform the appropriate join? If this kind of iteration is necessary > > then obviously this generalization hierarchy approach does not work in > > practice > > since it would be painfully slow. > > You need to ask two questions about the model, and about any imlpementation > based on the model: > first, is it correct and complete? Second, if yes to the first, is it fast > enough? > > In general, different implementations of the same model should be > transparent to users of the imlpementation who only look at the features > visible in the model. > > > > > > Is there a better approach to modelling these kind of heterogeneous > > entities with shared attributes that does not involve creating a table > > for each new employee type or having sparce tables (mostly filled with > > NULLS) > > If you have new entities, you are going to have new relations. That either > means having new tables, or fudging the relationship between tables and > rleations. Do the second one at your own peril. > > > Seems like a very common problem. Any ideas? Is this a fundamental > > limitation of SQL? > > I would suggest it goes beyond SQl to the very heart of using foreign key/
Robert Brown (robertbrown1971@yahoo.com) writes: [quoted text, click to view] > The only solution I found for this is a generalization hiearchy where > you have the employee table with all generic attributes and then you > add tables for each new employee subtype as necessary. The subtype > tables share the primary key of the employee table. The employee table > has a "discriminator" field that allows you to figure out which > subtype table to load for a particular entity. > > This solution does not seem to scale since for each value of > "discriminator" I need to perform a join with a different table. What > if I need to retrieve 1,000 employees at once?
So what? 1000 rows for a modern RDBMS is a breeze. [quoted text, click to view] > Is that possible to obtain a single ResultSet with one SQL statement > SQL?
Yes, although with many discriminators, it will be one hell of a join: SELECT main.col1, main.col2, ... d1.col1, d1.col2, ... FROM main LEFT JOIN discriminator d1 ON d1.keycol = main.keycol LEFT JOIN discriminator d2 ON d2.keycol = main.keycol ... In practice, things might be even messier, because some values might apply to four discriminators, but be irrelevant to the rest. In they the would be four different columns. Although, this could be addressed with: common_to_four = colaesce(d3.common, d6.common, d9.common, d11.common) But it may be better to add this as a nullable column to the common table. Likewise, if two discrimiators are very similar, it may be better to lump them in the same table. [quoted text, click to view] > I guess another approach would be to use name/value pairs but that > would make reporting really ugly.
And you get less control over your spelling errors. But sometimes this is the way to go. [quoted text, click to view] > Seems like a very common problem. Any ideas? Is this a fundamental > limitation of SQL?
Sort of. SQL tables are squared, and object-oriented hierachies are jagged. But that's alright, just don't be too object-oriented. Be pragmatic too. -- Erland Sommarskog, SQL Server MVP, sommar@algonet.se Books Online for SQL Server SP3 at
[quoted text, click to view] In the last exciting episode, "Laconic2" <laconic2@comcast.net> wrote: > "Robert Brown" <robertbrown1971@yahoo.com> wrote in message > news:240a4d09.0405270856.4ce55c7d@posting.google.com... > >> adding attributes to the employees table because most attributes will >> be NULL (since janitors are never doctors at the same time). > > By the way, after I wrote "a DBA is never a janitor or a doctor" it > occurred to me that when I was a DBA there were days when I felt > like I was both at the same time! ;)
Some days it also combines coroner and roto-rooter operator... A couple coworkers were in that state today :-(. -- wm(X,Y):-write(X),write('@'),write(Y). wm('cbbrowne','acm.org'). http://www3.sympatico.ca/cbbrowne/advocacy.html If a person with multiple personalities threatens suicide, is that
This is always messy. With dynamically added subtypes, I would consider keeping everything soft. Here is a first approximation for the tables: Employee Valid Subtypes Attributes/Subtype Valid Attributes Subtype<--------Subtype--------->Subtype ?Datatype Employee # ?Desc Attribute<-----------Attribute | ?Desc | Employee Attributes Employee # Attribute Attribute data You could then build a list of available attributes for an employee based on his subtype from the attributes/subtype table. The data entry screen(s) could be built from the Valid Attributes table(depending on how smart it needs to be). This is a bit rough, but I hope the idea comes through. Control over the Valid Subtypes table, the Attributes table, etc. would need to be defined (who and how), but, gosh, I would have to have to create a new table every time a new subtype was added. LOL.. here I go again, ask me what time it is and I tell you how to build a watch. Hope this helps more than it hurts. Payson [quoted text, click to view] robertbrown1971@yahoo.com (Robert Brown) wrote in message news:<240a4d09.0405270856.4ce55c7d@posting.google.com>... > Is there a good approach to modelling many heterogeneous entity types > with that have some attributes in common? > > Say I have entities "employees" which share some attibutes (e.g. > firstname, lastname, dateofbirth) but some subsets of employees (e.g. > physicians, janitors, nurses, ambulance drivers) may have additional > attributes that do not apply to all employees. Physicians may have > attributes specialty and date of board certification, ambulance > drivers may have a drivers license id, janitors may have > preferredbroomtype and so on. > > There are many employee subtypes and more can be dynamically added > after the application is deployed so it's obviously no good to keep > adding attributes to the employees table because most attributes will > be NULL (since janitors are never doctors at the same time). > > The only solution I found for this is a generalization hiearchy where > you have the employee table with all generic attributes and then you > add tables for each new employee subtype as necessary. The subtype > tables share the primary key of the employee table. The employee table > has a "discriminator" field that allows you to figure out which > subtype table to load for a particular entity. > > This solution does not seem to scale since for each value of > "discriminator" I need to perform a join with a different table. What > if I need to retrieve 1,000 employees at once? > > Is that possible to obtain a single ResultSet with one SQL statement > SQL? > > Or do you I need to iterate look at the discriminator and then > perform the appropriate join? If this kind of iteration is necessary > then obviously this generalization hierarchy approach does not work in > practice > since it would be painfully slow. > > Is there a better approach to modelling these kind of heterogeneous > entities with shared attributes that does not involve creating a table > for each new employee type or having sparce tables (mostly filled with > NULLS) > > I guess another approach would be to use name/value pairs but that > would make reporting really ugly. > > Seems like a very common problem. Any ideas? Is this a fundamental > limitation of SQL? > > Thanks! >
[quoted text, click to view] >> Is there a good approach to modelling many heterogeneous entity
types with that have some attributes in common? << Ignoring that the phrase "entity type" is OO **yuck!** The classic scenario calls for a root class with all the common attributes and then specialized sub-classes under it. As an example, let's take the class of Vehicles and find an industry standard identifier (VIN), and add two mutually exclusive sub-classes, Sport utility vehicles and sedans ('SUV', 'SED'). CREATE TABLE Vehicles (vin CHAR(17) NOT NULL PRIMARY KEY, vehicle_type CHAR(3) NOT NULL CHECK(vehicle_type IN ('SUV', 'SED')), UNIQUE (vin, vehicle_type), ..); Notice the overlapping candidate keys. I then use a compound candidate key (vin, vehicle_type) and a constraint in each sub-class table to assure that the vehicle_type is locked and agrees with the Vehicles table. Add some DRI actions and you are done: CREATE TABLE SUV (vin CHAR(17) NOT NULL PRIMARY KEY, vehicle_type CHAR(3) DEFAULT 'SUV' NOT NULL CHECK(vehicle_type = 'SUV'), UNIQUE (vin, vehicle_type), FOREIGN KEY (vin, vehicle_type) REFERENCES Vehicles(vin, vehicle_type) ON UPDATE CASCADE ON DELETE CASCADE, ..); CREATE TABLE Sedans (vin CHAR(17) NOT NULL PRIMARY KEY, vehicle_type CHAR(3) DEFAULT 'SED' NOT NULL CHECK(vehicle_type = 'SED'), UNIQUE (vin, vehicle_type), FOREIGN KEY (vin, vehicle_type) REFERENCES Vehicles(vin, vehicle_type) ON UPDATE CASCADE ON DELETE CASCADE, ..); I can continue to build a hierarchy like this. For example, if I had a Sedans table that broke down into two-door and four-door sedans, I could a schema like this: CREATE TABLE Sedans (vin CHAR(17) NOT NULL PRIMARY KEY, vehicle_type CHAR(3) DEFAULT 'SED' NOT NULL CHECK(vehicle_type IN ('2DR', '4DR', 'SED')), UNIQUE (vin, vehicle_type), FOREIGN KEY (vin, vehicle_type) REFERENCES Vehicles(vin, vehicle_type) ON UPDATE CASCADE ON DELETE CASCADE, ..); CREATE TABLE TwoDoor (vin CHAR(17) NOT NULL PRIMARY KEY, vehicle_type CHAR(3) DEFAULT '2DR' NOT NULL CHECK(vehicle_type = '2DR'), UNIQUE (vin, vehicle_type), FOREIGN KEY (vin, vehicle_type) REFERENCES Sedans(vin, vehicle_type) ON UPDATE CASCADE ON DELETE CASCADE, ..); CREATE TABLE FourDoor (vin CHAR(17) NOT NULL PRIMARY KEY, vehicle_type CHAR(3) DEFAULT '4DR' NOT NULL CHECK(vehicle_type = '4DR'), UNIQUE (vin, vehicle_type), FOREIGN KEY (vin, vehicle_type) REFERENCES Sedans (vin, vehicle_type) ON UPDATE CASCADE ON DELETE CASCADE, ..); The idea is to build a chain of identifiers and types in a UNIQUE() constraint that go up the tree when you use a REFERENCES constraint. Obviously, you can do variants of this trick to get different class structures. If an entity doesn't have to be exclusively one subtype, you play with the root of the class hierarchy: CREATE TABLE Vehicles (vin CHAR(17) NOT NULL, vehicle_type CHAR(3) NOT NULL CHECK(vehicle_type IN ('SUV', 'SED')), PRIMARY KEY (vin, vehicle_type), ..); Now start hiding all this stuff in VIEWs immediately and add an INSTEAD OF trigger to those VIEWs. Here is the link on Amazon.com for my new book on "Trees & Hierarchies in SQL"
--CELKO-- (jcelko212@earthlink.net) writes: [quoted text, click to view] > CREATE TABLE Vehicles > (vin CHAR(17) NOT NULL PRIMARY KEY, > vehicle_type CHAR(3) NOT NULL > CHECK(vehicle_type IN ('SUV', 'SED')), > UNIQUE (vin, vehicle_type), > ..); > > Notice the overlapping candidate keys. I then use a compound candidate > key (vin, vehicle_type) and a constraint in each sub-class table to > assure that the vehicle_type is locked and agrees with the Vehicles > table. Add some DRI actions and you are done:
Ah! Interesting trick! I'll save a mental note about that. -- Erland Sommarskog, SQL Server MVP, sommar@algonet.se Books Online for SQL Server SP3 at
[quoted text, click to view] jcelko212@earthlink.net (--CELKO--) wrote in message news:<18c7b3c2.0405291007.6ac72984@posting.google.com>... > >> Is there a good approach to modelling many heterogeneous entity > types with that have some attributes in common? << > > Ignoring that the phrase "entity type" is OO **yuck!** > > The classic scenario calls for a root class with all the common > attributes and then specialized sub-classes under it. As an example, > let's take the class of Vehicles and find an industry standard > identifier (VIN), and add two mutually exclusive sub-classes, Sport > utility vehicles and sedans ('SUV', 'SED'). > > CREATE TABLE Vehicles > (vin CHAR(17) NOT NULL PRIMARY KEY, > vehicle_type CHAR(3) NOT NULL > CHECK(vehicle_type IN ('SUV', 'SED')), > UNIQUE (vin, vehicle_type), > ..); > > Notice the overlapping candidate keys. I then use a compound candidate > key (vin, vehicle_type) and a constraint in each sub-class table to > assure that the vehicle_type is locked and agrees with the Vehicles > table. Add some DRI actions and you are done: > > CREATE TABLE SUV > (vin CHAR(17) NOT NULL PRIMARY KEY, > vehicle_type CHAR(3) DEFAULT 'SUV' NOT NULL > CHECK(vehicle_type = 'SUV'), > UNIQUE (vin, vehicle_type), > FOREIGN KEY (vin, vehicle_type) > REFERENCES Vehicles(vin, vehicle_type) > ON UPDATE CASCADE > ON DELETE CASCADE, > ..); > > CREATE TABLE Sedans > (vin CHAR(17) NOT NULL PRIMARY KEY, > vehicle_type CHAR(3) DEFAULT 'SED' NOT NULL > CHECK(vehicle_type = 'SED'), > UNIQUE (vin, vehicle_type), > FOREIGN KEY (vin, vehicle_type) > REFERENCES Vehicles(vin, vehicle_type) > ON UPDATE CASCADE > ON DELETE CASCADE, > ..); > > I can continue to build a hierarchy like this. For example, if I had > a Sedans table that broke down into two-door and four-door sedans, I > could a schema like this: > > CREATE TABLE Sedans > (vin CHAR(17) NOT NULL PRIMARY KEY, > vehicle_type CHAR(3) DEFAULT 'SED' NOT NULL > CHECK(vehicle_type IN ('2DR', '4DR', 'SED')), > UNIQUE (vin, vehicle_type), > FOREIGN KEY (vin, vehicle_type) > REFERENCES Vehicles(vin, vehicle_type) > ON UPDATE CASCADE > ON DELETE CASCADE, > ..); > > CREATE TABLE TwoDoor > (vin CHAR(17) NOT NULL PRIMARY KEY, > vehicle_type CHAR(3) DEFAULT '2DR' NOT NULL > CHECK(vehicle_type = '2DR'), > UNIQUE (vin, vehicle_type), > FOREIGN KEY (vin, vehicle_type) > REFERENCES Sedans(vin, vehicle_type) > ON UPDATE CASCADE > ON DELETE CASCADE, > ..); > > CREATE TABLE FourDoor > (vin CHAR(17) NOT NULL PRIMARY KEY, > vehicle_type CHAR(3) DEFAULT '4DR' NOT NULL > CHECK(vehicle_type = '4DR'), > UNIQUE (vin, vehicle_type), > FOREIGN KEY (vin, vehicle_type) > REFERENCES Sedans (vin, vehicle_type) > ON UPDATE CASCADE > ON DELETE CASCADE, > ..); > > The idea is to build a chain of identifiers and types in a UNIQUE() > constraint that go up the tree when you use a REFERENCES constraint. > Obviously, you can do variants of this trick to get different class > structures. > > If an entity doesn't have to be exclusively one subtype, you play with > the root of the class hierarchy: > > CREATE TABLE Vehicles > (vin CHAR(17) NOT NULL, > vehicle_type CHAR(3) NOT NULL > CHECK(vehicle_type IN ('SUV', 'SED')), > PRIMARY KEY (vin, vehicle_type), > ..); > > Now start hiding all this stuff in VIEWs immediately and add an > INSTEAD OF trigger to those VIEWs. > > Here is the link on Amazon.com for my new book on "Trees & Hierarchies > in SQL" > > http://www.amazon.com/exec/obidos/tg/detail/-/1558609202/qid=1080772873/sr=1-1/ref=sr_1_1/102-7683601-6345721?v=glance&s=books#product-details Good stuff, for one thing I don't get it, Why would one need NOT NULL in the following DDL, [quoted text, click to view] > CREATE TABLE Sedans > (vin CHAR(17) NOT NULL PRIMARY KEY, > vehicle_type CHAR(3) DEFAULT 'SED' NOT NULL > CHECK(vehicle_type IN ('2DR', '4DR', 'SED')), > UNIQUE (vin, vehicle_type), > FOREIGN KEY (vin, vehicle_type) > REFERENCES Vehicles(vin, vehicle_type) > ON UPDATE CASCADE > ON DELETE CASCADE, > ..);
since by definition PRIMARY KEY cannot be null, so, why would I tell people, "I'm an American and an American citizen", instead of just
[quoted text, click to view] <ctcgag@hotmail.com> wrote in message news:20040602125733.647$6f@newsreader.com... > robertbrown1971@yahoo.com (Robert Brown) wrote: > If the main or only goal is to satisfy nosy Nellies, I would either go > with a auxilliary table of name-value pairs (and another auxilliary table > with listing allowed or suggested attributes for each job type, or just > do away with all those tables and make a "free text" clob column on the > employee table (or call it XML rather than free text, if that type of > things turns you on).
Yeah, right, make it XML, and watch your performance going down the drain. If you want "flexible" structure, use auxilliary name-value table. Convert it into nested collection if you have object fever.
[quoted text, click to view] daaa@rock.com (DaaaDaaa) wrote in message news:<2193afb.0406011916.5abb2052@posting.google.com>... > > Good stuff, for one thing I don't get it, > Why would one need NOT NULL in the following DDL, > > CREATE TABLE Sedans > > (vin CHAR(17) NOT NULL PRIMARY KEY, > > vehicle_type CHAR(3) DEFAULT 'SED' NOT NULL > > CHECK(vehicle_type IN ('2DR', '4DR', 'SED')), > > UNIQUE (vin, vehicle_type), > > FOREIGN KEY (vin, vehicle_type) > > REFERENCES Vehicles(vin, vehicle_type) > > ON UPDATE CASCADE > > ON DELETE CASCADE, > > ..);
Methinks he means theoretically the primary key could be separate from the vin definition. Just because Oracle disallows it doesn't mean it is redundant. I don't know about ANSI or other db's offhand. [quoted text, click to view] > since by definition PRIMARY KEY cannot be null, so, why would I tell > people, "I'm an American and an American citizen", instead of just > say, "I'm an American"?
'Cause you are not these guys: http://www.signonsandiego.com/news/metro/20040511-9999-1m11deport.html http://www.signonsandiego.com/uniontrib/20040529/news_1n29deport.html Obviously, the attributes of American are cultural, and of a citizen are legal. A citizen could be raised abroad and not consider themselves American, and it is possible to have null citizenship. http://www.keepmedia.com/ShowItemDetails.do?itemID=326142&extID=10032&oliID=213 http://www.hrw.org/reports/1995/Kuwait.htm jg -- @home.com is bogus.
[quoted text, click to view] robertbrown1971@yahoo.com (Robert Brown) wrote: > > > > > > This solution does not seem to scale since for each value of > > > "discriminator" I need to perform a join with a different table. What > > > if I need to retrieve 1,000 employees at once? > > > > Why would you ever need the attributes of more than one subtype? If > > your query is about doctors, why would you need to join the table > > about janitors into the query? > > Because I need to display users and their heterogenious attributes in > the UI as one list (e.g. in response to a search).
Uh, that isn't really an answer. [quoted text, click to view] > > If your query is about employees, why would you need to know any of the > > subtype attributes? > > Same reason, our customer's HR department does not see the employees > as separate entities. To them an employee is an employee and they want > to see all the associated data while browsing employees. I guess you > could show the generic attributes on the first pass and then require > the end user to click to drill down and see the subtype attributes - > but that's not what they want.
It sounds to me like the real solution is to fire your HR department and replace them with people who just do their jobs, rather than browsing around on the other employees information out of idle curiousity. If the main goal is to provide custom interfaces, one per job-type, that support job-specific functions--and satisfying nosy Nellies is only secondary--then I think I would go with the main-table / multiple sub-table set up. When people want to see data mixed aggregations of job-types for no apparent reason, they can damn well wait the 5.2 seconds it will take to assemble it. If the main or only goal is to satisfy nosy Nellies, I would either go with a auxilliary table of name-value pairs (and another auxilliary table with listing allowed or suggested attributes for each job type, or just do away with all those tables and make a "free text" clob column on the employee table (or call it XML rather than free text, if that type of things turns you on). Xho -- -------------------- http://NewsReader.Com/ --------------------
[quoted text, click to view] ctcgag@hotmail.com wrote:
My 2 cents thrown in at the bottom: [quoted text, click to view] > robertbrown1971@yahoo.com (Robert Brown) wrote: > >>>>This solution does not seem to scale since for each value of >>>>"discriminator" I need to perform a join with a different table. What >>>>if I need to retrieve 1,000 employees at once? >>> >>>Why would you ever need the attributes of more than one subtype? If >>>your query is about doctors, why would you need to join the table >>>about janitors into the query? >> >>Because I need to display users and their heterogenious attributes in >>the UI as one list (e.g. in response to a search). > > > Uh, that isn't really an answer. > > >>>If your query is about employees, why would you need to know any of the >>>subtype attributes? >> >>Same reason, our customer's HR department does not see the employees >>as separate entities. To them an employee is an employee and they want >>to see all the associated data while browsing employees. I guess you >>could show the generic attributes on the first pass and then require >>the end user to click to drill down and see the subtype attributes - >>but that's not what they want. > > > It sounds to me like the real solution is to fire your HR department and > replace them with people who just do their jobs, rather than browsing > around on the other employees information out of idle curiousity. > > If the main goal is to provide custom interfaces, one per job-type, that > support job-specific functions--and satisfying nosy Nellies is only > secondary--then I think I would go with the main-table / multiple sub-table > set up. When people want to see data mixed aggregations of job-types for > no apparent reason, they can damn well wait the 5.2 seconds it will take to > assemble it. > > If the main or only goal is to satisfy nosy Nellies, I would either go > with a auxilliary table of name-value pairs (and another auxilliary table > with listing allowed or suggested attributes for each job type, or just > do away with all those tables and make a "free text" clob column on the > employee table (or call it XML rather than free text, if that type of > things turns you on). > > > > Xho >
When I was dealing with large database systems, we tried to make a distinction between "transaction" and "report" and tried to get the response time requirements ignored for "reports". Anything which returns 1,000 rows is no longer a "transaction" in my book. As Xho said, "... they can damn well wait ..." If this were a print job, it would print on roughly 20 pages, give or take. It would take whoever about 2 minutes to even give it a cursory scan. About the same amount of time to scroll down through a screenful of 1000 entries too, probably more. Questions I used to ask when confronted with this: - What is the business need for this information? - Can you usually wait for it overnight? - In a pinch (when overnight isn't good enough), would 10 minutes be OK? - If you really need it in 1-2 seconds, how much is your hardware budget? (Because I'm going to need more/faster hardware to satisfy this request in just a second or two.) What actually used to worry me about having such "reports" as on-line "transactions" was how they would impact the response times for the other users who were doing stuff which actually did require rapid response (e.g. pull up an individual's medical history in an emergency). (I think that was actully 3 cents :) NPL -- "It is impossible to make anything foolproof because fools are so ingenious"
[quoted text, click to view] >> since by definition PRIMARY KEY cannot be null, so, why would I tell
people, "I'm an American and an American citizen", instead of just say, "I'm an American"? << It is just a personal style. I design the columns before picking my keys, so I need to know this while I am working. I like the bit of redundancy in case the schema is moved without constraints in the table declarations or if you have to change it from PRIMARY KEY to UNIQUE later. --CELKO-- =========================== Please post DDL, so that people do not have to guess what the keys, constraints, Declarative Referential Integrity, datatypes, etc. in your schema are. *** Sent via Developersdex http://www.developersdex.com ***
Taking out of context slightly: Tom Kyte's viewpoint on this kind of thing I think is: "sounds like a single table design to me. I do not factor out 1:1 optional relationships unless a table is "really really wide" -- really wide being into 3 digits. put the most likely to be null columns at the end of the create table and when they are NULL, they will consume NO space. benchmark it. you would always have to join to pick up this optional information (2 or 3 LIO's at least per row retrieved for each optional set of data) vs an extra 50 bytes of flags saying "this is null". I would go for the extra 50 bytes in a row that will be accessed via an index rather then incurring 2/3 LIO's to read an index to access another table. " The question was asking about subtype modeling, but the example the questioner posed wasn't really subtype. The info they were talking about was actually M:1 relationships most likely, but Tom's comments apply to the subtype situation you are discussing. Your model and application are much more simple with this single table model. Read the link for yourself to see the full thread:
Job Miller allegedly said,on my timestamp of 3/06/2004 7:08 PM: [quoted text, click to view] > put the most likely to be null columns at the end of the create table > and when > they are NULL, they will consume NO space. >
The problem is when you may have two or more of these optional 1:1 relationships. What happens when you got a bunch of columns set to NULL, another bunch set to values, then another bunch set to NULL again, and someone comes along and fills the first bunch with values? [quoted text, click to view] > benchmark it. you would always have to join to pick up this optional > information (2 or 3 LIO's at least per row retrieved for each optional > set of > data) vs an extra 50 bytes of flags saying "this is null". I would go > for the > extra 50 bytes in a row that will be accessed via an index rather then > incurring 2/3 LIO's to read an index to access another table. "
I wouldn't. If it made the app easier to document, code and expand, I'd trade the extra 2 or 3 lios easily. Heck, I'd trade 10 or more lios easily, for that! That is the sort of price I don't mind paying. Assuming of course I'm not designing/writing the next 10000000/TPS TPC benchmark. Then again, how many of those has anyone found in real life? Then again, how many of those do indeed need a 1:1? It's all relative, no? In general, designs calling for that sort of level of specification are not for performance critical apps, which makes the extra lio quite acceptable. [quoted text, click to view] > Your model and application are much more simple with this single table > model.
I don't think so. I have to add a heap of IS [NOT] NULL combinations to handle existence or not existence of subtypes. IS NULL is not an easy condition to combine with others. And it makes the code unnecessarily dense, as well as stopping me from ever being able to run this app in a db that doesn't support NULLS the way Oracle does. In all, not a good choice. I'm afraid on that one I'll have to disagree with Tom, assuming once again we are not talking about a TPC benchmark type of app. -- Cheers Nuno Souto
Don't see what you're looking for? Try a search.
|