
sql server (alternate) : Horizontal Partitioning question


jcelko212 NO[at]SPAM earthlink.net
11/14/2004 8:28:40 AM
>> I recently came across a database where the data are horizontally
partitioned into 4 tables. I'm not sure if this was a poor design
choice, or if it was done for valid performance reasons. <<

Without knowing any more than that, the smart money would bet on poor
design ...

>> The schema of the tables are essentially the same, it's just that
they are named differently and the columns are named differently to
differentiate the data from a business usage perspective. <<

Here we MAY have a valid design reason. Is the data logically
different in each case? Not just a status change (paid versus unpaid
bills, etc.), really different? If not, then this is a mess.

>> The tables could easily be combined into one by adding a new column
to the clustered index that would be used to differentiate the
business usage. <<

Bingo! No logical differences, no separate tables in the data model.

>> I am trying to evaluate whether combining the tables would improve
performance or if it would be better to leave them the way they are.
<<

Performance is a secondary issue. Correctness and removing redundant
data element names is the first issue. Make it right, then make it
fast.

>> Many queries that run against these tables do not request records
[sic] from more than one of the tables, which is good. However, there
are a number of processes that query against all of the tables on the
identical clustered index range. I am not sure exactly how many rows
are in the tables but I'm fairly certain the entire database is < 50
GB. <<

Write some VIEWs on the data. Performance with a clustered index
MissLivvy
11/14/2004 10:57:39 AM
I recently came across a database where the data are horizontally partitioned
into 4 tables. I'm not sure if this was a poor design choice, or if it was
done for valid performance reasons. The schemas of the tables are essentially
the same; it's just that the tables are named differently and the columns are
named differently to differentiate the data from a business usage perspective.
The tables could easily be combined into one by adding a new column to the
clustered index that would be used to differentiate the business usage. I am
trying to evaluate whether combining the tables would improve performance or
if it would be better to leave them the way they are. Many queries that run
against these tables do not request records from more than one of the
tables, which is good. However, there are a number of processes that query
against all of the tables on the identical clustered index range. I am not
sure exactly how many rows are in the tables but I'm fairly certain the
entire database is < 50 GB.
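
As a rough illustration of the combination described above, here is a minimal
sketch. It uses Python's sqlite3 purely so it can be run anywhere; it is not
SQL Server, and the table and column names (sales_data, usage_type, and so on)
are made up, since the thread never names the real ones. A WITHOUT ROWID table
stands in for a table clustered on its primary key.

```python
import sqlite3

# Hypothetical stand-ins for the four partitioned tables; the thread never
# names them. SQLite is used only so the sketch is runnable.
PARTS = ["sales", "costs", "headcount", "capex"]

conn = sqlite3.connect(":memory:")
for usage in PARTS:
    # Each original table has the same shape but differently named columns.
    conn.execute(
        f"CREATE TABLE {usage}_data "
        f"(item_id INTEGER PRIMARY KEY, {usage}_amount REAL)"
    )
    conn.executemany(
        f"INSERT INTO {usage}_data VALUES (?, ?)",
        [(i, float(i)) for i in range(1, 6)],
    )

# Combined table: the new usage_type column is added to the key, and
# WITHOUT ROWID stores the rows in primary-key order (SQLite's closest
# analogue to a clustered index).
conn.execute("""
    CREATE TABLE combined_data (
        item_id    INTEGER NOT NULL,
        usage_type TEXT    NOT NULL,
        amount     REAL,
        PRIMARY KEY (item_id, usage_type)
    ) WITHOUT ROWID
""")
for usage in PARTS:
    conn.execute(
        "INSERT INTO combined_data (item_id, usage_type, amount) "
        f"SELECT item_id, ?, {usage}_amount FROM {usage}_data",
        (usage,),
    )

# A single range scan on the key now returns rows for every business usage.
rows = conn.execute(
    "SELECT item_id, usage_type, amount FROM combined_data "
    "WHERE item_id BETWEEN 2 AND 3 ORDER BY item_id, usage_type"
).fetchall()
print(rows)
```

Whether this layout is faster than the four separate tables on a real server
is exactly the open question in this thread; the sketch only shows the shape.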

John Bell
11/14/2004 2:57:26 PM
Hi

You don't say whether they have been set up as a partitioned view, but your
comment about business usage tends to imply they haven't. If they haven't,
then this is the change I would look at first, especially if the growth rate
of the system indicates that federation will be necessary.

If only a small percentage of queries access all the tables, then this may
also indicate there is a performance benefit to the split. If the tables are
on different filegroups and on different disc subsystems, then performance
may have been a valid reason to split them up.

Without being there when the decision to partition them was made, you will
not know the underlying stats or reasons for this design, and I would bet
they have not been documented!

If you are going to combine them, then create a benchmark test so that you
can compare each configuration, and test the two alternatives in a
controlled environment. If you can't do that, then unless there is a
specific reason to change what is already working (and performing well!)
I wouldn't.

John

MissLivvy
11/14/2004 7:30:35 PM

Correct. There is no partitioned view. I don't think the current design
lends itself to that, since there is currently no column that could be used
for the check constraint. Rows with the same primary key exist in all of the
tables, and data with the same PK are logically related from a business
perspective. To create a check constraint, I think we'd have to add
another column like the one I mention below.
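
A minimal sketch of that idea follows: each member table gets an extra column
pinned by a CHECK constraint, and a UNION ALL view presents them as one. It
runs on Python's sqlite3 only for convenience. SQLite has no partitioned views
in the SQL Server sense, so this shows the structure, not the optimizer
behaviour. The names sales_data, costs_data, usage_type and all_data are
hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical member tables. The added usage_type column is constrained to a
# single value per table, which is the role the check constraint would play.
for usage in ("sales", "costs"):
    conn.execute(f"""
        CREATE TABLE {usage}_data (
            item_id    INTEGER NOT NULL,
            usage_type TEXT    NOT NULL DEFAULT '{usage}'
                       CHECK (usage_type = '{usage}'),
            amount     REAL,
            PRIMARY KEY (item_id, usage_type)
        )""")

# The same key value can appear in every member table, as described above.
conn.execute("INSERT INTO sales_data (item_id, amount) VALUES (1, 10.0)")
conn.execute("INSERT INTO costs_data (item_id, amount) VALUES (1, 4.0)")

# One view over all member tables.
conn.execute("""
    CREATE VIEW all_data AS
    SELECT item_id, usage_type, amount FROM sales_data
    UNION ALL
    SELECT item_id, usage_type, amount FROM costs_data""")

rows = conn.execute(
    "SELECT usage_type, amount FROM all_data "
    "WHERE item_id = 1 ORDER BY usage_type"
).fetchall()
print(rows)

# The CHECK constraint rejects a row filed under the wrong table.
try:
    conn.execute("INSERT INTO sales_data VALUES (2, 'costs', 1.0)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print("wrong-table insert rejected:", rejected)
```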

Performance is definitely a problem, though, with operations that need to
query against all of the tables at the same time. For example, one thing that
users routinely need to do is copy a large range of rows from all of the
tables and insert them back into the same tables (with a new PK, of course).
I will try to find out if different filegroups were used for the different
tables, but I'm guessing this is not the case.

In my case, since sometimes we need to access all of the tables at once and
sometimes not, what I need to do is measure the tradeoff between the improved
performance when only 1 of the tables needs to be accessed and the penalty
paid when all tables need to be accessed. My gut feeling is that the
increase in time spent traversing the B-tree in the combined table should be
less significant than the penalty paid for having the data split up when we
need to access all tables at the same time. But again, I really need to
measure this.
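
One possible shape for such a measurement is sketched below. It uses Python's
sqlite3 in memory, so the numbers it prints say nothing about SQL Server on
real disks; the point is only the structure of the test: same data in both
layouts, same key range, one-table and all-table queries timed separately.
All names (part_a, combined, usage_type) and sizes are made up.

```python
import sqlite3
import time

N_KEYS = 2000
PARTS = ["a", "b", "c", "d"]  # hypothetical names for the 4 tables

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE combined (k INTEGER, usage_type TEXT, v REAL, "
    "PRIMARY KEY (k, usage_type)) WITHOUT ROWID"
)
for p in PARTS:
    conn.execute(f"CREATE TABLE part_{p} (k INTEGER PRIMARY KEY, v REAL)")
    data = [(k, float(k)) for k in range(N_KEYS)]
    conn.executemany(f"INSERT INTO part_{p} VALUES (?, ?)", data)
    conn.executemany(
        "INSERT INTO combined VALUES (?, ?, ?)",
        [(k, p, v) for k, v in data],
    )


def timed(sql, params, repeats=20):
    """Return (average seconds per run, rows from the last run)."""
    start = time.perf_counter()
    for _ in range(repeats):
        rows = conn.execute(sql, params).fetchall()
    return (time.perf_counter() - start) / repeats, rows


lo, hi = 500, 1500

# Case 1: a process that needs the same key range from all tables.
split_all_sql = " UNION ALL ".join(
    f"SELECT k, '{p}', v FROM part_{p} WHERE k BETWEEN ? AND ?" for p in PARTS
)
t_split_all, rows_split = timed(split_all_sql, (lo, hi) * len(PARTS))
t_comb_all, rows_comb = timed(
    "SELECT k, usage_type, v FROM combined WHERE k BETWEEN ? AND ?", (lo, hi)
)

# Case 2: a query that only needs one business usage.
t_split_one, one_split = timed(
    "SELECT k, v FROM part_a WHERE k BETWEEN ? AND ?", (lo, hi)
)
t_comb_one, one_comb = timed(
    "SELECT k, v FROM combined "
    "WHERE k BETWEEN ? AND ? AND usage_type = 'a'",
    (lo, hi),
)

print(f"all tables: split {t_split_all:.6f}s  combined {t_comb_all:.6f}s")
print(f"one table : split {t_split_one:.6f}s  combined {t_comb_one:.6f}s")
```

On the real system the same comparison would need production-sized data, a
cold and a warm cache, and the copy operation mentioned above, not just reads.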

Thanks.

Vincent Lascaux
11/14/2004 10:31:03 PM
Hmm, may I bring up a problem I had? I was in charge of redesigning a
database. It contained a table called Directories that held the absolute
paths of some folders frequently used in other tables. There was a need to
differentiate three kinds of folders: input, output and binary folders. The
goal was to refer to the folders by nickname in other tables. So I had this
schema:

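CREATE TABLE Directories (
    nick_name varchar(20)   NOT NULL,
    type      tinyint       NOT NULL,  -- 0: input, 1: output, 2: binary
    path      varchar(1000) NOT NULL,
    PRIMARY KEY (nick_name, type)
);

CREATE TABLE Jobs (
    input_folder  varchar(20),  -- nick_name of a type 0 directory
    output_folder varchar(20),  -- nick_name of a type 1 directory
    binary_folder varchar(20)   -- nick_name of a type 2 directory
);
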

I have been told that this was not a good design because I was not able to
link the Jobs table to the Directories one (the join would require a
constant. For example, input_folder is the nick_name, the type is 0).
The way to solve the problem was to create 3 different tables
InputDirectories, OutputDirectories and BinaryDirectories and to link the
Jobs table to those 3 directories.
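
To make the "join with a constant" concrete, here is a small sketch of the
single-table design, run on Python's sqlite3 for convenience. The job_id
column and the sample paths are invented for the example; the rest follows
the schema above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Directories (
        nick_name TEXT    NOT NULL,
        type      INTEGER NOT NULL,  -- 0: input, 1: output, 2: binary
        path      TEXT    NOT NULL,
        PRIMARY KEY (nick_name, type)
    );
    CREATE TABLE Jobs (
        job_id        INTEGER PRIMARY KEY,  -- hypothetical, not in the post
        input_folder  TEXT NOT NULL,
        output_folder TEXT NOT NULL,
        binary_folder TEXT NOT NULL
    );
    -- The same nickname ('main') is used for two different types.
    INSERT INTO Directories VALUES
        ('main',  0, '/data/in'),
        ('main',  1, '/data/out'),
        ('tools', 2, '/opt/bin');
    INSERT INTO Jobs VALUES (1, 'main', 'main', 'tools');
""")

# Each join supplies the type as a constant, because Jobs stores only the
# nickname and the type is implied by which column holds it.
row = conn.execute("""
    SELECT i.path, o.path, b.path
    FROM Jobs AS j
    JOIN Directories AS i ON i.nick_name = j.input_folder  AND i.type = 0
    JOIN Directories AS o ON o.nick_name = j.output_folder AND o.type = 1
    JOIN Directories AS b ON b.nick_name = j.binary_folder AND b.type = 2
    WHERE j.job_id = 1
""").fetchone()
print(row)
```

The query itself works; the objection is presumably that a plain foreign key
from Jobs to Directories cannot be declared, since the type half of the key
does not exist as a column in Jobs.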

Which is the better design?

--
Vincent

Erland Sommarskog
11/14/2004 11:57:53 PM
MissLivvy (XeveryidiwantistakenX@yahoo.com) writes:

One option would be to retain the tables, and then build an indexed view
that combines them. Of course, this will double the disk space, and also
come with a cost for updates. But if the main activity is querying, this
could be the best of both worlds.

Note: to be able to fully use indexed views, you need Enterprise Edition.

--
Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se

Books Online for SQL Server SP3 at
MissLivvy
11/15/2004 6:30:16 AM
Thanks Erland.
Yes there is a lot of inserting and updating going on with these tables, so
I think we'd be paying too high a price for the querying benefit of the
indexed view.


MissLivvy
11/15/2004 6:51:44 AM
What about:

CREATE TABLE Directories (
    nick_name varchar(20)   NOT NULL,
    type      tinyint       NOT NULL,  -- 0: input, 1: output, 2: binary
    path      varchar(1000) NOT NULL,
    PRIMARY KEY (nick_name, type)
);

CREATE TABLE Job (
    JobID   int PRIMARY KEY,
    JobName varchar(20)
);

CREATE TABLE Job_Directory (
    JobID     int         NOT NULL,
    nick_name varchar(20) NOT NULL,
    type      tinyint     NOT NULL,
    PRIMARY KEY (JobID, nick_name, type)
);
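
A runnable sketch of this three-table layout, again on Python's sqlite3 only
for convenience. The two foreign keys are an assumption added for the example
(the post above lists only the primary keys), and the sample rows are
invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript("""
    CREATE TABLE Directories (
        nick_name TEXT    NOT NULL,
        type      INTEGER NOT NULL,  -- 0: input, 1: output, 2: binary
        path      TEXT    NOT NULL,
        PRIMARY KEY (nick_name, type)
    );
    CREATE TABLE Job (
        JobID   INTEGER PRIMARY KEY,
        JobName TEXT
    );
    CREATE TABLE Job_Directory (
        JobID     INTEGER NOT NULL REFERENCES Job (JobID),  -- assumed FK
        nick_name TEXT    NOT NULL,
        type      INTEGER NOT NULL,
        PRIMARY KEY (JobID, nick_name, type),
        -- assumed FK: the full (nick_name, type) key is now available
        FOREIGN KEY (nick_name, type)
            REFERENCES Directories (nick_name, type)
    );
    INSERT INTO Directories VALUES
        ('main',  0, '/data/in'),
        ('main',  1, '/data/out'),
        ('tools', 2, '/opt/bin');
    INSERT INTO Job VALUES (1, 'nightly');
    INSERT INTO Job_Directory VALUES
        (1, 'main', 0), (1, 'main', 1), (1, 'tools', 2);
""")

# One join returns every directory of the job, one row per type.
rows = conn.execute("""
    SELECT d.type, d.path
    FROM Job_Directory AS jd
    JOIN Directories AS d
      ON d.nick_name = jd.nick_name AND d.type = jd.type
    WHERE jd.JobID = 1
    ORDER BY d.type
""").fetchall()
print(rows)

# A link to a directory that does not exist is rejected by the foreign key.
try:
    conn.execute("INSERT INTO Job_Directory VALUES (1, 'missing', 0)")
    fk_enforced = False
except sqlite3.IntegrityError:
    fk_enforced = True
print("dangling link rejected:", fk_enforced)
```

Note that this key allows a job to have several directories of the same type;
if each job must have exactly one per type, that rule needs an extra
constraint or has to be checked elsewhere.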

Dan Gidman
11/15/2004 6:55:58 AM
> For example, one thing that users routinely need to do is copy a large
> range of rows from all of the tables and insert them back into the same
> tables (with a new PK, of course).

This seems to me like a lot of redundant data will get created
needlessly. It is probably why the db is +50 gig in size. It is also a
good indication of poor design. Is this data historic or frequently
updated? If it is historic and is not changed (like a POS sales
record), why copy the data around so much?

Vincent Lascaux
11/15/2004 7:20:19 PM
Considering that any job has one and exactly one path of each type, you have
a 1-3 relationship. I don't know if that is better than 1-1, which I have
heard is bad :)
And it makes the SQL queries more complex to write (for no added value).

--
Vincent

MissLivvy
11/16/2004 7:04:45 AM
It's a financial forecasting application and the data are heavily
manipulated by the users after copying from another version of the forecast.
Copying is just an easier way for them to get started vs. starting over
completely from scratch. They also run variance reports to compare different
versions of the forecast. To reduce the size of the database, I think an
archiving strategy would be appropriate.

MissLivvy
11/16/2004 7:21:43 AM
Maybe I misunderstood the problem. The way I understood it:

1] a directory can be one of 3 types: input, output, or binary.
2] A job has up to 3 directories: input, output and binary.
3] A directory can be shared by more than one job.

Is that correct?

Vincent Lascaux
11/16/2004 6:44:41 PM
1] True. And the same nickname can be used for different types.

2] Half true: a job has exactly 3 directories: one input, one output and one
binary directory.

3] True.

--
Vincent

MissLivvy
11/17/2004 4:55:04 AM
Then I like my design better. You may find it a pain to have to join to an
extra table, but with your design you need 3 joins to get each of the 3
directories related to a job. Also, if you ever have a 4th directory related
to a job you have to add a new column.
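
The point about a 4th directory can be shown in a few lines. This is a sketch
on Python's sqlite3 with invented sample data; type 3 ("log") is a
hypothetical new directory type, not something from the thread.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Job_Directory (
        JobID     INTEGER NOT NULL,
        nick_name TEXT    NOT NULL,
        type      INTEGER NOT NULL,  -- 0: input, 1: output, 2: binary
        PRIMARY KEY (JobID, nick_name, type)
    );
    INSERT INTO Job_Directory VALUES
        (1, 'main', 0), (1, 'main', 1), (1, 'tools', 2);
""")

columns_before = [r[1] for r in conn.execute("PRAGMA table_info(Job_Directory)")]

# Hypothetical 4th directory type (3: log). It is just another row; no
# ALTER TABLE is needed, unlike a design with one column per directory.
conn.execute("INSERT INTO Job_Directory VALUES (1, 'logs', 3)")

columns_after = [r[1] for r in conn.execute("PRAGMA table_info(Job_Directory)")]
count = conn.execute(
    "SELECT COUNT(*) FROM Job_Directory WHERE JobID = 1"
).fetchone()[0]
print(columns_before == columns_after, count)
```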

If you have no other attributes to add to the Job table, then you could get
rid of the job table and just do:

CREATE TABLE Directory (
    nick_name varchar(20)   NOT NULL,
    type      tinyint       NOT NULL,  -- 0: input, 1: output, 2: binary
    path      varchar(1000) NOT NULL,
    PRIMARY KEY (nick_name, type)
);

CREATE TABLE Job_Directory (
    JobName   nvarchar(20) NOT NULL,
    nick_name varchar(20)  NOT NULL,
    type      tinyint      NOT NULL,
    PRIMARY KEY (JobName, nick_name, type),
    FOREIGN KEY (nick_name, type) REFERENCES Directory (nick_name, type)
);


