Groups | Blog | Home
all groups > sql server data warehouse > september 2003 >

sql server data warehouse : Data archive design question


Vagif Abilov
9/10/2003 5:48:46 PM
We've come across the task of how store old data that are no longer in use.
This can be temporary session codes, old user profile data etc. System is
required to archive everything. We consider two mail alternatives:

1. Store both active and old information in the same database table, and
have a dedicated bit field that is set in case data are old. But flag is
updated when data is obsolete.
Advantage: everything is kept in a single place. No need to move/delete
data.
Disadvantage: over time most of the data represent old (unused) information.
Although data is indexed, the lookup time will increase.

2. Have a dedicated table for old (obsolete) information. Move old data
there and delete them from a table with active information.
Advantage: Clear separation between active and old data.
Disadvantage: two table, data must be moved between tables every time
information becomes obsolete.

Currently we have 1) implemented, but I am more and more convinced that we
should go for 2).

In case you have experience with data warehouse or large data processing,
what would you choose?

Thanks in advance

--
Vagif Abilov
vagif@NOSPAMonline.no
MCP, Visual C++
Oslo, Norway

Darren Green
9/10/2003 10:01:18 PM
In article <u8EfGM7dDHA.2332@TK2MSFTNGP11.phx.gbl>, Vagif Abilov
<vagif@online.no> writes
[quoted text, click to view]

I go for 2. I have a run-time (OLTP) system and a separate data
warehouse (DW) system. Just a few reasons I can think of now -

- Smaller tables (and DB) on run-time means better performance as you
say.

- Run-time storage costs are higher, highly redundant SAN Vs direct
attached array for DW system.

- Smaller tables (DB) on run-time means faster recovery in a DR
scenario. Restoring 18 GB vs 90 GB is a long time, and when the disaster
strikes you want as fast a restore as possible, so you want to only
restore the necessary data. DW data will get done later, once the main
systems are up. What is you priority?

- Historic data is kept for reporting purposes, but these are often
inefficient queries that may block run-time queries. Mixing long running
reporting style queries with small targeted OLTP type queries usually
leads to blocking of the latter, and consequently bad performance for
the end-user. This is really the biggest issue I think.

- Security, my DW server is much more secure compared to the run-time if
only because of the physical security, so my data is safer. Whilst the
run-time system is not insecure it is more exposed due to the other
systems that access it. The attack surface is larger basically.

Unless we are talking very small data volumes I would always go for
option two.

--
Darren Green (SQL Server MVP)
DTS - http://www.sqldts.com

PASS - the definitive, global community for SQL Server professionals
http://www.sqlpass.org
Vagif Abilov
9/11/2003 7:22:25 AM
Thanks a log, Darren. That was a valuable answer.

--
Vagif Abilov
vagif@NOSPAMonline.no
MCP, Visual C++
Oslo, Norway


"Darren Green" <darren.green@reply-to-newsgroup-only.uk.com> wrote in
message news:bw9S19OeE5X$EwtT@sqldts.com...
[quoted text, click to view]

AddThis Social Bookmark Button