db-derby-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "David Van Couvering" <da...@vancouvering.com>
Subject Re: Horizontal partitioning?
Date Wed, 18 Jul 2007 05:09:00 GMT
Hm, if you're creating a view on all the tables, that must mean all
the tables are under a single database?  If that's the case, then this
doesn't really sound like horizontal partitioning (which usually means
splitting it into multiple databases, each with its own disk and
usually each with its own CPU).

If you're keeping everything within one database, then partitioning
across multiple tables really is not useful or necessary, as Derby
itself accomplishes quick access to any row in the table using the
index on the primary key.

Normally, if you have N partitions, then you would have N databases on
N machines.  There would be no way to create a view across all of
them.

This is how folks like eBay and Google do it, for example.  This
usually only works if the data in each partition is completely or
almost completely independent and you don't need to do queries that
span tables (e.g. you can issue the query against one specific
partition depending on things like timestamp, user id, geographic
location, etc).  If your queries can't be isolated to a single
partition, then your application code is going to have to deal with
logic normally assigned to the database, such as sorting and merging
data, and now you're in the realm of a clustered database, and having
worked on two of these in the past, I think I can safely say you don't
want to go there...

David

On 7/16/07, Olivier Chedru <oChedru@infovista.com> wrote:
>
>
>
>
> Hi all,
>
>
>
> I am trying to achieve horizontal partitioning with Derby.
>
>
>
> My application stores *a lot* of data as time passes, and it needs to drop
> obsolete data.
>
>
>
> Table partitioning would be the ideal solution, but it is not implemented in
> Derby. So I create one table per period of time. All tables have the same
> columns, the first one being the data timestamp. Dropping obsolete data is
> then made easy: just drop old tables.
>
>
>
> Now, I need to query data without knowing how tables are organized. So, I
> create a view on all tables (the view is deleted/created every time a data
> table is created/dropped): fine.
>
> However, checking the runtime statistics, I notice ALL tables are scanned
> when I use the view! Note I added constraints on the Timestamp field.
>
>
>
> Sample SQL:
>
>
>
> create table t1 (ts timestamp, a int, b int, c int);
>
> create table t2 (ts timestamp, a int, b int, c int);
>
> create table t3 (ts timestamp, a int, b int, c int);
>
> alter table t1 add constraint t1_c check (ts >= '2007-07-16 09:00:00.0' and
> ts < '2007-07-16 09:01:00.0');
>
> alter table t2 add constraint t2_c check (ts >= '2007-07-16 09:01:00.0' and
> ts < '2007-07-16 09:02:00.0');
>
> alter table t3 add constraint t3_c check (ts >= '2007-07-16 09:02:00.0' and
> ts < '2007-07-16 09:03:00.0');
>
> create view t_view as select * from t1 union select * from t2 union select *
> from t3;
>
> CALL SYSCS_UTIL.SYSCS_SET_RUNTIMESTATISTICS(1);
>
> CALL SYSCS_UTIL.SYSCS_SET_STATISTICS_TIMING(1);
>
> select * from t_view where ts >= '2007-07-16 09:01:30.0' and ts <
> '2007-07-16 09:02:00.0';
>
> VALUES SYSCS_UTIL.SYSCS_GET_RUNTIMESTATISTICS();
>
>
>
> The displayed execution plan shows t1, t2 and t3 are scanned. I expect only
> t2 is used. I also tried to force using constraints by defining the view
> this way, but it did not improve the behavior:
>
>
>
> create view t_view as select * from t1 --DERBY-PROPERTIES constraint=t1_c
> union select * from t2 --DERBY-PROPERTIES constraint=t2_c union select *
> from t3 --DERBY-PROPERTIES constraint=t3_c;
>
>
>
> Any idea on how to achieve horizontal partitioning?
>
>
>
> Thanks!
>
> Olivier.

Mime
View raw message