db-derby-dev mailing list archives

From David Pickett <David.Pick...@PHLX.com>
Subject Features -- table partitions, IOT, view-like SQL functions, db links, replication.
Date Mon, 18 Jul 2005 18:22:23 GMT
I am new here, and not so much interested in being a developer as an analyst, 
if there is room!  I saw the discussion in GMANE of table partitions, but it 
seemed not to go far enough, and the features discussed in the 10.1 Alpha docs 
seem to indicate that they did not make it.

Obviously, table partitioning does not look like a cheap feature to add, but 
there are two unique virtues to table partitions (which have nothing to do with 
striping):

 1. Keying on low-cardinality but popular attributes - If, for instance, you 
have a few major sets of data in a table, indexing a column that carries only 
two bits of information is not effective.  A partition means you can just scan 
the partition you want.  In the implementation, you might even shrink the row 
storage by holding the partition key once, at the partition level.  After all, 
a partitioned table is a bit like a view of a union with smart modification 
triggers.  In fact, a cheap form of table partitioning might be just that under 
the covers, with smart union optimization.

 2. Data life cycle: For example, you can put each month in a partition of its 
own, and efficiently drop a month after N months to manage space.
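
To make the two points above concrete, here is a minimal in-memory sketch in Java. It is only an illustration of the idea, not Derby code: the class name MonthlyPartitionSketch and all of its methods are hypothetical, and a sorted map of lists stands in for real partitions.

```java
import java.time.YearMonth;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical illustration only: not Derby code and not a Derby API. */
public class MonthlyPartitionSketch {
    // One "partition" per month. The month key is held once per partition,
    // so the rows themselves do not have to repeat it.
    private final TreeMap<YearMonth, List<String>> partitions = new TreeMap<>();

    /** Route a row to the partition for its month (the "smart trigger"). */
    public void insert(YearMonth month, String row) {
        partitions.computeIfAbsent(month, m -> new ArrayList<>()).add(row);
    }

    /** Scan a single partition instead of the whole table. */
    public List<String> scan(YearMonth month) {
        List<String> rows = partitions.get(month);
        if (rows == null) {
            return new ArrayList<>();
        }
        return new ArrayList<>(rows);
    }

    /** The "view of a union": every partition, in month order. */
    public List<String> scanAll() {
        List<String> all = new ArrayList<>();
        for (List<String> rows : partitions.values()) {
            all.addAll(rows);
        }
        return all;
    }

    /** Keep the newest keepMonths months and drop the rest; returns partitions dropped. */
    public int dropOlderThan(YearMonth newest, int keepMonths) {
        YearMonth cutoff = newest.minusMonths(keepMonths - 1L);
        Map<YearMonth, List<String>> old = partitions.headMap(cutoff, false);
        int dropped = old.size();
        old.clear(); // whole partitions go at once, no row-by-row delete
        return dropped;
    }

    public static void main(String[] args) {
        MonthlyPartitionSketch t = new MonthlyPartitionSketch();
        t.insert(YearMonth.of(2005, 5), "may-row");
        t.insert(YearMonth.of(2005, 6), "june-row");
        t.insert(YearMonth.of(2005, 7), "july-row");
        System.out.println(t.scan(YearMonth.of(2005, 6)));
        System.out.println(t.dropOlderThan(YearMonth.of(2005, 7), 2));
        System.out.println(t.scanAll());
    }
}
```

In a real engine the interesting part is the optimizer knowing which partitions a predicate can touch; the sketch only shows why a single-partition scan and a whole-partition drop are cheap.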

BTW, I am a fan of what Sybase started out with, what Oracle calls Index 
Organized Tables.  Not a cheap feature, but so many relations in a 
well-normalized database are accessed through one major key that their 
performance is hard to ignore.  They have some of the same optimization virtues 
as partitioning: segregating the data physically.

Databases have historically had file system extensions embedded in them, so 
they could stripe, mirror, juggle space, and expand onto new devices, because 
the OSes of the day were inflexible and handled big files poorly.  I agree that 
it is about time this functionality moved outside the database, into a file 
system layer if not a freestanding peripheral Storage Management System or 
device.  I mean to go Google for file systems, and see if the open source ones 
have improved.


I was disappointed that Derby did not seem to have any way to spread a database 
over more than one local disk.  Maybe at the table and schema level, a 
different directory could be applied by option, so some control and size 
expansion is available for the casual user.  It would be a relatively cheap 
feature.


Another way to get the same effect would be to add 'database links' to Derby.  
The additional space could be in another Derby instance linked in.  A view of a 
union could marry the local and remote table.  Of course, this might be a bit 
slow for some queries.  Links would also be a modestly cheap feature to add, 
and they have many uses.


Not being so much a Java person as an SQL person, I was disappointed that SQL 
functions are missing, except as triggers.  I wonder how many people have dummy 
tables with triggers as a kludge, just to get this.  Such functions or 
procedures should be able to act as tables, what Oracle calls a pipelined 
table function.  This allows many useful functions, e.g., in place of a view of a 
union above, you could put in a smart function to pick the right table(s).  
Again, as this is already written for triggers, it should be a cheap feature to 
add.


Replication: I remain in awe of the power and simplicity of the 'log shipping' 
replication I first saw in Sybase, especially after seeing Oracle's 
trigger-based snapshot table (materialized view) approach in action.  The concept is 
relatively simple and there is almost no additional load on the upstream 
server, just more reading of log files.  This could be added as a smart client 
daemon accessory, adding little or no server complication.  The client would 
know the log and config structure, and could turn log records, after explicit 
and implicit filtering, into JDBC calls on a requested remote instance.  The most complex 
form, where you can update either, is still just smart filtering so the update 
does not keep bouncing.  In practice, the latency of the 'log shipping' 
replication can be very small, smaller than the 'trigger based' kind!  The 
replication concept allows many interesting architectures: trees, rings, and 
cliques.
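
As a rough sketch of the filtering idea for the update-either-side case, assuming each log record carries a tag naming the site where the change was first made: the Java below is hypothetical throughout (LogShippingSketch, Site, LogRecord and ship are made-up names, and nothing here reflects Derby's actual log format). In-memory maps and lists stand in for the databases and their logs, where a real client would read log files and issue JDBC calls.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration only: not Derby's log format or any Derby API. */
public class LogShippingSketch {
    /** One logged change, tagged with the site where it was first made. */
    static final class LogRecord {
        final String originSite;
        final String key;
        final String value;

        LogRecord(String originSite, String key, String value) {
            this.originSite = originSite;
            this.key = key;
            this.value = value;
        }
    }

    /** A stand-in for a database instance: one table plus its log. */
    static final class Site {
        final String name;
        final Map<String, String> table = new HashMap<>();
        final List<LogRecord> log = new ArrayList<>();
        int shippedUpTo = 0; // log position already read (one target per source here)

        Site(String name) {
            this.name = name;
        }

        /** A local update: applied and logged with this site as the origin. */
        void update(String key, String value) {
            apply(new LogRecord(name, key, value));
        }

        /** Apply a change and log it, keeping the original origin tag. */
        void apply(LogRecord r) {
            table.put(r.key, r.value);
            log.add(r);
        }
    }

    /** Replay the unread tail of source's log on target; returns records applied. */
    static int ship(Site source, Site target) {
        int applied = 0;
        while (source.shippedUpTo < source.log.size()) {
            LogRecord r = source.log.get(source.shippedUpTo++);
            if (r.originSite.equals(target.name)) {
                continue; // filter: never send a change back to where it began
            }
            target.apply(r);
            applied++;
        }
        return applied;
    }

    public static void main(String[] args) {
        Site a = new Site("A");
        Site b = new Site("B");
        a.update("k", "1");
        System.out.println(ship(a, b)); // A's change reaches B
        System.out.println(ship(b, a)); // B's copy of it is filtered out
        System.out.println(b.table.get("k"));
    }
}
```

The upstream site does no extra work beyond writing its normal log; all the filtering lives in the shipping client, which is the property that makes this style attractive.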

Well, there you go -- another person's wish list!

