db-derby-dev mailing list archives

From Mike Matrigali <mikem_...@sbcglobal.net>
Subject Re: New "segmented" StorageFactory Development
Date Fri, 05 May 2006 17:11:43 GMT
Do you have any more details on your requirements, such as the following:
1) Do you need a single table and/or index to be spread across
    multiple disks?
2) Do you want control, when you create each table/index, over where
    it goes and how?
3) Are you looking to limit the absolute size of tables/indexes
    in each directory to a fixed size?

The existing storage system had some support for spreading data
across disks built into the interfaces, but it was never used.  Data
is currently stored in the seg0 directory.  The idea was that
support could be added to also store data in a seg1 directory
located on another device.  Anyone interested in this approach
would first have to fix the code to pass the seg argument around
correctly (it has been observed that some code got lazy and
just used 0 rather than propagating the argument).
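
To make the seg argument point concrete, here is a minimal sketch in
Java.  It is not Derby code: the class SegmentDirs, the method
directoryFor and the segmentRoots array are hypothetical names, and the
layout (one configured root per segment, with a segX directory under it)
is an assumption based only on the seg0/seg1 naming above.  It shows a
lookup that honors the segment id it is given rather than assuming 0.

```java
import java.io.File;

// Hypothetical illustration only; none of these names exist in Derby.
final class SegmentDirs {
    private SegmentDirs() {}

    /**
     * Returns the directory for the given segment id.
     * Assumption: segmentRoots[i] is the device/root configured for
     * segment i, and the data lives in a child directory named "seg" + i.
     */
    static File directoryFor(File[] segmentRoots, int segmentId) {
        if (segmentId < 0 || segmentId >= segmentRoots.length) {
            throw new IllegalArgumentException(
                "unknown segment id: " + segmentId);
        }
        // Use the id that was passed in; never fall back to segment 0.
        return new File(segmentRoots[segmentId], "seg" + segmentId);
    }
}
```

A caller that hard-codes 0 here would silently put every file in seg0,
which is the kind of bug described above.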

The next decision is how the tables are to be spread across the disks.
If putting whole tables or indexes on one disk fits your plan then I
would use the existing table metadata catalogs to track where a file is
(these may have to be upgraded to hold the new info - not sure).  If you
want to spread a single file across multiple segments then you need
to decide if you want to do it by key or by some mathematical block
number approach:

partition by key
    o would pave the road for future interesting parallel query
      execution work.
    o I would again recommend a top-down implementation, having the
      existing database metadata catalogs do the work.

partition by block number
    o If there is any per table/index control, again use the existing
      database metadata catalogs and pass the info down into
      store.  Partitioning by block number would probably best be done
      with some new module, as Dan suggested, with alternate storage
      factory implementations.
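
As one example of a "mathematical block number approach", here is a
minimal round-robin striping sketch in Java.  It is only an illustration
under assumptions: BlockStriping, segmentFor and blockWithinSegment are
hypothetical names, not part of Derby or its StorageFactory interface,
and round-robin is just one possible mapping.

```java
// Hypothetical illustration only; none of these names exist in Derby.
final class BlockStriping {
    private BlockStriping() {}

    /** Segment index (the X in segX) that holds the given logical block. */
    static int segmentFor(long blockNumber, int segmentCount) {
        check(blockNumber, segmentCount);
        // Round-robin: consecutive blocks land on consecutive segments.
        return (int) (blockNumber % segmentCount);
    }

    /** Position of the logical block inside its segment's file. */
    static long blockWithinSegment(long blockNumber, int segmentCount) {
        check(blockNumber, segmentCount);
        return blockNumber / segmentCount;
    }

    private static void check(long blockNumber, int segmentCount) {
        if (blockNumber < 0 || segmentCount <= 0) {
            throw new IllegalArgumentException(
                "blockNumber must be >= 0 and segmentCount > 0");
        }
    }
}
```

With 3 segments, logical block 4 maps to segment 1 at position 1 in that
segment.  Note that with this mapping the segment count cannot change
later without moving existing blocks.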

If you want per table/index control I think the segX approach is the
best, since the obvious input would be from the create table command.

If you would rather take the bottom-up approach, I would start by
looking at the in-memory patch that was done.  If you don't need much
per-file control it may be possible to only override the StorageFactory
as Dan described.

Whatever approach you pick, a couple of issues come to mind:
o how do you configure the new segments into the db (currently this is
   just done automatically at db creation time).
o how do you back up a multiple segment database
o how do you handle allocation of disk space to files; the current model
   is that the db just uses all the disk space available on that disk and
   fails if table allocation gets an out of disk space error.

Rodrigo Madera wrote:
> Hello to all,
> I'm Rodrigo Madera, software developer. I need a certain feature in
> the Derby database system, and I am volunteering to develop such a
> feature.
> My requirement is that a database can be spread into different
> directories (please read the Derby Users mailing list thread:
> "Spawning Data on Multiple Directories").
> I am filing a JIRA feature request and I count on the community for
> pointers and information on developing such a feature.
> I am still seeking the knowledge and planning the necessary steps to
> do such a thing. I don't have *lots* of time, but since this is a
> requirement for me, I can spend some hours on it.
> Thanks to all for any input/critics/suggestions,
> Rodrigo Madera
