db-derby-dev mailing list archives

From "Rodrigo Madera" <rodrigo.mad...@gmail.com>
Subject Re: New "segmented" StorageFactory Development
Date Fri, 05 May 2006 17:33:41 GMT
On 5/5/06, Mike Matrigali <mikem_app@sbcglobal.net> wrote:
> Do you have any more details on your requirements, such as the following:
> 1) do you need for a single table and/or index to be spread across
>     multiple disk?

That would be terrific, and it is the ultimate goal of this requirement.
Whether it is practical, however, depends.

Is Derby based on a table/index-is-a-single-file architecture? If so,
it's too much trouble to change this. Making the tables/indexes
segmented would only be viable (in my opinion) if Derby already
supports this.

I vote to get the "divider" in place that routes the new tables/etc to
the different directories, and only then, when it's mature, begin a
table segmentation engine.

> 2) do you want control when you create each table/index where it
>     goes and how?

Yes. I'm planning on doing this automagically based on the specified
directory/capacity pairs.
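
A rough sketch of what such a "divider" could look like, assuming each new
table/index comes with an estimated size and that the first directory with
enough room wins. SegmentRouter, Segment, addSegment and place are
hypothetical names for illustration only; nothing like them exists in Derby
today, and the real placement policy is still open.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the "divider"; not an existing Derby class. */
final class SegmentRouter {

    /** One {path, capacity} tuple plus the bytes already reserved in it. */
    static final class Segment {
        final String path;
        final long capacityBytes;
        long usedBytes;

        Segment(String path, long capacityBytes) {
            this.path = path;
            this.capacityBytes = capacityBytes;
        }

        long freeBytes() {
            return capacityBytes - usedBytes;
        }
    }

    private final List<Segment> segments = new ArrayList<>();

    /** Registers a directory and the maximum number of bytes it may hold. */
    void addSegment(String path, long capacityBytes) {
        if (capacityBytes <= 0) {
            throw new IllegalArgumentException("capacity must be positive: " + capacityBytes);
        }
        segments.add(new Segment(path, capacityBytes));
    }

    /**
     * First-fit placement (an assumption): returns the path of the first
     * segment with enough free space and reserves the space there.
     */
    String place(long expectedBytes) {
        for (Segment s : segments) {
            if (s.freeBytes() >= expectedBytes) {
                s.usedBytes += expectedBytes;
                return s.path;
            }
        }
        throw new IllegalStateException("no segment has " + expectedBytes + " bytes free");
    }
}
```

With two segments of 100 and 200 bytes, a request for 80 bytes lands in the
first directory and a following request for 50 bytes falls through to the
second, because the first has only 20 bytes left.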

> 3) Are you looking to limit the absolute size of tables/indexes
>     in each directory to a fixed size?

Absolutely. This is very important for the approach I'm thinking of in #1.

> The existing storage system had some support for spreading data
> across disks built into the interfaces, but was never used.  Data
> is currently stored in the seg0 directory.  The idea was that
> support could be added to store data also in a seg1 directory
> located on another device.  If one were interested in this approach
> they would first have to fix the code to pass around correctly
> the seg argument (It has been observed that some code got lazy and
> just used 0 rather than propagating the argument).

I'm in. I'll check out the latest version and take a look. Is that code still there?

> The next decision is how the tables are to spread across the disks.
> If putting whole tables or indexes fits your plan then I would use
> the existing table metadata catalogs to track where a file is (these
> may have to be upgraded to hold the new info - not sure).

IMO: This is the way to go for now.

>  If one
> wants to spread a single file across multiple segments then you need
> to decide if you want to do it by key or by some mathematical block
> number approach:
>
> partition by key
>     o would pave the road for future interesting parallel query
>       execution work.
>     o would recommend again top down implementation, having the
>       existing database metadata catalogs do the work.
>
> partition by block number
>     o If there is any per table/index control again use the existing
>       database metadata catalogs and pass the info down into
>       store. partitioning by block number probably would best be done
>       with some new module as Dan suggested with alternate storage
>       factory implementations.

Too messy for now... Guess #1 is better for now...

> If you want per table/index control I think the segX approach is the
> best, since the obvious input would be from the create table command.

Ok. But I prefer to have the array of {path, capacity} tuples (or
table, or meta info, or ...).

> If you rather do the bottom up approach, I would first start at looking
> at the in memory patch that was done.  If you don't need much per
> file control it may be possible to only override the StorageFactory as
> Dan described.

I'll take a look at it immediately.

> Whatever approach you pick a couple of issues come to mind:
> o how do you config the new segments into the db (currently just
> automatically done at db creation time).

Via the configuration tuples.
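
For illustration, the tuples could be supplied as a single string and parsed
at boot. The "path=capacityInBytes;path=capacityInBytes" format and the
SegmentConfig class below are assumptions made up for this sketch; no such
Derby property or class exists.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical parser for {path, capacity} tuples; not a Derby API. */
final class SegmentConfig {

    private SegmentConfig() {
    }

    /**
     * Parses "path=capacityInBytes" entries separated by ';' and returns
     * them in declaration order. The format itself is an assumption.
     */
    static Map<String, Long> parse(String spec) {
        Map<String, Long> result = new LinkedHashMap<>();
        for (String entry : spec.split(";")) {
            String trimmed = entry.trim();
            if (trimmed.isEmpty()) {
                continue; // tolerate a trailing ';'
            }
            // Split on the last '=' so that the path part is kept whole.
            int eq = trimmed.lastIndexOf('=');
            if (eq <= 0 || eq == trimmed.length() - 1) {
                throw new IllegalArgumentException("expected path=capacity, got: " + trimmed);
            }
            String path = trimmed.substring(0, eq).trim();
            long capacity = Long.parseLong(trimmed.substring(eq + 1).trim());
            if (capacity <= 0) {
                throw new IllegalArgumentException("capacity must be positive: " + trimmed);
            }
            if (result.put(path, capacity) != null) {
                throw new IllegalArgumentException("duplicate path: " + path);
            }
        }
        return result;
    }
}
```

Keeping declaration order matters if the order of the tuples is also meant to
be the order in which directories are filled.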

> o how do you back up a multiple segment database

Traversing the repositories.
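
As a minimal sketch of that traversal, assuming the backup simply copies each
segment directory into its own seg<N> subdirectory of the backup location.
SegmentBackup is a hypothetical helper, the seg<N> naming of the copies is an
assumption, and the sketch ignores the question of getting a consistent
snapshot while the database is being written to.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;
import java.util.stream.Stream;

/** Hypothetical backup helper; not part of Derby. */
final class SegmentBackup {

    private SegmentBackup() {
    }

    /** Copies every segment directory to backupRoot/seg0, seg1, ... */
    static void backup(List<Path> segmentDirs, Path backupRoot) throws IOException {
        for (int i = 0; i < segmentDirs.size(); i++) {
            copyTree(segmentDirs.get(i), backupRoot.resolve("seg" + i));
        }
    }

    /** Recursively copies one directory tree, overwriting existing files. */
    private static void copyTree(Path source, Path target) throws IOException {
        // Files.walk visits a directory before its contents, so each
        // destination directory exists before files are copied into it.
        try (Stream<Path> paths = Files.walk(source)) {
            for (Path p : (Iterable<Path>) paths::iterator) {
                Path dest = target.resolve(source.relativize(p).toString());
                if (Files.isDirectory(p)) {
                    Files.createDirectories(dest);
                } else {
                    Files.copy(p, dest, StandardCopyOption.REPLACE_EXISTING);
                }
            }
        }
    }
}
```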

> o how do you handle allocation of disk space to files, current model
>    is the db just uses all the disk space available on that disk and
>    fails if table allocation gets an out of disk space error.

DB uses all ${capacity} on ${path}.



This is only my initial vision of the model, so please give your
opinions here to make it better.

Thanks,
Rodrigo Madera
