db-derby-dev mailing list archives

From David Van Couvering <David.Vancouver...@Sun.COM>
Subject Re: New "segmented" StorageFactory Development
Date Fri, 05 May 2006 17:40:01 GMT
Oh, wait, how could you be working for IBM and adding a feature?  I 
thought you guys were only doing bugfixes :)

David

Rodrigo Madera wrote:
> Oh, just a technical detail... I work for IBM, but on a whole
> different project...
> 
> Is this a problem??
> 
> Thanks
> 
> On 5/5/06, Rodrigo Madera <rodrigo.madera@gmail.com> wrote:
>> On 5/5/06, Mike Matrigali <mikem_app@sbcglobal.net> wrote:
>> > Do you have any more details on your requirements, such as the 
>> following:
>> > 1) do you need for a single table and/or index to be spread across
>> >     multiple disks?
>>
>> It would be terrific and the absolute glory of the requirement;
>> however, it depends.
>>
>> Is Derby based on a table/index-is-a-single-file architecture? If so,
>> it's too much trouble to change this. Making the tables/indexes
>> segmented would only be viable (in my opinion) if Derby already
>> supports this.
>>
>> I vote to get the "divider" in place that routes the new tables/etc to
>> the different directories, and only then, when it's mature, begin a
>> table segmentation engine.
>>
>> > 2) do you want control when you create each table/index where it
>> >     goes and how?
>>
>> Yes. I'm planning on doing this automagically based on the specified
>> directory/capacity pairs.
>>
>> > 3) Are you looking to limit the absolute size of tables/indexes
>> >     in each directory to a fixed size?
>>
>> Absolutely. This is very important for the approach I'm thinking of in 
>> #1.
>>
>> > The existing storage system had some support for spreading data
>> > across disks built into the interfaces, but was never used.  Data
>> > is currently stored in the seg0 directory.  The idea was that
>> > support could be added to store data also in a seg1 directory
>> > located on another device.  If one were interested in this approach,
>> > they would first have to fix the code to correctly pass around
>> > the seg argument (it has been observed that some code got lazy and
>> > just used 0 rather than propagating the argument).
>>
>> I'm in. I'll check out the latest version and have a look. Is it still there?
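>>
>> For reference, honoring the seg argument instead of hard-coding 0 would
>> amount to something like this (hypothetical helper, not the actual store
>> code):
>>
>>   // Hypothetical sketch: resolve a segment id to its directory instead of
>>   // assuming everything lives under seg0.
>>   final class SegmentDirs {
>>       private final java.io.File dbHome;
>>       SegmentDirs(java.io.File dbHome) {
>>           this.dbHome = dbHome;
>>       }
>>       java.io.File dirFor(int segmentId) {
>>           return new java.io.File(dbHome, "seg" + segmentId);
>>       }
>>   }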
>>
>> > The next decision is how the tables are to spread across the disks.
>> > If putting whole tables or indexes fits your plan then I would use
>> > the existing table metadata catalogs to track where a file is (these
>> > may have to be upgraded to hold the new info - not sure).
>>
>> IMO: This is the way to go for now.
>>
>> >  If one
>> > wants to spread a single file across multiple segments then you need
>> > to decide if you want to do it by key or by some mathematical block
>> > number approach:
>> >
>> > partition by key
>> >     o would pave the road for future interesting parallel query
>> >       execution work.
>> >     o would again recommend a top-down implementation, having the
>> >       existing database metadata catalogs do the work.
>> >
>> > partition by block number
>> >     o If there is any per table/index control, again use the existing
>> >       database metadata catalogs and pass the info down into the
>> >       store. Partitioning by block number would probably best be done
>> >       with some new module, as Dan suggested, with alternate storage
>> >       factory implementations.
>>
>> Too messy for now... I guess #1 is better.
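>>
>> Just so we're talking about the same thing, my reading of the
>> block-number variant is roughly this (hypothetical sketch):
>>
>>   // Hypothetical: stripe pages across segments by block number.
>>   static int segmentForBlock(long blockNumber, int segmentCount) {
>>       return (int) (blockNumber % segmentCount);
>>   }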
>>
>> > If you want per table/index control I think the segX approach is the
>> > best, since the obvious input would be from the create table command.
>>
>> Ok. But I prefer to have the array of {path, capacity} tuples (or
>> table, or meta info, or ...).
>>
>> > If you would rather do the bottom-up approach, I would start by looking
>> > at the in-memory patch that was done.  If you don't need much per
>> > file control it may be possible to only override the StorageFactory as
>> > Dan described.
>>
>> I'll take a look at it immediately.
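>>
>> My first rough guess at what a routing storage factory could look like
>> (a made-up, minimal sketch; the real org.apache.derby.io.StorageFactory
>> interface is larger and its methods will differ):
>>
>>   // Hypothetical sketch only -- a tiny stand-in interface to show the
>>   // routing idea, not the real StorageFactory API.
>>   interface SimpleStorage {
>>       java.io.File fileFor(String containerName);
>>   }
>>
>>   final class RoutingStorage implements SimpleStorage {
>>       private final java.util.List<java.io.File> segmentDirs;
>>       private final java.util.Map<String, java.io.File> placed =
>>           new java.util.HashMap<String, java.io.File>();
>>       RoutingStorage(java.util.List<java.io.File> segmentDirs) {
>>           this.segmentDirs = segmentDirs;
>>       }
>>       public java.io.File fileFor(String containerName) {
>>           // place each container once, spreading containers across dirs
>>           java.io.File dir = placed.get(containerName);
>>           if (dir == null) {
>>               int i = (containerName.hashCode() & 0x7fffffff) % segmentDirs.size();
>>               dir = segmentDirs.get(i);
>>               placed.put(containerName, dir);
>>           }
>>           return new java.io.File(dir, containerName);
>>       }
>>   }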
>>
>> > Whatever approach you pick, a couple of issues come to mind:
>> > o how do you configure the new segments into the db (currently just
>> > done automatically at db creation time).
>>
>> Via the configuration tuples.
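>>
>> For example, the tuples could come in as one property string; the
>> property name below is made up, it is not an existing Derby property:
>>
>>   // Hypothetical format: path=capacityInMB pairs, comma separated, e.g.
>>   //   derby.storage.segments=/disk1/db=1024,/disk2/db=2048
>>   static java.util.Map<String, Long> parseSegments(String value) {
>>       java.util.Map<String, Long> out =
>>           new java.util.LinkedHashMap<String, Long>();
>>       String[] pairs = value.split(",");
>>       for (int i = 0; i < pairs.length; i++) {
>>           String[] kv = pairs[i].split("=");
>>           out.put(kv[0].trim(), Long.parseLong(kv[1].trim()) * 1024L * 1024L);
>>       }
>>       return out;
>>   }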
>>
>> > o how do you back up a multiple-segment database
>>
>> Traversing the repositories.
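>>
>> I.e. back up by walking each segment directory and copying whatever is
>> under it, preserving the layout. A plain-java sketch (not Derby's actual
>> backup code):
>>
>>   // Hypothetical sketch: recursively copy one segment directory into the
>>   // backup target.
>>   static void backupSegment(java.io.File segDir, java.io.File backupDir)
>>           throws java.io.IOException {
>>       if (!backupDir.exists() && !backupDir.mkdirs()) {
>>           throw new java.io.IOException("cannot create " + backupDir);
>>       }
>>       java.io.File[] children = segDir.listFiles();
>>       if (children == null) return;
>>       for (int i = 0; i < children.length; i++) {
>>           java.io.File dest = new java.io.File(backupDir, children[i].getName());
>>           if (children[i].isDirectory()) {
>>               backupSegment(children[i], dest);
>>           } else {
>>               copyFile(children[i], dest);
>>           }
>>       }
>>   }
>>
>>   static void copyFile(java.io.File src, java.io.File dst)
>>           throws java.io.IOException {
>>       java.io.InputStream in = new java.io.FileInputStream(src);
>>       java.io.OutputStream out = new java.io.FileOutputStream(dst);
>>       try {
>>           byte[] buf = new byte[8192];
>>           int n;
>>           while ((n = in.read(buf)) != -1) {
>>               out.write(buf, 0, n);
>>           }
>>       } finally {
>>           in.close();
>>           out.close();
>>       }
>>   }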
>>
>> > o how do you handle allocation of disk space to files; the current model
>> >    is that the db just uses all the disk space available on that disk and
>> >    fails if table allocation runs out of disk space.
>>
>> The DB uses at most ${capacity} on ${path}.
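>>
>> That is, allocation checks the configured ceiling for a segment before
>> growing any file in it, roughly (hypothetical sketch):
>>
>>   // Hypothetical sketch: fail the allocation instead of running the disk dry.
>>   static void checkCapacity(long usedBytes, long requestedBytes,
>>                             long capacityBytes) throws java.io.IOException {
>>       if (usedBytes + requestedBytes > capacityBytes) {
>>           throw new java.io.IOException("segment capacity exceeded: "
>>               + (usedBytes + requestedBytes) + " > " + capacityBytes);
>>       }
>>   }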
>>
>>
>>
>> This is only my initial vision of the model, so please give your
>> opinions here to make it better.
>>
>> Thanks,
>> Rodrigo Madera
>>
