db-derby-dev mailing list archives

From "Francois Orsini" <francois.ors...@gmail.com>
Subject Re: simpler api to the Derby store
Date Thu, 03 Jul 2008 06:39:11 GMT
One thing in particular that I have always been a big fan of in
Cloudscape (Derby) is its modular architecture:
http://db.apache.org/derby/papers/derby_arch.html (modules & services)

Yet it has not been exploited in a way that benefits other projects or
allows other kinds of usage and contributions from outside Derby.

I have mentioned previously in this thread that I have heard many
requests from people wanting to re-use key Derby components (modules) as
part of their own open source projects, including but not limited to:

- Derby's SQL parser / compiler

- Derby's (B-Tree) Store.

The motivation for this is that a lot of Java developers know that Derby has
a solid (read: mature) codebase that is thoroughly tested, as well as a
rather active development community. It is a mature open source product. The
stability and maturity are there, and we know that many performance
improvements have been made over quite some time.

As of today, Derby in embedded mode brings a lot of value, especially in the
middle tier, where it can run as a whole (as some products already use it)
and perform well. Now, as we know, there is a lot of innovation around data
management these days, and I think some of the most recent initiatives could
benefit from Derby's SQL processor and/or store components.
http://www.lexemetech.com/2008/06/hadoop-query-languages.html
(this is just one example)

I personally believe that even though Derby has a great modular
architecture, it needs to be more flexible in order to benefit (though not
exclusively) other Apache open source projects and, in the end, benefit the
Derby community itself (giving back). It would help promote and increase
adoption and contribution.

Let's say some open source projects are interested in re-using Derby's SQL
processor and/or Derby's B-Tree implementation. Wouldn't it be better to
allow it (the flexible part), since the base for it is already there in the
architecture (the modularity part)? Or would it be better to see other
projects ripping parts out of it because there is no easy way to re-use them,
and consequently not being able to contribute back to Apache Derby? This
happens every day in the open source world and very often materializes as a
fork.

Taking Derby's existing modular architecture and making it a bit more
flexible would lead to a microkernel type of database architecture, and in
the end that is a very good thing to be able to claim (modularity +
flexibility). Combined with the feature-rich aspect of Derby, it opens up
many more opportunities for Apache Derby and outside of it. Derby's great
modular architecture is (unfortunately) not exposed enough for developers to
benefit from it directly, IMHO.

Mike's second point (single b-tree w/ a different codepath) is a clear
example of what could benefit other projects, not just Derby. We're not even
talking about some Derby Lite here but more about a flexible architecture
on top of which we can build and generate even more adoption, leading to
more contribution. The microkernel aspect (which is not a new thing, by the
way) opens up a lot more opportunities in and out. This does not affect
Apache Derby's "core" charter in any way.

I just wanted to mention this because I see it as a positive for Apache
Derby and its user and developer communities, and it would bring interesting
opportunities outside.

Of course, this is some itch to scratch...and my view on this...

--francois

On Tue, Jul 1, 2008 at 12:52 PM, Rick Hillegas <Richard.Hillegas@sun.com>
wrote:

> Hi Mike,
>
> Thanks for responding. Some comments inline...
>
> Mike Matrigali wrote:
>
>> Bryan Pendleton wrote:
>>
>>>> be useful for applications which just need to put and get data by key
>>>> value. These would be applications which don't need complex queries or SQL.
>>>>
>>>
>>> Aren't there some pretty good packages for this already? E.g., BDB-JE,
>>> JDBM, Perst, etc.?
>>>
>>> Speaking totally personally, I'd sure like to see Derby focus on the
>>> things that make it special:
>>>  - complete and correct JDBC implementation
>>>  - complete and correct SQL implementation
>>>  - low footprint, zero-admin reliable multi-user DBMS
>>>
>>> thanks,
>>>
>>> bryan
>>>
>> I agree with bryan, I would rather see the Derby project concentrate on
>> the stated goals of the project as Bryan enumerates.
>>
>> I do wonder if within this scope derby could do a better job of addressing
>> the application paradigm of only needing single keyed
>> access to a row of the form (short key, short data).  By being embedded
>>
> I think that there are other usage patterns which are important to people
> who use these btree stores. It's not just single row key/value lookup
> although that's an important case. People also like to position an iterator
> on a btree and then march forward (or backward), updating and deleting as
> they go.
>
>> derby already presents a better solution for a java application than
>> a lot of databases.  So issues are:
>> 1) can we improve the jdbc implementation to make using it for a compiled
>> plan close to as efficient as a non standard, store specific interface?  And
>> if jdbc is too complicated, could something very simple
>> be provided on top of jdbc at the cost of an extra method call per
>> access?
>>
> I think these are useful itches to scratch. Just to repeat the benefits of
> Derby Lite:
>
> i) smaller
> ii) faster
> iii) easier
>
> Streamlining the JDBC and SQL interpreter stacks would give us something
> faster but not smaller or easier.
>
> There are already plenty of ORM layers which you can bolt on top of Derby
> in order to get something that is easier but bigger. I think that the ORM
> layers can deliver something faster too but at the cost of warming up a
> cache.
>
>> 2) Can we provide a way such that only a single btree need be created,
>> rather than the current requirement of a heap and index.  The current
>> model works well if one needs to create multiple indexes on the base
>> data, and if there is no limit on the size of the un-indexed portion
>> of the data.
>>
> Other relational databases support this structure.  If we were just
> building this for Derby Lite, then some of the issues are called out at
> http://wiki.apache.org/db-derby/DerbyLite#head-6d68dc7584419d1853f1f15d203513afa44be3a6:
>
> a) The current payload in the btree leaf page is a RowLocation and there
> may be some other assumptions about this payload, for instance as you note,
> its size.
>
> b) Row-level locks are held on heap rows, not btree entries.
>
> I could definitely see this improvement being built by the Derby Lite
> enthusiasts. Then someone else could come in and add support for it in the
> Derby SQL interpreter.
>
> Thanks,
> -Rick
>
>>
>> 3) anything else?
>>
>>
>
