One thing in particular that I have always liked about Cloudscape (Derby) is its modular architecture:
http://db.apache.org/derby/papers/derby_arch.html (modules & services)

Yet it has not been exploited in a way that would benefit other projects and open the door to other kinds of usage and to contributions from outside Derby.

As I mentioned earlier in this thread, I have heard many requests from people wanting to re-use key Derby components (modules) in their own open source projects, including but not limited to:

- Derby's SQL parser / compiler

- Derby's (B-Tree) Store.

The motivation is that a lot of Java developers know that Derby has a solid (read: mature), well-tested codebase and a rather active development community. It is a mature open source product: the stability and maturity are there, and a lot of performance work has gone into it over quite some time.

As of today, Derby in embedded mode brings a lot of value, especially in the middle tier, where it can run as a whole (as some products already use it) and perform well. Now, as we know, there is a lot of innovation around data management these days, and I think some of the most recent initiatives could benefit from Derby's SQL processor and/or store components.
http://www.lexemetech.com/2008/06/hadoop-query-languages.html
(this is just one example)

I personally believe that even though Derby has a great modular architecture, it needs to be more flexible in order to benefit other open source projects (Apache ones, though not exclusively) and, in the end, the Derby community itself (giving back). That would help promote Derby and increase adoption and contribution.

Let's say some open source projects are interested in re-using Derby's SQL processor and/or Derby's B-Tree implementation. Wouldn't it be better to allow it (the flexible part), since the base for it is already there in the architecture (the modularity part)? Or would it be better to see other projects ripping parts out because there is no easy way to re-use them, and consequently not being able to contribute back to Apache Derby? This happens every day in the open source world and very often materializes as a fork.

Taking Derby's existing modular architecture and making it a bit more flexible would lead to a microkernel type of database architecture, which is a very good thing to be able to claim (modularity + flexibility). Combined with Derby's rich feature set, this opens up many more opportunities for Apache Derby and for projects outside it. Derby's great modular architecture is (unfortunately) not exposed enough for developers to benefit from it directly, IMHO.

Mike's second point (single b-tree w/ a different codepath) is a clear example of what could benefit other projects, not just Derby. We're not even talking about some Derby Lite here, but rather about a flexible architecture that we can build on top of to generate even more adoption, leading to more contribution. The microkernel aspect (which is not a new thing, by the way) opens up many more opportunities, both inside and outside the project. This does not affect the Apache Derby "core" charter in any way.

I just wanted to mention this because I see it as a positive for Apache Derby and for its user and developer communities, and it would bring interesting opportunities outside the project.

Of course, this is an itch for someone to scratch... and just my view on this.

--francois

On Tue, Jul 1, 2008 at 12:52 PM, Rick Hillegas <Richard.Hillegas@sun.com> wrote:
Hi Mike,


Thanks for responding. Some comments inline...

Mike Matrigali wrote:
Bryan Pendleton wrote:
be useful for applications which just need to put and get data by key value. These would be applications which don't need complex queries or SQL.

Aren't there some pretty good packages for this already? E.g., BDB-JE,
JDBM, Perst, etc.?

Speaking totally personally, I'd sure like to see Derby focus on the
things that make it special:
 - complete and correct JDBC implementation
 - complete and correct SQL implementation
 - low footprint, zero-admin reliable multi-user DBMS

thanks,

bryan

I agree with Bryan; I would rather see the Derby project concentrate on
the stated goals of the project as Bryan enumerates.

I do wonder if within this scope derby could do a better job of addressing the application paradigm of only needing single keyed
access to a row of the form (short key, short data).  By being embedded
I think that there are other usage patterns which are important to people who use these btree stores. It's not just single row key/value lookup although that's an important case. People also like to position an iterator on a btree and then march forward (or backward), updating and deleting as they go.
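The cursor-style usage pattern described above can be sketched roughly as follows. This is only an illustration: java.util.TreeMap (a red-black tree, not a B-tree) stands in for an ordered key/value store, and the CursorSketch class and its sweepFrom method are hypothetical names, not Derby APIs.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical illustration only: TreeMap stands in for an ordered store.
// Nothing here is part of Derby.
class CursorSketch {

    // Positions a cursor at the first key >= startKey, then marches forward,
    // deleting entries whose value is empty and upper-casing all the others.
    // Returns the number of entries visited.
    static int sweepFrom(NavigableMap<String, String> store, String startKey) {
        int visited = 0;
        Iterator<Map.Entry<String, String>> cursor =
                store.tailMap(startKey, true).entrySet().iterator();
        while (cursor.hasNext()) {
            Map.Entry<String, String> entry = cursor.next();
            visited++;
            if (entry.getValue().isEmpty()) {
                cursor.remove(); // delete at the current cursor position
            } else {
                entry.setValue(entry.getValue().toUpperCase()); // update in place
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        NavigableMap<String, String> store = new TreeMap<>();
        store.put("a", "x");
        store.put("b", "");
        store.put("c", "y");
        sweepFrom(store, "b");
        System.out.println(store);
    }
}
```

Marching backward would look the same using headMap(startKey, true).descendingMap() instead of tailMap. A real store would also have to deal with locking and durability, which this sketch ignores.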

derby already presents a better solution for a java application than
a lot of databases.  So issues are:
1) can we improve the jdbc implementation to make using it for a compiled plan close to as efficient as a non standard, store specific interface?  And if jdbc is too complicated, could something very simple
be provided on top of jdbc at the cost of an extra method call per
access?
I think these are useful itches to scratch. Just to repeat the benefits of Derby Lite:

i) smaller
ii) faster
iii) easier

Streamlining the JDBC and SQL interpreter stacks would give us something faster but not smaller or easier.

There are already plenty of ORM layers which you can bolt on top of Derby in order to get something that is easier but bigger. I think that the ORM layers can deliver something faster too but at the cost of warming up a cache.
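As a rough idea of what "something very simple on top of jdbc" from point 1 might look like, here is a minimal sketch using only standard java.sql calls. The KeyValueTable class, the kv table and its k/v columns are all hypothetical, made up for illustration; this is not an existing Derby interface, and it is not thread-safe.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.Objects;

// Hypothetical helper, not part of Derby. Assumes a table created as:
//   CREATE TABLE kv (k VARCHAR(64) PRIMARY KEY, v VARCHAR(256))
class KeyValueTable implements AutoCloseable {
    private final PreparedStatement select;
    private final PreparedStatement update;
    private final PreparedStatement insert;

    KeyValueTable(Connection conn) throws SQLException {
        Objects.requireNonNull(conn, "conn");
        // Prepare each statement once so that every call reuses the compiled plan.
        select = conn.prepareStatement("SELECT v FROM kv WHERE k = ?");
        update = conn.prepareStatement("UPDATE kv SET v = ? WHERE k = ?");
        insert = conn.prepareStatement("INSERT INTO kv (k, v) VALUES (?, ?)");
    }

    // Returns the value stored under key, or null if there is no such row.
    String get(String key) throws SQLException {
        select.setString(1, key);
        try (ResultSet rs = select.executeQuery()) {
            return rs.next() ? rs.getString(1) : null;
        }
    }

    // Updates the row for key if it exists, otherwise inserts a new row.
    void put(String key, String value) throws SQLException {
        update.setString(1, value);
        update.setString(2, key);
        if (update.executeUpdate() == 0) {
            insert.setString(1, key);
            insert.setString(2, value);
            insert.executeUpdate();
        }
    }

    @Override
    public void close() throws SQLException {
        select.close();
        update.close();
        insert.close();
    }
}
```

The extra cost per access here is the method call plus the parameter binding, which is the trade-off Mike describes. Whether that gets close to a store-specific interface would have to be measured.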


2) Can we provide a way such that only a single btree need be created, rather than the current requirement of a heap and index?  The current
model works well if one needs to create multiple indexes on the base
data, and if there is no limit on the size of the un-indexed portion
of the data.
Other relational databases support this structure.  If we were just building this for Derby Lite, then some of the issues are called out at http://wiki.apache.org/db-derby/DerbyLite#head-6d68dc7584419d1853f1f15d203513afa44be3a6 :

a) The current payload in the btree leaf page is a RowLocation and there may be some other assumptions about this payload, for instance as you note, its size.

b) Row-level locks are held on heap rows, not btree entries.

I could definitely see this improvement being built by the Derby Lite enthusiasts. Then someone else could come in and add support for it in the Derby SQL interpreter.

Thanks,
-Rick

3) anything else?