jackrabbit-dev mailing list archives

From Bart van der Schans <b.vandersch...@onehippo.com>
Subject Re: AW: NoSql Support
Date Thu, 18 Aug 2011 12:14:06 GMT

On Wed, Aug 17, 2011 at 1:51 PM, Jukka Zitting <jukka.zitting@gmail.com> wrote:
> Hi,
> On Wed, Aug 17, 2011 at 1:37 PM, Cosmin Lehene <clehene@adobe.com> wrote:
>> First I'll have to better understand what a bundle is :) (JCR newbie
>> here:)). I'll try to read about it.
> A bundle is the unit of data stored by a bundle persistence manager.
> It contains the properties and the list of child nodes of a single JCR
> node.
> A bundle persistence manager is expected to be able to atomically
> update not just a single bundle at a time, but an arbitrarily large
> ChangeLog of created, updated and deleted bundles. This has so far
> been a big problem for NoSQL-style persistence managers that only
> support locking at the level of individual rows.

I think this is one of the biggest reasons why JCR 1.0 and 2.0 do not
map "nicely" onto most popular NoSQL stores. IMO it's not just a
Jackrabbit issue. The other big problem would be search. While you
can scale out nicely to huge numbers with some NoSQL stores, the
search will not. This is partly an issue with the Lucene
implementation in Jackrabbit, but the spec doesn't really "help"
either. In a big NoSQL deployment you might want to defer the searches
to an external clustered search engine (something Solr-like), but that
would/could mean that the search updates lag behind the content, aka
save first, index later. Another problem could be the current
clustering implementation, which requires a global write lock (which is
handled through the database or shared filesystem). Especially in a
multi-geolocation deployment a global write lock is not an option.
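
To make "save first, index later" a bit more concrete, here is a
minimal in-memory sketch in Java. It is only an illustration of the
idea, not Jackrabbit code: the class and method names
(DeferredIndexSketch, save, flushIndex, search) are made up for this
example, the "store" and "index" are plain maps, and a real setup
would hand the pending ids to an external search engine instead.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Queue;
import java.util.Set;

// Hypothetical sketch: none of these names are Jackrabbit or JCR APIs.
class DeferredIndexSketch {
    // Stand-in for the content store (node id -> text content).
    private final Map<String, String> store = new HashMap<>();
    // Stand-in for the search index (term -> ids of nodes containing it).
    private final Map<String, Set<String>> index = new HashMap<>();
    // Ids that were saved but not yet indexed.
    private final Queue<String> pending = new ArrayDeque<>();

    // Save first: the content is readable immediately, indexing is only queued.
    void save(String id, String text) {
        store.put(id, text);
        pending.add(id);
    }

    // Index later: drain the queue and index the current content of each id.
    // Returns the number of queue entries that were indexed.
    // Note: terms from older versions of a node are not removed in this sketch.
    int flushIndex() {
        int indexed = 0;
        String id;
        while ((id = pending.poll()) != null) {
            String text = store.get(id);
            if (text == null) {
                continue;
            }
            for (String term : text.toLowerCase().split("\\s+")) {
                if (!term.isEmpty()) {
                    index.computeIfAbsent(term, k -> new HashSet<>()).add(id);
                }
            }
            indexed++;
        }
        return indexed;
    }

    // Search only sees what has been flushed to the index so far.
    Set<String> search(String term) {
        return index.getOrDefault(term.toLowerCase(), Collections.emptySet());
    }

    String read(String id) {
        return store.get(id);
    }
}
```

After save("n1", "hello world") the content can be read back right
away, but search("hello") stays empty until flushIndex() has run.
That window is the lag mentioned above.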

I don't think these issues can be easily "solved" by just implementing
a different persistence manager. It would be interesting to see if we
can come up with some kind of design plan of how JCR could work with a
NoSQL store. Maybe some of that work already started with the
JR3/microkernel prototyping? It could also be that you need to choose
one NoSQL solution and then leverage all the
facilities/services/functionality provided by the store. So fully use
and exploit something like the Hadoop stack, the Amazon stack or even
the GAE stack.

We do see more and more people that expect everything to work smoothly
in the cloud and that everything scales nicely and elastically over
multiple datacenters. In the coming years this will become a
requirement and Jackrabbit should be ready for that.

