jackrabbit-dev mailing list archives

From James McLemore <jmclem...@me.com>
Subject Re: AW: NoSql Support
Date Tue, 30 Aug 2011 20:59:35 GMT
Cosmin,

Just an FYI, I have implemented JCR on top of NoSQL using Basho's Riak.  I will have to check
my code, but the key routines are loadBundle, storeBundle, destroyBundle and the 'refs' routines.
I started with DerbyPersistenceManager under ...pool, which extends the abstract bundle
persistence manager.  The cool thing is that it does all the serialization/deserialization for
you.  I have not implemented blobs yet, but Riak has Luwak blobs, so I may incorporate them.  I
am still working through the tests, and have yet to benchmark.
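
To make the shape of those routines concrete, here is a minimal sketch of a key-value
bundle store. It is not the Riak code described above and not Jackrabbit's actual API:
Bundle and KeyValueBundleStore are hypothetical names, the method signatures are simplified,
and an in-memory map stands in for the remote store.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a serialized node bundle: a node id plus opaque bytes.
final class Bundle {
    final String nodeId;
    final byte[] data;

    Bundle(String nodeId, byte[] data) {
        this.nodeId = nodeId;
        this.data = data.clone(); // defensive copy so callers cannot mutate stored state
    }
}

// Hypothetical key-value bundle store. The in-memory map is an assumption made
// for illustration; a real implementation would call the NoSQL client here.
final class KeyValueBundleStore {
    private final Map<String, byte[]> rows = new ConcurrentHashMap<>();

    // Fetch the bundle for one node, or empty if no such key exists.
    Optional<Bundle> loadBundle(String nodeId) {
        byte[] raw = rows.get(nodeId);
        return raw == null ? Optional.empty() : Optional.of(new Bundle(nodeId, raw));
    }

    // Write (create or overwrite) the bundle under its node id.
    void storeBundle(Bundle bundle) {
        rows.put(bundle.nodeId, bundle.data.clone());
    }

    // Remove the bundle for one node; a no-op if it is absent.
    void destroyBundle(String nodeId) {
        rows.remove(nodeId);
    }

    boolean exists(String nodeId) {
        return rows.containsKey(nodeId);
    }
}
```

Each operation touches exactly one key, which is why this part maps so easily onto a
key-value store.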

Since Riak has an HTTP API, it opens up all kinds of cool possibilities.

On Aug 18, 2011, at 8:07 AM, Cosmin Lehene wrote:

> It might not be feasible to have a full JCR on top of NoSQL, I don't know
> yet.
> However supporting basic search is definitely possible and it should be
> fast as well. Whether that's synchronous (fully consistent) or
> asynchronous should be optional.
> I assume some of the features (e.g. transactions or indexing) should be
> available in the NoSQL store and the persistence manager should deal with
> existing interfaces and data layout.
> However there's a relatively clear solution right now for JCR on top of
> Hbase and it should have enough features so that someone looking for
> scalability could use it.
> I also think global write locks need to go away (at least for NoSQL
> persistence). This can be taken care of at a more granular level inside the
> actual store.
> 
> Cosmin 
> 
> On 8/18/11 3:14 PM, "Bart van der Schans" <b.vanderschans@onehippo.com>
> wrote:
> 
>> Hi,
>> 
>> On Wed, Aug 17, 2011 at 1:51 PM, Jukka Zitting <jukka.zitting@gmail.com>
>> wrote:
>>> Hi,
>>> 
>>> On Wed, Aug 17, 2011 at 1:37 PM, Cosmin Lehene <clehene@adobe.com>
>>> wrote:
>>>> First I'll have to better understand what a bundle is :) (JCR newbie
>>>> here:)). I'll try to read about it.
>>> 
>>> A bundle is the unit of data stored by a bundle persistence manager.
>>> It contains the properties and the list of child nodes of a single JCR
>>> node.
>>> 
>>> A bundle persistence manager is expected to be able to atomically
>>> update not just a single bundle at a time, but an arbitrarily large
>>> ChangeLog of created, updated and deleted bundles. This has so far
>>> been a big problem for NoSQL-style persistence managers that only
>>> support locking at the level of individual rows.
>> 
>> I think this is one of the biggest reasons why JCR 1.0 and 2.0 do not
>> match "nicely" to most popular NoSQL stores. Imo it's not just a
>> Jackrabbit issue. The other big problem would be the search. As you
>> can scale out nicely to huge numbers with some NoSQL stores, the
>> search will not. This is partly an issue with the Lucene
>> implementation in Jackrabbit, but also the spec doesn't really "help".
>> In a big NoSQL deployment you might want to defer the searches to an
>> external clustered search engine (something Solr-like), but that
>> would/could mean that the search updates lag behind the content. Aka
>> save first, index later. Another problem could be the current
>> clustering implementation which requires a global write lock (which is
>> handled through the database or shared filesystem). Especially in a
>> multi-geolocation deployment a global write lock is not an option.
>> 
>> I don't think these issues can be easily "solved" by just implementing
>> a different persistence manager. It would be interesting to see if we
>> can come up with some kind of design plan of how JCR could work with a
>> NoSQL store. Maybe some of that work already started with the
>> JR3/microkernel prototyping? It could also be that you need to choose
>> one NoSQL solution and then leverage all the
>> facilities/services/functionality provided by the store. So fully use
>> and exploit something like the Hadoop stack, the Amazon stack or even
>> the GAE stack.
>> 
>> We do see more and more people that expect everything to work smoothly
>> in the cloud and that everything scales nicely and elastically over
>> multiple datacenters. In the coming years this will become a
>> requirement and Jackrabbit should be ready for that.
>> 
>> Bart
> 
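
The ChangeLog requirement quoted above can be sketched as follows. This is only an
illustration under stated assumptions: ChangeLogSketch and AtomicStore are hypothetical
names, not Jackrabbit classes, and a single in-process lock stands in for whatever a real
store would provide. It shows the all-or-nothing behaviour that is hard to get from a
store that only locks individual rows.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical change log: bundles to create or update, and bundles to delete.
final class ChangeLogSketch {
    final Map<String, String> upserts = new HashMap<>();
    final Set<String> deletes = new HashSet<>();
}

// Hypothetical store that applies a whole change log or none of it.
final class AtomicStore {
    private Map<String, String> rows = new HashMap<>();

    // One lock covers the whole log; with row-level locking only, a reader
    // could observe some of the changes applied and others not.
    synchronized void apply(ChangeLogSketch log) {
        // Validate before changing anything, so a failure leaves no partial state.
        for (String id : log.deletes) {
            if (!rows.containsKey(id)) {
                throw new IllegalStateException("missing row: " + id);
            }
        }
        // Build the new state on a copy, then publish it in one step.
        Map<String, String> next = new HashMap<>(rows);
        next.putAll(log.upserts);
        next.keySet().removeAll(log.deletes);
        rows = next;
    }

    synchronized String get(String id) {
        return rows.get(id);
    }

    synchronized int size() {
        return rows.size();
    }
}
```

Copying the whole map is obviously not how a scalable store would do it; the point is only
that the log becomes visible in a single step or not at all.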

