jackrabbit-users mailing list archives

From "Jukka Zitting" <jukka.zitt...@gmail.com>
Subject Re: my first hops
Date Mon, 08 Dec 2008 23:04:08 GMT

On Mon, Dec 8, 2008 at 11:24 PM, Torsten Curdt <tcurdt@apache.org> wrote:
>>> Maybe even a webdav servlet that transparently versions changes?
>> It doesn't do versioning transparently, but it does support the WebDAV
>> versioning features.
> Hm ...so how would that work if you use the standard OSX/Windows
> client and you just mount the repository.
> Would it version the files or not?

No, you'll need a versioning-aware client to explicitly invoke the
versioning operations.

> I remember there is such an autoversioning option for mod_dav
> (SVNAutoversioning)

We don't support that out of the box, but if you need that
functionality it should be reasonably straightforward to implement it
by subclassing the WebDAV servlet and adding the extra versioning
calls around normal write operations.
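A rough sketch of that idea. To stay self-contained it uses a hypothetical stand-in base class (`StubWebdavServlet`) rather than Jackrabbit's real WebDAV servlet; the class and method names are illustrative only, but the shape of the override is the point: wrap each write in checkout/checkin so every PUT produces a new version.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the real WebDAV servlet; in Jackrabbit you
// would subclass the actual WebDAV servlet class instead.
class StubWebdavServlet {
    final List<String> log = new ArrayList<>();

    void doPut(String path) {
        log.add("PUT " + path);   // plain write, no versioning
    }
}

// SVNAutoversioning-style behaviour: surround each normal write
// operation with explicit versioning calls.
class AutoversioningServlet extends StubWebdavServlet {
    @Override
    void doPut(String path) {
        log.add("CHECKOUT " + path);  // make the resource writable
        super.doPut(path);            // perform the normal write
        log.add("CHECKIN " + path);   // freeze the change as a new version
    }
}
```

The same wrapping would apply to the other write methods (MKCOL, DELETE, and so on).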

>> There are also a few good open source
>> browsers around, I've personally used and liked the JCR Explorer
>> available at http://www.jcr-explorer.org/.
> That one looks indeed quite good. It's ASL 2.0 - why not include that
> if there are problems with the CRX one?
> IMO it would be a big step forward to have something like that out of the box.

Yeah, I guess we should do that.

> (Still congrats on the standalone jar ... that is pretty sweet!)

Thanks. :-)

>> As you noticed, the recommended approach for now would be to use a
>> Jackrabbit cluster with each cluster node running locally on each
>> front end server (and in the same JVM process as your application).
> OK ... what about the persistence part? I know CRX has the mighty Tar
> PM :) ...but what about scaling at this end? Has this ever been a
> problem? If you have a cluster of 5-10 machines and just a single
> database for persistence I would imagine this could potentially become
> a bottleneck. Anyone ever used a whole database cluster for
> persistence?

You'll typically want a clustered database as the backend storage for
best fault-tolerance and scalability. We've used such setups quite
often and it works great.

> Any suggestions there? I might "have to" use an Oracle.

Most of the customer projects I've done already have a "company
standard" database backend, so typically you use the database that's
already there. The way Jackrabbit stores content below the persistence
manager layer is quite simple (we don't even need JOINs!), so any
modern database will probably do just fine. Take whatever you are most
comfortable with.

Note that for repositories with lots of large binaries I would suggest
using the data store feature based on a shared disk (NAS or SAN), as
that will decouple all costly binary accesses from the database.
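For reference, the data store is configured in repository.xml; a minimal FileDataStore entry pointing at shared storage might look like the following (the path and the minRecordLength value are illustrative, not recommendations):

```xml
<!-- Example only: FileDataStore on a shared (NAS/SAN) mount -->
<DataStore class="org.apache.jackrabbit.core.data.FileDataStore">
  <param name="path" value="/mnt/shared/jackrabbit/datastore"/>
  <!-- binaries smaller than this (bytes) are stored inline instead -->
  <param name="minRecordLength" value="100"/>
</DataStore>
```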

> My first thought was: shouldn't the JCR server just have a REST API?
> ...and then thought of Sling. And CouchDB. Or probably much more
> FeatherDB (http://fourspaces.com/blog/2008/4/11/FeatherDB_Java_JSON_Document_database)
> How this fits into the picture is probably more of a topic for the dev list.

Yeah, it's still an area of development. If you're interested, you may
want to check out the spi2dav effort in the Jackrabbit sandbox where
we're building a remoting mechanism for the full JCR API based on the
WebDAV protocol. See
http://jackrabbit.apache.org/JCR_Webdav_Protocol.doc for an earlier
draft of the protocol details.

> I was actually surprised about the choice of RMI anyway.
> (Forgive my words - but it's a bitch of a protocol)

The rationale for going with RMI originally was to get something
reasonably complete done quickly and easily. That approach actually
worked much better than I had originally hoped and we were able to
cover almost the entire JCR API (that's not too small) with relatively
little effort. In that sense I'm pretty happy with our use of RMI, but
of course that simplicity comes with limitations.

>>> Does the index get synchronized through the jackrabbit cluster
>>> mechanism?
>> Yes. The cluster nodes listen for changes recorded in the cluster
>> journal, and update the indexes based on the observed updates.
> Incrementally? Are there any guarantees for the observation? I just
> imagine a node to go down, miss an update and be out of sync when it
> comes back up. Something you really don't want to have in a cluster.

The journal keeps the update records until all cluster nodes have seen
them (see JCR-1087), so you'll never miss updates.
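A minimal sketch of that retention idea (this is not Jackrabbit's actual journal implementation, and all names are illustrative): each node tracks its last consumed revision, and records are only pruned once every registered node has seen them, so a node that goes down simply resumes from where it left off.

```java
import java.util.*;

// Toy model of a cluster journal with JCR-1087-style retention.
class Journal {
    private final List<String> records = new ArrayList<>();
    private long firstRevision = 0;                         // revision of records.get(0)
    private final Map<String, Long> seen = new HashMap<>(); // nodeId -> last consumed revision

    void register(String nodeId) { seen.put(nodeId, firstRevision); }

    void append(String record) { records.add(record); }

    // A node fetches everything appended after its last seen revision.
    List<String> consume(String nodeId) {
        long from = seen.get(nodeId);
        List<String> updates = new ArrayList<>(
            records.subList((int) (from - firstRevision), records.size()));
        seen.put(nodeId, firstRevision + records.size());
        prune();
        return updates;
    }

    // Drop only the records that every registered node has already seen.
    private void prune() {
        long min = Collections.min(seen.values());
        while (firstRevision < min && !records.isEmpty()) {
            records.remove(0);
            firstRevision++;
        }
    }

    int size() { return records.size(); }
}
```

Note how records survive until the slowest node catches up, which is exactly why a temporarily offline node cannot miss updates.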

>> The version histories of all versionable nodes are available in the
>> /jcr:system/jcr:versionStorage subtree. You can search for all past
>> versions in that subtree, or for the checked out versions in normal
>> workspace storage outside /jcr:system.
> So the index includes and references all versions?


> "bundle persistence features"? WDYM?

See JCR-755, introduced in Jackrabbit 1.3. "Bundle persistence" is
currently the recommended and default persistence mechanism in
Jackrabbit. It essentially stores each node as a "bundle" that
contains all the properties and child node references associated with
that node. Previously we used separate records for all nodes *and*
properties, but that turned out to cause way too many calls to the
backend database or file system. The bundle approach seems to be the
right level of granularity for JCR (though you may want to look up the
NGP discussions on dev@ about potential alternatives) and it has worked
pretty well so far.
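To make the contrast concrete, here is a small illustrative model (not Jackrabbit's actual storage code; the class names are made up for this sketch): one bundle record holds a node's properties and child-node references together, so loading a node costs a single backend read rather than one read per property.

```java
import java.util.*;

// Toy model of a node "bundle": all properties and child references
// for one node, stored and loaded as a single unit.
class NodeBundle {
    final Map<String, String> properties = new LinkedHashMap<>();
    final List<String> childNodeIds = new ArrayList<>();
}

class BundleStore {
    private final Map<String, NodeBundle> backend = new HashMap<>();
    int backendReads = 0;   // counts calls to the (simulated) backend

    void store(String nodeId, NodeBundle bundle) {
        backend.put(nodeId, bundle);
    }

    // One backend call returns the complete node state.
    NodeBundle load(String nodeId) {
        backendReads++;
        return backend.get(nodeId);
    }
}
```

With per-item records, the same load would have needed one backend call per property and child reference.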


Jukka Zitting
