Hi Jukka!

It's nice to hear that Jackrabbit 3 is on its way :).

At the moment we are trying to put a Jackrabbit-based repository into production, and we are facing some scalability/concurrency problems with it. We have already made some modifications to the Jackrabbit code to address them. I will try to share them with the community as soon as we have something workable.

My main concerns are the locking logic and the caching logic. I think in JR 3 both should be revisited and maybe even externalised through interfaces (I know you already have ISMLocking configurable, but that is probably not enough).

In any case, see my answers to the survey below.

Best regards,

From: Jukka Zitting <jukka.zitting@gmail.com>
To: Jackrabbit Developers <dev@jackrabbit.apache.org>
Sent: Tue, 9 February, 2010 16:55:19
Subject: Jackrabbit 3: repository requirements


Now that Jackrabbit 2.0 is out and the major JCR 2.0 feature work is
done, it's time to start looking ahead at Jackrabbit 3. We've talked
about this a bit already at Day and I'll be posting a summary of our
ideas for further discussion, but before that I'd like to frame the
discussion by getting a better picture of the range of requirements
we'll be having for Jackrabbit 3.

So, please let us know what you expect your repositories to look like
within the next five or so years. I'm especially interested in answers
to the following questions:

* How much content (number of documents/nodes, raw amount of data in
GB/TB/PB) do you have in the repository?

___100GB (to grow to 200-300GB)

* How many (concurrent) users (readers/editors/administrators) does
your repository have?

___20000 (to grow to 50000)

* Do you need Internet-scale (millions of users or exabytes of
content) features?


* Do you run the repository on a single server, on a cluster or in the cloud?


* How many and how powerful servers do you use for the repository?

___2 at the moment; we plan to extend to 6 or even more.

Content model:
* Do you need support for flat content hierarchies (>>10k sibling nodes)?


* Do you need support for same-name siblings?


* If you use versioning, how actively (commit on all saves / commit
only at major milestones) and for what purpose (revision history,
backup, etc.) do you use it?

___Don't use it.

* How granular (hierarchies of small properties vs. big binary blobs)
is your content?

___Mostly small properties, plus some binary content that does not exceed 1MB in size.

* How much of your content access is based on search / tree traversal
/ following references?

___Content is accessed through direct path access and tree traversal. Search is not used. References are not used.

* How much do you rely on the repository to enforce your content model
(node type constraints, etc.)?

___We have a set of node types that enforce the structure.

* How often do you modify your content model (and/or related node types)?

___Node types were defined at the beginning and are now static.

* Do you need full ACID semantics? Is an "eventually consistent"
system good enough for you?

___No, we don't need full ACID semantics. Yes, an eventually consistent system is good enough for us.

* Do you need more powerful search features than what we now have?

___A clustered implementation would be cool, i.e. one search index shared by all cluster nodes, as is now the case with the PM.

* How important is observation to your application? Do you need
trigger-like capability that can modify or reject a save() operation?

___Not important at the moment, but we have considered using it.

Feel free to answer either based on your current usage patterns or to
predict your needs for the next few years. The further ahead in the
future you can reasonably predict, the better.

Note that I intentionally restricted this set of questions to core
repository features, I'll do a poll on favorite new features later on.


Jukka Zitting