jackrabbit-dev mailing list archives

From Ian Boston <...@tfd.co.uk>
Subject Re: Jackrabbit 3: repository requirements
Date Wed, 17 Feb 2010 11:30:22 GMT

On 9 Feb 2010, at 15:55, Jukka Zitting wrote:

> Hi,
> Now that Jackrabbit 2.0 is out and the major JCR 2.0 feature work is
> done, it's time to start looking ahead at Jackrabbit 3. We've talked
> about this a bit already at Day and I'll be posting a summary of our
> ideas for further discussion, but before that I'd like to frame the
> discussion by getting a better picture of the range of requirements
> we'll be having for Jackrabbit 3.
> So, please let us know what you expect your repositories to look like
> within the next five or so years. I'm especially interested in answers
> to the following questions:
> Scalability:
> * How much content (number of documents/nodes, raw amount data in
> GB/TB/PB) do you have in the repository?

At the moment, up to tens of TB.
In the future, perhaps the PB range.

> * How many (concurrent) users (readers/editors/administrators) does
> your repository have?

Depends on the definition of concurrency.
Number of users currently expected to be <4M in one installation.
In any one hour typically 100K active.
All are potential writers to the underlying JCR, but requests are mostly reads (80-90%).

> * Do you need Internet-scale (millions of users or exabytes of
> content) features?


> Deployment:
> * Do you run the repository on a single server, on a cluster or in the cloud?

Cluster, but would prefer something cloud-like. We need a better PM and ClusterNode than in
the current JR16/JR2 (I need to check JR2 in more detail).

> * How many and how powerful servers do you use for the repository?

depends on each individual deployment.

> Content model:
> * Do you need support for flat content hierarchies (>>10k sibling nodes)?

Trying to avoid that, but under a lot of pressure to support it.

> * Do you need support for same-name siblings?


> * If you use versioning, how actively (commit on all saves / commit
> only at major milestones) and for what purpose (revision history,
> backup, etc.) do you use it?

yes, but only on demand.

> * How granular (hierarchies of small properties vs. big binary blobs)
> is your content?

user generated content is all properties,
uploads all blobs, typically > 64K

> * How much of your content access is based on search / tree traversal
> / following references?

search 50%
tree 45%
references < 5% (avoiding strong refs, i.e. storing the UUID as a string, or the path)

> * How much you rely on the repository to enforce your content model
> (node type constraints, etc.)?

not at all.

> * How often you modify your content model (and/or related node types)?

occasionally, 90% unstructured.

> Features:
> * Do you need full ACID semantics?

no, very rarely and if we do we put specific protocols in place.

> Is an "eventually consistent"
> system good enough for you?


> * Do you need more powerful search features than what we now have?


> * How important is observation to your application? Do you need
> trigger-like capability that can modify or reject a save() operation?

Not important for in-JCR operations, but we need async notification of changes.

> Feel free to answer either based on your current usage patterns or to
> predict your needs for the next few years. The further ahead in the
> future you can reasonably predict, the better.
> Note that I intentionally restricted this set of questions to core
> repository features, I'll do a poll on favorite new features later on.
> BR,
> Jukka Zitting
