jackrabbit-users mailing list archives

From "David Nuescheler" <david.nuesche...@gmail.com>
Subject Re: workspace / repository scalability
Date Wed, 23 May 2007 14:55:08 GMT
hi cris,

> I'm expecting that we will have about 10-15 nodes per "document" in most
> cases, though some could have 35-50.
sounds good.

> When you say adequate hierarchical structure, does this imply that we should
> try to keep our tree "bushy"? Really, because we rely on the external search
> engine for location, we only direct query on sequential ID at the database.
> Should a partitioning strategy be used? If so, what sort of depth might we
> aim for?
i see... i think it is important to mention that jackrabbit is currently not
optimized for long lists of child nodes, so if possible i would recommend
keeping each node below a couple of hundred child nodes.
as a guideline for hierarchy i usually use something like:
"if i wouldn't do it in a filesystem, i don't do it in a content repository"
(assuming that i view a node as a file or folder)

so let's assume your sequential hex-id is something like "123abc". i would
recommend partitioning the node structure two hex digits per level, as follows:
/12/3a/bc which leaves you with at most 256 child nodes per node.
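
the split described above can be sketched in a few lines of java. this is only
a minimal illustration, not part of jackrabbit: the class name IdPartitioner
and the method toPath are hypothetical, and it assumes the id is a non-empty
hex string of even length (pad it with leading zeros first if it is not).

```java
// Hypothetical helper, not a Jackrabbit API: turns a hex id into a node path
// with two hex digits per level, so each level has at most 256 children.
final class IdPartitioner {

    private IdPartitioner() {
    }

    // Example: "123abc" becomes "/12/3a/bc".
    static String toPath(String hexId) {
        if (hexId == null || hexId.isEmpty() || hexId.length() % 2 != 0) {
            throw new IllegalArgumentException(
                    "expected a non-empty hex id of even length: " + hexId);
        }
        StringBuilder path = new StringBuilder();
        for (int i = 0; i < hexId.length(); i += 2) {
            // Reject anything that is not a hex digit before using it as a name.
            if (Character.digit(hexId.charAt(i), 16) < 0
                    || Character.digit(hexId.charAt(i + 1), 16) < 0) {
                throw new IllegalArgumentException("not a hex id: " + hexId);
            }
            path.append('/').append(hexId, i, i + 2);
        }
        return path.toString();
    }

    public static void main(String[] args) {
        System.out.println(toPath("123abc")); // prints /12/3a/bc
    }
}
```

with fixed-width ids every document ends up at the same depth, and the number
of levels is simply the id length divided by two.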

> Also, what sort of persistence store did you use in these tests? I would
> assume, among other things, that XML is a bad choice, for example :)
i would recommend using a "bundle persistence manager".

> I have been finding some evidence that people are using jackrabbit in these
> situations successfully, but not a lot of information on how they are
> handling this, backup, etc.
personally, i like to use the derby persistence manager with external
fs-based blobs (standard setup). with this setup i do
"hot backups" by just backing up the full repository folder in the filesystem.

some people already have backup & restore facilities for their rdbms, which
means that they can set up the persistence manager to store all the
information (blobs, workspace information, nodetypes, etc.) in
the rdbms and leverage their existing backup/restore infrastructure.

