jackrabbit-users mailing list archives

From "Jukka Zitting" <jukka.zitt...@gmail.com>
Subject Re: Scalable Domain
Date Thu, 23 Oct 2008 07:26:39 GMT
Hi,

On Thu, Oct 23, 2008 at 9:10 AM, Michael Wechner
<michael.wechner@wyona.com> wrote:
> Bertrand Delacretaz schrieb:
>> Storing one million nodes should not be a problem, but you'll need to
>> use a structure where no parent node has more than about 10k child
>> nodes.
>
> just being curious, but where does this number come from?

It's a rough estimate based on some ad-hoc experiments I once ran.
The exact numbers of course vary per platform, but the problems
typically start to manifest themselves somewhere within the 1k-100k
range, with 10k being a good rule of thumb that I like to use when
evaluating designs for content hierarchies.
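
As an illustration of a hierarchy that stays well under that limit,
here is a minimal sketch that spreads items over two levels of hashed
bucket nodes. The class name, the "/content" root and the item name
are hypothetical, and the choice of SHA-256 with two 256-way levels is
only one possible layout, not something Jackrabbit prescribes.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: maps an item name to a bucketed path so that
// no single parent node collects all the children.
final class BucketedPath {
    private BucketedPath() {}

    /** Returns a path of the form root/xx/yy/name, where xx and yy are hex buckets. */
    static String pathFor(String root, String name) {
        byte[] digest;
        try {
            digest = MessageDigest.getInstance("SHA-256")
                    .digest(name.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 support is required of every Java platform implementation
            throw new IllegalStateException(e);
        }
        // First two digest bytes give 256 x 256 = 65,536 leaf buckets
        String level1 = String.format("%02x", digest[0] & 0xff);
        String level2 = String.format("%02x", digest[1] & 0xff);
        return root + "/" + level1 + "/" + level2 + "/" + name;
    }

    public static void main(String[] args) {
        System.out.println(pathFor("/content", "order-000001"));
    }
}
```

With this layout the root and each first-level bucket have at most 256
children, and one million items average about 15 per leaf bucket,
assuming the names hash evenly.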

> AFAIK the filesystems ext2 and ext3 (http://en.wikipedia.org/wiki/Ext3) have
> about 32'000 as limit, so I guess the question is how many files per jackrabbit
> node are needed?

That depends on the persistence manager you use. None of the standard
persistence managers hits the underlying file system limits in terms
of child nodes; the issue is more about the size of the persisted
parent node state. All the persistence managers currently store the
names and identifiers of all the child nodes inside the serialized
node state blob, so the more child node entries you have the slower it
will be to read or write the parent node state. At some point you'll
also start hitting things like BLOB size limits in persistence
databases.
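
To give a feel for how the parent state grows, here is a rough sketch
that serializes a list of child entries and reports the resulting
size. The byte layout is an assumption made up for illustration (a
count, then a name and a 16-byte identifier per entry); it is not the
format any Jackrabbit persistence manager actually writes, and the
class and child names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.UUID;

// Hypothetical model of a parent node state holding one entry per child.
final class ParentStateSize {
    private ParentStateSize() {}

    /** Size in bytes of the illustrative blob for the given number of children. */
    static int serializedSize(int childCount) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(childCount);
            for (int i = 0; i < childCount; i++) {
                UUID id = new UUID(0L, i);      // placeholder identifier
                out.writeUTF("child-" + i);     // child name
                out.writeLong(id.getMostSignificantBits());
                out.writeLong(id.getLeastSignificantBits());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    public static void main(String[] args) {
        for (int n : new int[] {1_000, 10_000, 100_000}) {
            System.out.println(n + " children -> " + serializedSize(n) + " bytes");
        }
    }
}
```

The size grows linearly with the number of child entries, and under
this model the whole blob has to be read or rewritten whenever the
parent state is loaded or saved.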

Caching typically takes away some of the pain related to large node
states, but operations like adding or removing a child node become
increasingly slow as the parent node grows.

BR,

Jukka Zitting
