jackrabbit-users mailing list archives

From "David Nuescheler" <david.nuesche...@gmail.com>
Subject Re: workspace / repository scalability
Date Wed, 23 May 2007 14:26:05 GMT
hi cris,

thanks for your email.

we ran a couple of tests in the lower terabyte range from an overall
data perspective, but we noticed that the number of nodes and an adequate
hierarchical structure are much more relevant than the overall size of the data.
in our tests we went beyond 50m files (100m nodes) per workspace
without running into substantial issues.
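
for example, rather than putting millions of children under one parent node,
you could bucket the files by a couple of characters of an id. just a rough
sketch against the plain jcr api -- the "files" path and the id scheme are
made up for illustration:

    import javax.jcr.Node;
    import javax.jcr.RepositoryException;
    import javax.jcr.Session;

    public class BucketedStore {

        // e.g. id "4f3a9c..." ends up under /files/4f/3a/ instead of
        // directly under /files, keeping the child count per node small
        static Node folderFor(Session session, String id) throws RepositoryException {
            Node parent = session.getRootNode().hasNode("files")
                    ? session.getRootNode().getNode("files")
                    : session.getRootNode().addNode("files", "nt:folder");
            for (String bucket : new String[] { id.substring(0, 2), id.substring(2, 4) }) {
                parent = parent.hasNode(bucket)
                        ? parent.getNode(bucket)
                        : parent.addNode(bucket, "nt:folder");
            }
            return parent;
        }
    }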

so generally, i think that storing single-digit millions of records should
not be an issue at all, and storing all the data in a single workspace
should also be feasible. however, since jackrabbit scales on a
per-workspace basis, you can always split up your data into multiple
workspaces if you feel you might hit certain per-workspace
limitations.
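
as an illustration, the same credentials can log into different workspaces
through the standard jcr api. the workspace name and the partitioning rule
below are just assumptions (the workspaces would have to be created
beforehand):

    import javax.jcr.Repository;
    import javax.jcr.Session;
    import javax.jcr.SimpleCredentials;
    import org.apache.jackrabbit.core.TransientRepository;

    public class WorkspaceSplit {

        public static void main(String[] args) throws Exception {
            Repository repository = new TransientRepository();
            SimpleCredentials credentials =
                    new SimpleCredentials("admin", "admin".toCharArray());

            // pick the target workspace from some partitioning rule,
            // e.g. by document type or by year (made-up name)
            String workspaceName = "documents-2007";

            Session session = repository.login(credentials, workspaceName);
            try {
                // ... read or write nodes scoped to this workspace ...
                System.out.println("logged into workspace: "
                        + session.getWorkspace().getName());
            } finally {
                session.logout();
            }
        }
    }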

regards,
david

On 5/22/07, Cris Daniluk <cris.daniluk@gmail.com> wrote:
> Hello,
>
> I've been considering JackRabbit as a potential replacement for a
> traditional RDBMS content repository we've been using. Currently, the
> metadata is very "static" from document to document. However, we want to
> start bringing in arbitrary types of documents. Each document will specify
> its own metadata, and may map "core" metadata back to a set of common
> fields. It really seems like a natural fit for JCR.
>
> I don't really need search (search services will be provided by a separately
> synchronized and already existing index), but I do need content scalability.
> We have about 500GB worth of binary data and 1GB of associated text metadata
> right now (about 200k records). Ideally, the repository would contain the
> binary data as the primary node, rather than merely referencing it. However,
> this already large data set will probably grow to 2-3TB in the next year,
> and potentially well beyond that, with millions of records.
>
> From browsing the archives, it seems like this would be well above and
> beyond the typical repository size. Has anybody used Jackrabbit with this
> volume of data? It is pretty difficult to set up a test, so I'm left to rely
> on the experience of users with similar setups. Would clustering, workspace
> partitioning, etc. handle the volume we expect to produce?
>
> Thanks for the help,
>
> Cris
>
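
as for keeping the binaries inside the repository: a minimal sketch of the
usual nt:file / nt:resource pattern is below. the parent node, file name,
mime type and stream are placeholders:

    import java.io.InputStream;
    import java.util.Calendar;
    import javax.jcr.Node;
    import javax.jcr.Session;

    public class StoreBinary {

        // stores the given stream under the parent node using the
        // standard nt:file / nt:resource node types
        static void storeFile(Session session, Node parent, String name,
                              InputStream data, String mimeType) throws Exception {
            Node file = parent.addNode(name, "nt:file");
            Node content = file.addNode("jcr:content", "nt:resource");
            content.setProperty("jcr:mimeType", mimeType);
            content.setProperty("jcr:data", data);
            content.setProperty("jcr:lastModified", Calendar.getInstance());
            session.save();
        }
    }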
