jackrabbit-users mailing list archives

From Torgeir Veimo <torg...@netenviron.com>
Subject Re: Suitability of jackrabbit for requirements [SEC=UNCLASSIFIED]
Date Wed, 09 Dec 2009 05:27:49 GMT
2009/12/9  <Ross.Dyson@ipaustralia.gov.au>:
>
> I have a requirement to archive several million documents with variable
> metadata (dates, types of document). [...] Documents will be retrieved by their
> position in the tree, plus some filtering on the metadata.

Note that putting thousands of documents as direct children of a single
node is discouraged. You can use a first-letter subtree approach, e.g.
based on the document name, with node paths such as:

<prefix>/a/b/abstract.pdf

or simply use partial hashes of the file name:

<prefix>/2b/6f/abstract.pdf
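Here is a minimal Java sketch of the two layouts above. The helper class `BucketPaths` is hypothetical, and MD5 is an arbitrary choice of hash (any stable hash would do). The sketch only computes the node path. Creating the intermediate nodes through the JCR API is left out. With two hex levels there are 256 × 256 = 65,536 leaf buckets, so a couple of million documents works out to roughly 30 per bucket.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper: computes bucketed node paths for documents. */
public final class BucketPaths {

    private BucketPaths() {}

    /** Maps characters that are awkward in JCR names (e.g. '.', '[') to '_'. */
    private static char safe(char c) {
        return Character.isLetterOrDigit(c) ? Character.toLowerCase(c) : '_';
    }

    /** First-letter layout: "abstract.pdf" -> prefix/a/b/abstract.pdf */
    public static String byFirstLetters(String prefix, String name) {
        // Pad short names with '_' so every document sits two levels deep
        char first = name.length() > 0 ? safe(name.charAt(0)) : '_';
        char second = name.length() > 1 ? safe(name.charAt(1)) : '_';
        return prefix + "/" + first + "/" + second + "/" + name;
    }

    /** Partial-hash layout: first two bytes of an MD5 digest of the name, in hex. */
    public static String byHash(String prefix, String name) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] d = md.digest(name.getBytes(StandardCharsets.UTF_8));
            // %02x on a Byte prints its unsigned value, so each level is "00".."ff"
            return String.format("%s/%02x/%02x/%s", prefix, d[0], d[1], name);
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide MD5
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(byFirstLetters("/docs", "abstract.pdf")); // /docs/a/b/abstract.pdf
        System.out.println(byHash("/docs", "abstract.pdf"));
    }
}
```

The first-letter layout keeps paths human-readable. It can become lopsided, though, when many names share a prefix. The hash layout tends to spread documents evenly, at the cost of paths that say nothing about their contents.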

> Couple million docs, terabyte of data, modest throughput, availability of
> upgrade path, easy backups.  From my reading/lurking, this sounds like a job
> for bundle persistence manager using H2 database.
>
> Have I missed something?  Does this sound like a workable plan?

Sure. But you might want to consider configuring a DataStore alongside
the persistence manager, so that large binaries are kept outside the
database. Also, with Jackrabbit 1.6 and up, it's easy (although slow)
to move from one repository configuration to another with the
migration tool, if you find you need to reconfigure.

-- 
-Tor
