jackrabbit-users mailing list archives

From Bertrand Delacretaz <bdelacre...@apache.org>
Subject Re: Performance of a large number of small nodes
Date Fri, 14 Aug 2009 07:11:46 GMT

On Fri, Aug 14, 2009 at 6:34 AM, Nigel Sim<nigel.sim@gmail.com> wrote:
> ...I am using Jackrabbit to store a mixture of scientific data, which includes
> files and numerical data. The performance of files is fine, but the
> numerical data needs to be extracted as datasets based on attributes such as
> observation time, and this appears to be quite slow in comparison to a
> native DB (obviously). I would really prefer to keep all this related data
> in the same management system, so is there a way to improve the ingestion
> and retrieval of many small nodes?...

Could you take advantage of paths to express the observation time, and
use that for "queries"?

Storing data under paths like /data/2009/12/24/23/02/58 would allow
you to find nodes that belong to a specific day, or hour, by
navigating paths, which might be much more efficient than queries.
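
A minimal sketch of how such paths could be derived from a timestamp,
using only java.time. The class name ObservationPaths, the /data root,
and the choice of UTC are assumptions for illustration; nothing here
is Jackrabbit API. Zero-padded segments keep sibling node names in
chronological order when sorted as strings.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical helper: maps an observation time to a hierarchical path.
class ObservationPaths {
    // Assumed layout: /data/yyyy/MM/dd/HH/mm/ss, always in UTC.
    private static final DateTimeFormatter FULL =
        DateTimeFormatter.ofPattern("'/data/'uuuu/MM/dd/HH/mm/ss")
                         .withZone(ZoneOffset.UTC);
    private static final DateTimeFormatter HOUR =
        DateTimeFormatter.ofPattern("'/data/'uuuu/MM/dd/HH")
                         .withZone(ZoneOffset.UTC);
    private static final DateTimeFormatter DAY =
        DateTimeFormatter.ofPattern("'/data/'uuuu/MM/dd")
                         .withZone(ZoneOffset.UTC);

    // Path of the node that stores a single observation.
    static String pathFor(Instant t) {
        return FULL.format(t);
    }

    // Path of the ancestor node holding everything observed in that hour.
    static String hourPath(Instant t) {
        return HOUR.format(t);
    }

    // Path of the ancestor node holding everything observed on that day.
    static String dayPath(Instant t) {
        return DAY.format(t);
    }
}
```

With something like this, "all data for a given day" becomes a lookup
of the node at dayPath(t) followed by a walk over its descendants (for
example with Session.getItem and Node.getNodes), with no query involved.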

> ...My second question, is there an efficient way to query for the latest
> observation? I would assume querying for the node type, sorting, and just
> retrieving the first result?...

Paths would also help here, and you could use JCR observation (event
listeners on node additions) to keep track of the path that corresponds
to the most recent data, if needed.
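
As a rough sketch of the bookkeeping involved: if leaf paths follow a
fixed-width, zero-padded layout such as /data/yyyy/MM/dd/HH/mm/ss,
comparing them as strings is the same as comparing them by time, so
tracking the latest one is just keeping the maximum. LatestPathTracker
and its method names are hypothetical. In a real setup onNodeAdded
would presumably be called from a javax.jcr.observation.EventListener
for Event.NODE_ADDED events, passing Event.getPath(); that wiring is
not shown here.

```java
import java.util.Optional;
import java.util.concurrent.atomic.AtomicReference;
import java.util.regex.Pattern;

// Hypothetical tracker: remembers the most recent observation path seen.
class LatestPathTracker {
    // Only full-depth leaf paths count; intermediate nodes such as
    // /data/2009/12 are ignored (assumed layout, see lead-in).
    private static final Pattern LEAF =
        Pattern.compile("/data/\\d{4}(/\\d{2}){5}");

    private final AtomicReference<String> latest = new AtomicReference<>();

    // Call once per added node; safe to call from multiple threads.
    void onNodeAdded(String path) {
        if (!LEAF.matcher(path).matches()) {
            return;
        }
        // Fixed-width segments: lexicographic order equals time order.
        latest.accumulateAndGet(path,
            (cur, p) -> cur == null || p.compareTo(cur) > 0 ? p : cur);
    }

    // Empty until at least one leaf path has been seen.
    Optional<String> latest() {
        return Optional.ofNullable(latest.get());
    }
}
```

The tracker only knows about nodes added while it is running, so on
startup the value would have to be seeded some other way, for example
by walking down the last child at each level of the tree.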

