jackrabbit-dev mailing list archives

From Jukka Zitting <jukka.zitt...@gmail.com>
Subject Re: ids and locality
Date Wed, 12 Jan 2011 10:59:52 GMT

On Wed, Jan 12, 2011 at 11:40 AM, Thomas Mueller <mueller@adobe.com> wrote:
> For specialized use cases this might work. In-memory repositories are
> nice, especially for testing. They are very fast. But for normal use
> cases, this would be expensive. For a regular repository, one million
> nodes is not that much. Even 10 million nodes is not a lot depending on
> the use case. If we keep secondary indexes in the repository, the number
> of nodes will further increase.

If the memory cost is too big, you could also use an LRU mechanism to
only keep the most recently accessed identifiers indexed in memory.
And to avoid having to read too many non-cached index blocks during a
cache miss, you could use a Bloom filter to predict whether an index
entry is found within a given block.
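A minimal Python sketch of the idea above, under stated assumptions: the on-disk index is modeled as a list of hypothetical `IndexBlock` objects (plain dicts standing in for blocks read from storage), each paired with a small in-memory Bloom filter. Recently used identifiers live in an LRU cache, and on a miss only the blocks whose filter reports a possible hit are "read". All class names, sizes and the SHA-256-based hashing are illustrative choices, not Jackrabbit code.

```python
import hashlib
from collections import OrderedDict


class BloomFilter:
    """Fixed-size Bloom filter: false positives are possible, false negatives are not."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # Derive k bit positions from salted SHA-256 digests of the key.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, key):
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))


class IndexBlock:
    """Hypothetical on-disk index block; `entries` stands in for the block contents."""

    def __init__(self, entries):
        self.entries = dict(entries)
        self.bloom = BloomFilter()  # kept in memory even when the block is not
        for key in self.entries:
            self.bloom.add(key)


class CachedIdIndex:
    """LRU cache of id -> location in front of Bloom-filtered index blocks."""

    def __init__(self, blocks, capacity=1000):
        self.blocks = blocks
        self.capacity = capacity
        self.cache = OrderedDict()
        self.block_reads = 0  # counts simulated block reads on cache misses

    def lookup(self, node_id):
        if node_id in self.cache:
            self.cache.move_to_end(node_id)  # mark as most recently used
            return self.cache[node_id]
        for block in self.blocks:
            # Skip blocks whose Bloom filter rules the id out; no read needed.
            if not block.bloom.might_contain(node_id):
                continue
            self.block_reads += 1
            location = block.entries.get(node_id)
            if location is not None:
                self._remember(node_id, location)
                return location
        return None

    def _remember(self, node_id, location):
        self.cache[node_id] = location
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict the least recently used entry
```

A repeated lookup of the same identifier is served from the cache without touching any block, while a miss typically reads only the one block that actually holds the entry (plus the occasional false positive).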

> Even if the index is kept in memory, it still needs to be persisted once
> in a while. If you have an append-only storage, then you can afford only
> storing the index after every x changes, because to re-build the index, you can
> use an old index and re-apply at most x-1 changes. But without append-only
> storage, re-building the index from the data takes time (linear in the
> size of the repository, or possibly linear in the size of the repository
> including old versions).

I think an append-only or at least journaled storage is pretty much a
requirement for any modern media that benefits from locality of access
(and thus is relevant to this discussion), so it should be no problem
to write incremental index updates only occasionally.
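A minimal Python sketch of the scheme discussed above, assuming an append-only journal of `(node_id, location)` changes. Every `checkpoint_every` changes (the "x" in the quoted message), only the entries changed since the previous checkpoint are written out as an incremental index segment. Recovery merges the segments and replays the journal tail, which is at most x-1 entries. The class name, the JSON encoding and the in-memory lists standing in for persistent storage are all hypothetical.

```python
import json


class JournaledIndex:
    """Hypothetical id -> location index over an append-only change journal."""

    def __init__(self, checkpoint_every=4):
        self.checkpoint_every = checkpoint_every
        self.journal = []    # append-only log of (node_id, location); stands in for storage
        self.segments = []   # persisted (json_delta, journal_length_covered), oldest first
        self.index = {}      # live in-memory index
        self._pending = {}   # changes not yet written to any segment

    def _covered(self):
        return self.segments[-1][1] if self.segments else 0

    def apply(self, node_id, location):
        self.journal.append((node_id, location))
        self.index[node_id] = location
        self._pending[node_id] = location
        if len(self.journal) - self._covered() >= self.checkpoint_every:
            # Incremental update: persist only the entries changed since the last segment.
            self.segments.append((json.dumps(self._pending), len(self.journal)))
            self._pending = {}

    def recover(self):
        """Rebuild the index from persisted state only (segments + journal)."""
        index = {}
        for delta, _ in self.segments:
            index.update(json.loads(delta))  # later segments override earlier ones
        tail = self.journal[self._covered():]
        for node_id, location in tail:
            index[node_id] = location  # replay changes made after the last segment
        return index, len(tail)
```

With `checkpoint_every=4`, ten changes produce two segments covering the first eight, so recovery replays just the last two journal entries instead of scanning the whole repository.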


Jukka Zitting
