lucene-java-user mailing list archives

From Markus Wiederkehr <>
Subject Re: managing docids for ParallelReader (was Augmenting an existing index)
Date Wed, 01 Jun 2005 04:35:45 GMT
On 5/31/05, Doug Cutting <> wrote:
> Matt Quail wrote:
> > I have wondered about this as well. Are there any *sure fire* ways of
> > creating (and updating) two indices so that doc numbers in one index
> > deliberately correspond to doc numbers in the other index?
> If you add the documents in the same order to both indexes and perform
> the same deletions on both indexes then they'll have the same numbers.
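
A toy model may make this numbering rule concrete. The sketch below is hypothetical Java, not Lucene's API. It assumes that a document's number is its position in the order of addition, that a deletion only marks a number as deleted, and that a merge (automatic, or via optimize) drops deleted documents and renumbers the survivors. Under those assumptions, two indexes that see identical adds and deletions end up with identical numbering.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical toy model of document numbering (not Lucene's API):
 * a document's number is its position in the order of addition, a
 * deletion only marks the slot, and optimize() drops marked slots,
 * shifting every later document down.
 */
class ToyIndex {
    private final List<String> docs = new ArrayList<>();
    private final List<Boolean> deleted = new ArrayList<>();

    /** Appends a document and returns the number it was assigned. */
    int add(String doc) {
        docs.add(doc);
        deleted.add(false);
        return docs.size() - 1;
    }

    /** Marks a document number as deleted; the number stays reserved. */
    void delete(int docNum) {
        deleted.set(docNum, true);
    }

    /** Removes deleted slots and renumbers the survivors densely from 0. */
    void optimize() {
        List<String> kept = new ArrayList<>();
        for (int i = 0; i < docs.size(); i++) {
            if (!deleted.get(i)) kept.add(docs.get(i));
        }
        docs.clear();
        deleted.clear();
        for (String d : kept) {
            docs.add(d);
            deleted.add(false);
        }
    }

    String get(int docNum) { return docs.get(docNum); }

    int maxDoc() { return docs.size(); }
}

class ParallelNumbering {
    public static void main(String[] args) {
        ToyIndex big = new ToyIndex();    // large, rarely changing fields
        ToyIndex small = new ToyIndex();  // small, frequently changing fields
        String[] ids = {"a", "b", "c", "d"};
        for (String id : ids) {           // same order on both sides
            big.add(id + ":body");
            small.add(id + ":acl");
        }
        big.delete(1);                    // same deletion on both sides
        small.delete(1);
        big.optimize();
        small.optimize();
        for (int n = 0; n < big.maxDoc(); n++) {
            System.out.println(n + " -> " + big.get(n) + " | " + small.get(n));
        }
    }
}
```

If the two sides diverge, for example when a deletion is applied to only one index, every later number shifts on that side only, which is why the advice insists on identical operations.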

The Javadoc says that ParallelReader is useful with collections that
have large fields which change rarely and small fields that change
more frequently. IMO that implies that you do *not* always apply the
same operations on both indexes.

> If this is not convenient, then you could add an id field to all
> documents in the primary index.  Then create (or re-create) the
> secondary index by iterating through the values in a FieldCache of this
> id field.
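
The id-based rebuild can be sketched as follows. This is hypothetical Java rather than Lucene code: the array idsByDocNum stands in for the FieldCache of the id field (one value per document number, assumed null where a document has no value, such as a deleted one), and the ACL map stands in for wherever the small, frequently changing fields actually live. The point is only that walking document numbers in ascending order and adding one secondary document per number reproduces the primary's numbering.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Sketch of rebuilding a secondary (parallel) index from the primary's ids.
 * ASSUMPTIONS: idsByDocNum stands in for a FieldCache over an "id" field
 * (index = document number, null = no value); SecondaryDoc and aclById are
 * hypothetical and not part of Lucene.
 */
class SecondaryRebuild {
    /** Hypothetical record for one document of the secondary index. */
    static final class SecondaryDoc {
        final String id;
        final String acl;

        SecondaryDoc(String id, String acl) {
            this.id = id;
            this.acl = acl;
        }
    }

    /**
     * Walks document numbers 0..maxDoc-1 in order, so the i-th document added
     * to the secondary index lines up with document i of the primary.
     */
    static List<SecondaryDoc> rebuild(String[] idsByDocNum, Map<String, String> aclById) {
        List<SecondaryDoc> out = new ArrayList<>(idsByDocNum.length);
        for (String id : idsByDocNum) {
            // An empty slot still gets a placeholder so later numbers do not
            // shift; it would have to be deleted in the secondary index too.
            String acl = (id == null) ? null : aclById.get(id);
            out.add(new SecondaryDoc(id, acl));
        }
        return out;
    }

    public static void main(String[] args) {
        String[] ids = {"doc-17", "doc-3", null, "doc-42"};
        Map<String, String> acls = new HashMap<>();
        acls.put("doc-17", "staff");
        acls.put("doc-3", "public");
        acls.put("doc-42", "admin");
        List<SecondaryDoc> docs = rebuild(ids, acls);
        for (int n = 0; n < docs.size(); n++) {
            System.out.println(n + " -> " + docs.get(n).id + " acl=" + docs.get(n).acl);
        }
    }
}
```

The placeholder for an empty slot is a design choice of this sketch: skipping it would shift every later secondary document by one and break the correspondence.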

I guess I am too new to Lucene to understand how that is supposed to
work. What exactly is the purpose of a FieldCache and how is it
created and used? Could you elaborate on that, please?

> ParallelReader was not really designed to support incremental updates of
> fields, but rather to accelerate batch updates.  For incremental
> updates you're probably better served by updating a single index.

I would be happy with a single index if it were possible to change
fields of a document without affecting other fields. When I look up a
document using an IndexSearcher, manipulate some fields and save that
instance using an IndexWriter, I lose all fields that were indexed but
not stored. Recreating those fields whenever the ACL of a document
changes is too expensive and therefore not an option.

> One could define an "acl" IndexReader subclass that generates TermDocs
> lists on the fly by looking in an external database.  This would require
> a mapping between Lucene document ids and external document IDs.  A
> FieldCache, as described above, could serve that purpose.
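
The core of such an "acl" reader is a function from a term such as acl:staff to an ascending list of Lucene document numbers. The hypothetical sketch below computes that list by looking up each document's id in an external ACL store. It does not subclass IndexReader or implement TermDocs; idsByDocNum and aclDb are assumed stand-ins for the FieldCache mapping and the external database.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Sketch of the posting list an "acl" reader would need for one term:
 * the ascending document numbers whose external ACL contains a group.
 * ASSUMPTIONS: idsByDocNum plays the role of a FieldCache over the "id"
 * field; aclDb stands in for the external database. Neither is Lucene API.
 */
class ExternalAclPostings {
    private final String[] idsByDocNum;
    private final Map<String, Set<String>> aclDb;

    ExternalAclPostings(String[] idsByDocNum, Map<String, Set<String>> aclDb) {
        this.idsByDocNum = idsByDocNum;
        this.aclDb = aclDb;
    }

    /** Document numbers, in ascending order, whose ACL includes the group. */
    int[] docsForGroup(String group) {
        List<Integer> hits = new ArrayList<>();
        for (int doc = 0; doc < idsByDocNum.length; doc++) {
            String id = idsByDocNum[doc];
            if (id == null) continue;              // no id: deleted or unmapped
            Set<String> groups = aclDb.get(id);    // consult the external store
            if (groups != null && groups.contains(group)) hits.add(doc);
        }
        int[] out = new int[hits.size()];
        for (int i = 0; i < out.length; i++) out[i] = hits.get(i);
        return out;
    }

    public static void main(String[] args) {
        String[] ids = {"doc-17", "doc-3", null, "doc-42"};
        Map<String, Set<String>> db = new HashMap<>();
        db.put("doc-17", new HashSet<>(Arrays.asList("staff", "admin")));
        db.put("doc-3", new HashSet<>(Arrays.asList("public")));
        db.put("doc-42", new HashSet<>(Arrays.asList("staff")));
        ExternalAclPostings acl = new ExternalAclPostings(ids, db);
        System.out.println(Arrays.toString(acl.docsForGroup("staff"))); // [0, 3]
    }
}
```

Scanning every document per term is the simplest correct version; with a large index, inverting the external ACL table once per reader (group to ids, then ids to numbers) and caching the result is one plausible alternative.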

Again, could you elaborate a little more on the FieldCache, please?
