lucene-java-user mailing list archives

From Adrien Grand <jpou...@gmail.com>
Subject Re: updateDocument question
Date Wed, 06 Feb 2013 16:13:32 GMT
Hi Thomas,

On Wed, Feb 6, 2013 at 2:50 PM, Becker, Thomas <Thomas.Becker@netapp.com> wrote:
> I've built a search prototype feature for my application using Lucene,
> and it works great. The application monitors a remote system and
> currently indexes just a few core attributes of the objects on that
> system. I get notifications when objects change, and I then update the
> Lucene index to keep things in sync. The thing is that even when objects
> on the remote system are updated, it's relatively unlikely that the
> specific attributes I'm indexing (like name) were changed. From what I
> can see, IndexWriter.updateDocument() makes no effort to determine if
> the existing document is actually dirty compared to the provided one.
> My questions are:
>
> Is this true that documents are assumed to be changed and not actually
> checked before replacement?

Yes, it's true.

> Has such a feature been considered?

I'm not sure, but I see several issues. For example, if you reindex the
exact same document with a different analyzer, the index
terms/positions/offsets/payloads might be different. Moreover, one can
only perform such a comparison if the document's fields are stored,
which is something that Lucene doesn't enforce.

> Is it worth it to query for the document, manually dirty check it and
> then delete/re-add only if it's different if changes to the indexed
> fields are relatively uncommon? My concern is that I'm inadvertently
> causing a lot of segment churn for things that aren't actually changing.

You could try to do it, but maybe it is just fine the way it is: as
segments get merged, deleted docs eventually get expunged.
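
If you do want to try the dirty check, here is a minimal sketch of one
way to do it on the application side rather than by querying the index,
which sidesteps the stored-fields requirement. It is not a Lucene
feature. The class name DirtyCheckingIndexer, the onObjectChanged method
and the updater callback are all hypothetical; the callback stands in
for whatever code builds your Document and calls
IndexWriter.updateDocument().

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiConsumer;

/** Skips index updates when none of the indexed attributes changed. */
class DirtyCheckingIndexer {
    // Last attribute values sent to the index, keyed by object id.
    private final Map<String, Map<String, String>> lastIndexed = new HashMap<>();

    // Hypothetical hook: a real application would build a Document here
    // and call IndexWriter.updateDocument(new Term("id", id), doc).
    private final BiConsumer<String, Map<String, String>> updater;

    DirtyCheckingIndexer(BiConsumer<String, Map<String, String>> updater) {
        this.updater = updater;
    }

    /** Returns true if the index was updated, false if the change was skipped. */
    boolean onObjectChanged(String id, Map<String, String> indexedAttrs) {
        // Only the attributes that are actually indexed take part in the comparison.
        if (indexedAttrs.equals(lastIndexed.get(id))) {
            return false; // nothing indexed has changed, so leave the index alone
        }
        updater.accept(id, indexedAttrs);
        // Store a copy so later mutation by the caller cannot hide a change.
        lastIndexed.put(id, new HashMap<>(indexedAttrs));
        return true;
    }
}
```

The map lives only in memory in this sketch, so after a restart the first
notification for each object would reindex it once. Whether the extra
bookkeeping is worth it depends on how much churn you actually observe.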

-- 
Adrien
