lucene-java-user mailing list archives

From Yuval Feinstein <yuv...@answers.com>
Subject Different replicas return different scores
Date Tue, 09 Feb 2010 14:26:37 GMT
We are running a large, sharded, Lucene-based application.
Our configuration supports near-real-time updates by incrementally
updating documents (a delete followed by an add) on the shards.
Every shard is replicated to several machines to improve performance.
We replicate a shard by sending the same deletion and addition commands to all of its replicas,
where they may be performed in a different order. (We delete a set of documents, say 1000
at a time, then add them back one by one, semi-asynchronously.)
Lately we have noticed a subtle difference in query scores across different replicas of the
same shard.
Further investigation showed that the only noticeable differences between the replicas were
in the index directory structure:
1. Different replicas have different sets of segments: most segment files are the same,
   but some differ.
2. The number of deleted documents differs between two replicas of the same shard.
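
One plausible link between the second observation and the score differences, offered as a
sketch rather than a confirmed diagnosis: in Lucene, deleted documents keep contributing to
term statistics (document frequency and maxDoc) until the segments holding them are merged.
The example below assumes the classic TF-IDF idf formula, 1 + ln(maxDoc / (docFreq + 1)).
The class name, method name, and every number in it are hypothetical and do not come from
the index described here.

```java
// Sketch only: mimics the classic Lucene TF-IDF idf formula,
// 1 + ln(maxDoc / (docFreq + 1)). It does not call Lucene itself.
// IdfSketch, idf(), and all counts below are hypothetical.
public class IdfSketch {

    // docFreq: documents containing the term, including deleted but unmerged ones.
    // maxDoc:  total documents in the index, including deleted but unmerged ones.
    static double idf(int docFreq, int maxDoc) {
        return 1.0 + Math.log(maxDoc / (double) (docFreq + 1));
    }

    public static void main(String[] args) {
        // Both replicas hold the same 1000 live documents, 10 of which match the term.
        // Replica A has already merged away all of its deletions.
        double idfA = idf(10, 1000);

        // Replica B still carries 40 deleted documents, 2 of which contain the term.
        double idfB = idf(12, 1040);

        System.out.println("replica A idf = " + idfA);
        System.out.println("replica B idf = " + idfB);
        System.out.println("idf values differ: " + (idfA != idfB));
    }
}
```

If this is indeed the cause, two replicas with identical live documents would only agree on
scores once they also agree on which deletions have been merged away.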
Is this known behavior in Java Lucene?
How can we change it? We want different replicas to return exactly the same score
for each query hit.
(We would rather not optimize the index, as we believe this would harm performance.)

TIA,
Yuval and Ophir


