lucene-java-user mailing list archives

From "Shalin Shekhar Mangar" <>
Subject Re: How can we know if 2 lucene indexes are same?
Date Fri, 05 Sep 2008 13:19:30 GMT
On Fri, Sep 5, 2008 at 6:03 PM, Michael McCandless <> wrote:

> Large segment merges will also send huge traffic.  You may just want to
> send all updates (document adds/deletes) to all slaves directly?  It'd be
> nice if you could somehow NOT sync the effects of segment merging, but do
> sync doc add/deletes... not sure how to do that.

As Noble said, that is another option we can consider.

> I assume your app ensures that no deltas arrive to the slave while it's
> running optimize?  So then the question becomes (I think) "if two indices
> are identical to begin with, and I separately run optimize on each, will the
> resulting two optimized indices be identical?".


> By "in sync" you also require the final segment name (after optimize) is
> identical right?


> I think the answer is yes, but I'm not certain unless I think more about it.
> Also this behavior is not "promised" in Lucene's API.
> Merges for optimize are now allowed to run concurrently (by default, with
> ConcurrentMergeScheduler), except for the final (< mergeFactor segments)
> merge, which waits until others have finished.  So if there are 7 obvious
> merges necessary to optimize, 3 will run concurrently, while 4 wait.  Those
> 4 then run as the merges finish over time, which may happen in different
> orders for each index and so different merges may run.  Then the final merge
> will run and I *think* the net number of merges that ran should always be
> the same and so the final segment name should be the same.
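
To make the reasoning above concrete, here is a minimal sketch, not Lucene code. It assumes each index hands out segment names from a single counter rendered in base 36, and that every merge consumes exactly one name. The class SegmentNameSketch and its methods are hypothetical. Under those assumptions the final segment name depends only on how many merges ran, not on the order in which they finished:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical model, not Lucene code: one counter per index hands out names.
class SegmentNameSketch {
    private int counter;

    SegmentNameSketch(int startCounter) {
        this.counter = startCounter;
    }

    // Assumed naming scheme: "_" followed by the counter in base 36.
    String nextSegmentName() {
        return "_" + Integer.toString(counter++, Character.MAX_RADIX);
    }

    // Finish `intermediateMerges` merges in an order picked by `seed`,
    // then run the final merge and return the name of its segment.
    static String optimize(int startCounter, int intermediateMerges, long seed) {
        SegmentNameSketch index = new SegmentNameSketch(startCounter);
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < intermediateMerges; i++) {
            order.add(i);
        }
        // Stand-in for concurrent merges completing in a different order.
        Collections.shuffle(order, new Random(seed));
        for (int mergeId : order) {
            index.nextSegmentName(); // each finished merge consumes one name
        }
        return index.nextSegmentName(); // the final, optimized segment
    }

    public static void main(String[] args) {
        // Same number of merges, different completion order.
        System.out.println(optimize(10, 7, 1L));
        System.out.println(optimize(10, 7, 2L));
        // One merge fewer: the final name changes.
        System.out.println(optimize(10, 6, 1L));
    }
}
```

The third call shows the caveat: if the two indices ever run a different number of merges, the final names diverge, which is why this cannot be relied on without a guarantee from the API.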

Thanks for the explanation, Mike. The core problem is making sure both
indices are in sync. Log replication helps us because we can compare the
master and slave indices against a reference point (the log position). If
it becomes possible to specify a version number during a commit, we can use
the master's version number on the slave and compare the two indices that
way. Not sure if that API change would be generally useful. Thoughts?
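
A rough sketch of the version-stamped commit idea, for illustration only. VersionedIndexSketch is a hypothetical stand-in for an index whose commits can carry a caller-supplied version. It is not an existing Lucene class, and the key name "masterVersion" is made up:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for an index whose commits carry caller-supplied data.
class VersionedIndexSketch {
    static final String VERSION_KEY = "masterVersion"; // made-up key name

    private Map<String, String> lastCommitData = new HashMap<>();

    // Record a commit stamped with a version chosen by the caller.
    void commit(long version) {
        Map<String, String> data = new HashMap<>();
        data.put(VERSION_KEY, Long.toString(version));
        lastCommitData = data;
    }

    // Version stored with the last commit, or -1 if nothing was committed.
    long committedVersion() {
        String value = lastCommitData.get(VERSION_KEY);
        return value == null ? -1L : Long.parseLong(value);
    }

    // Two indices count as in sync when both carry the same committed version.
    static boolean inSync(VersionedIndexSketch master, VersionedIndexSketch slave) {
        long masterVersion = master.committedVersion();
        return masterVersion >= 0 && masterVersion == slave.committedVersion();
    }
}
```

In this sketch the master picks the version (for example, the log position) and the slave commits with the value it received, so the comparison no longer depends on segment names or merge order.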

Shalin Shekhar Mangar.
