lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Noble Paul നോബിള്‍ नोब्ळ्" <>
Subject Re: How can we know if 2 lucene indexes are same?
Date Fri, 05 Sep 2008 13:00:48 GMT
On Fri, Sep 5, 2008 at 6:20 PM, Jason Rutherglen
<> wrote:
> In Ocean I had to use a transaction log and execute everything that
> way like SQL database replication.  Then let each node handle it's own
> merging process.  Syncing the indexes is used to get a new node up to
> speed, otherwise it's avoided for the reasons mentioned in the
> previous email.
Propagating the logs is OK but most of Solr users may not need that
feature. They usually update the master in one indexing operation in
which they may add millions of docs and they may commit after that.
Just think about the cost of indexing that many documents on each
slave . It may slow down the responses from live slaves.
Anyway we have not ruled out that approach . That can be added as
another option and we are anyway planning to do that too
> On Fri, Sep 5, 2008 at 8:33 AM, Michael McCandless
> <> wrote:
>> Shalin Shekhar Mangar wrote:
>>> Let me try to explain.
>>> I have a master where indexing is done. I have multiple slaves for
>>> querying.
>>> If I commit+optimize on the master and then rsync the index, the data
>>> transferred on the network is huge. An alternate way is to commit on
>>> master,
>>> transfer the delta to the slave and issue an optimize on the slave. This
>>> is
>>> very fast because less data is transferred on the network.
>> Large segment merges will also send huge traffic.  You may just want to send
>> all updates (document adds/deletes) to all slaves directly?  It'd be nice if
>> you could somehow NOT sync the effects of segment merging, but do sync doc
>> add/deletes... not sure how to do that.
>>> However, we need to ensure that the index on the slave is actually in sync
>>> with the master. So that on another commit, we can blindly transfer the
>>> delta to the slave.
>> I assume your app ensures that no deltas arrive to the slave while it's
>> running optimize?  So then the question becomes (I think) "if two indices
>> are identical to begin with, and I separately run optimize on each, will the
>> resulting two optimized indices be identical?".
>> By "in sync" you also require the final segment name (after optimize) is
>> identical right?
>> I think the answer is yes, but I'm not certain unless I think more about it.
>>  Also this behavior is not "promised" in Lucene's API.
>> Merges for optimize are now allowed to run concurrently (by default, with
>> ConcurrentMergeScheduler), except for the final (< mergeFactor segments)
>> merge, which waits until others have finished.  So if there are 7 obvious
>> merges necessary to optimize, 3 will run concurrently, while 4 wait.  Those
>> 4 then run as the merges finish over time, which may happen in different
>> orders for each index and so different merges may run.  Then the final merge
>> will run and I *think* the net number of merges that ran should always be
>> the same and so the final segment name should be the same.
>> Mike
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail:
>> For additional commands, e-mail:
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:

--Noble Paul

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message