lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "叶双明" <>
Subject Re: How can we know if 2 lucene indexes are same?
Date Fri, 05 Sep 2008 13:32:06 GMT
Just think about the cost of indexing that many documents on each
slave . It may slow down the responses from live slaves.

I think there must be something like search service at the slaves incude a
IndexSearcher or other equals object, and indexing that many documents by a
IndexWriter , isn't the IndexSearcher affected by the indexing process?
After the indexing, reopen the IndexSearcher to load the new data.

2008/9/5, 叶双明 <>:
> There is more and more complex, actually I hava a small index system can
> config multiple index server for query,
> In my opinion,  because  index update operating is synchronized between
> different Thread that update the index, so
> for indexing new data : can process data that want to index at the master,
> when get the documents, add the documents to the index at the master and add
> them to every slave,
> for deleting data : delete at the master and every slaves at the same time,
> I think we can believe  the index is indeed the same at the master and at
> all slaves except other unexpected error in individual node.
> And i hear about there is some frame to sync data between computers, but
> just hear about.
> Sorry for my englist. :)
> 2008/9/5, Michael McCandless <>:
>> Shalin Shekhar Mangar wrote:
>> Let me try to explain.
>>> I have a master where indexing is done. I have multiple slaves for
>>> querying.
>>> If I commit+optimize on the master and then rsync the index, the data
>>> transferred on the network is huge. An alternate way is to commit on
>>> master,
>>> transfer the delta to the slave and issue an optimize on the slave. This
>>> is
>>> very fast because less data is transferred on the network.
>> Large segment merges will also send huge traffic.  You may just want to
>> send all updates (document adds/deletes) to all slaves directly?  It'd be
>> nice if you could somehow NOT sync the effects of segment merging, but do
>> sync doc add/deletes... not sure how to do that.
>> However, we need to ensure that the index on the slave is actually in sync
>>> with the master. So that on another commit, we can blindly transfer the
>>> delta to the slave.
>> I assume your app ensures that no deltas arrive to the slave while it's
>> running optimize?  So then the question becomes (I think) "if two indices
>> are identical to begin with, and I separately run optimize on each, will the
>> resulting two optimized indices be identical?".
>> By "in sync" you also require the final segment name (after optimize) is
>> identical right?
>> I think the answer is yes, but I'm not certain unless I think more about
>> it.  Also this behavior is not "promised" in Lucene's API.
>> Merges for optimize are now allowed to run concurrently (by default, with
>> ConcurrentMergeScheduler), except for the final (< mergeFactor segments)
>> merge, which waits until others have finished.  So if there are 7 obvious
>> merges necessary to optimize, 3 will run concurrently, while 4 wait.  Those
>> 4 then run as the merges finish over time, which may happen in different
>> orders for each index and so different merges may run.  Then the final merge
>> will run and I *think* the net number of merges that ran should always be
>> the same and so the final segment name should be the same.
>> Mike
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail:
>> For additional commands, e-mail:
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message