lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "叶双明" <yeshuangm...@gmail.com>
Subject Re: How can we know if 2 lucene indexes are same?
Date Fri, 05 Sep 2008 13:17:24 GMT
There is more and more complex, actually I hava a small index system can
config multiple index server for query,

In my opinion,  because  index update operating is synchronized between
different Thread that update the index, so

for indexing new data : can process data that want to index at the master,
when get the documents, add the documents to the index at the master and add
them to every slave,

for deleting data : delete at the master and every slaves at the same time,

I think we can believe  the index is indeed the same at the master and at
all slaves except other unexpected error in individual node.

And i hear about there is some frame to sync data between computers, but
just hear about.

Sorry for my englist. :)




2008/9/5, Michael McCandless <lucene@mikemccandless.com>:
>
>
> Shalin Shekhar Mangar wrote:
>
> Let me try to explain.
>>
>> I have a master where indexing is done. I have multiple slaves for
>> querying.
>>
>> If I commit+optimize on the master and then rsync the index, the data
>> transferred on the network is huge. An alternate way is to commit on
>> master,
>> transfer the delta to the slave and issue an optimize on the slave. This
>> is
>> very fast because less data is transferred on the network.
>>
>
> Large segment merges will also send huge traffic.  You may just want to
> send all updates (document adds/deletes) to all slaves directly?  It'd be
> nice if you could somehow NOT sync the effects of segment merging, but do
> sync doc add/deletes... not sure how to do that.
>
> However, we need to ensure that the index on the slave is actually in sync
>> with the master. So that on another commit, we can blindly transfer the
>> delta to the slave.
>>
>
> I assume your app ensures that no deltas arrive to the slave while it's
> running optimize?  So then the question becomes (I think) "if two indices
> are identical to begin with, and I separately run optimize on each, will the
> resulting two optimized indices be identical?".
>
> By "in sync" you also require the final segment name (after optimize) is
> identical right?
>
> I think the answer is yes, but I'm not certain unless I think more about
> it.  Also this behavior is not "promised" in Lucene's API.
>
> Merges for optimize are now allowed to run concurrently (by default, with
> ConcurrentMergeScheduler), except for the final (< mergeFactor segments)
> merge, which waits until others have finished.  So if there are 7 obvious
> merges necessary to optimize, 3 will run concurrently, while 4 wait.  Those
> 4 then run as the merges finish over time, which may happen in different
> orders for each index and so different merges may run.  Then the final merge
> will run and I *think* the net number of merges that ran should always be
> the same and so the final segment name should be the same.
>
> Mike
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message