lucene-java-user mailing list archives

From Cristian Lorenzetto <>
Subject Re: how to rebuild a index corrupted?
Date Thu, 23 Mar 2017 09:53:04 GMT
Errata corrige and additions to the questions in my previous post

I studied the Lucene classes a bit to understand them:
1) setCommitData is designed for versioning the index, not for passing a
transaction log. However, if the user data is different for every
transaction id, the two are equivalent.
2) NRT refreshes the searcher/reader automatically; it does not call
commit. I based my NRT implementation on [...]
In that example, commit is executed synchronously for every CRUD operation,
but in general it is advised to commit from a batch thread because commit
is a long operation. *So it is not clear how to do the commit in a
near-real-time system with an index of unbounded size.*
     2.a If the commit is synchronous, I can use the user data because it is
set before each commit: every commit has different user data and I can
trace the transaction changes. But in general a commit can take minutes to
complete, so it does not seem like a real solution for a near-real-time
system.
    2.b If the commit is asynchronous, it is executed every X operations (or,
better, when memory is full). A single commit can then no longer be used to
trace individual transactions, but I can pass a transaction id associated
with the Lucene commit. If I add a mutex around the CRUD path (held while
uncommitted data is loaded), I can be sure the latest uncommitted index
state is aligned with the last transaction id X, so there is no overlap,
and the CRUD block is very short when it happens. But how can I guarantee
that the commit corresponds to the last commit index that I loaded? Maybe
by introducing that mutex in a custom MergePolicy?
Is what I wrote so far correct? Is 2.b the best solution? In that case,
how can I guarantee that the commit is based on the uncommitted data loaded
in a specific commit index?
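
A minimal sketch of option 2.b, to make the question concrete. Everything
here is an assumption for illustration: FakeIndex is a hypothetical in-memory
stand-in for the index (it is not Lucene's API), the log lives in memory
instead of a file, and the names TxIndexer, Op, write, commit and recover are
invented. The mutex is held only for the log append, the in-memory apply, and
the read of the last applied transaction id; the slow commit runs outside it.
Because operations applied after the marker was read may still end up in that
commit, the sketch assumes replay is idempotent (upsert or delete by key).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.atomic.AtomicLong;

/** One logged operation; a null value means "delete this key". */
final class Op {
    final String key;
    final String value;
    Op(String key, String value) { this.key = key; this.value = value; }
}

/** Hypothetical stand-in for an index: changes become durable only on commit. */
final class FakeIndex {
    private final Map<String, String> committed = new HashMap<>();
    private final Map<String, String> live = new HashMap<>();
    private long committedTxId = 0; // plays the role of the commit user data

    synchronized void apply(Op op) {
        if (op.value == null) live.remove(op.key); else live.put(op.key, op.value);
    }
    synchronized void commit(long txId) {
        committed.clear();
        committed.putAll(live);
        committedTxId = txId;
    }
    /** Simulates a failure: everything not committed is lost. */
    synchronized void crash() {
        live.clear();
        live.putAll(committed);
    }
    synchronized long committedTxId() { return committedTxId; }
    synchronized String get(String key) { return live.get(key); }
}

final class TxIndexer {
    private final AtomicLong seq = new AtomicLong();
    private final ConcurrentSkipListMap<Long, Op> log = new ConcurrentSkipListMap<>();
    private final Object writeLock = new Object();
    private final FakeIndex index;
    private long lastApplied = 0;

    TxIndexer(FakeIndex index) { this.index = index; }

    /** Log first, then apply in memory, all under one short mutex. */
    long write(String key, String value) {
        synchronized (writeLock) {
            long id = seq.incrementAndGet();
            Op op = new Op(key, value);
            log.put(id, op);
            index.apply(op);
            lastApplied = id;
            return id;
        }
    }

    /** Called from a background thread; only the marker read takes the mutex. */
    void commit() {
        long marker;
        synchronized (writeLock) { marker = lastApplied; }
        index.commit(marker);              // slow part, outside the mutex
        log.headMap(marker, true).clear(); // drop log entries up to the marker
    }

    /** After a restart, replay every logged operation newer than the marker. */
    int recover() {
        synchronized (writeLock) {
            int replayed = 0;
            for (Map.Entry<Long, Op> e : log.tailMap(index.committedTxId(), false).entrySet()) {
                index.apply(e.getValue());
                lastApplied = e.getKey();
                replayed++;
            }
            return replayed;
        }
    }
}
```

With this shape the commit does not need to match one exact transaction: the
stored id is only a lower bound, and replaying from that id onward brings the
index back in line with the log.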

2017-03-22 15:32 GMT+01:00 Michael McCandless <>:

> Hi, I think you forgot to CC the lucene user's list in your reply?
> Can you resend?
> Thanks.
> Mike McCandless
> On Wed, Mar 22, 2017 at 9:02 AM, Cristian Lorenzetto <> wrote:
>> Hi, I am thinking about what you told me in your previous message, and
>> about how to solve both the corruption problem and the problem of
>> commits being executed asynchronously.
>> I am thinking of creating a simple transaction log in a file.
>> I use an atomic long sequence to get an ordered transaction id.
>> When I perform a new operation:
>> 1) Generate a new incremental transaction id.
>> 2) Save the operation's abstract info in the transaction log, associated
>> with the id:
>>     2.a for insert/update, a serialized version of the object to save;
>>     2.b for delete, the serialized query that selects what to delete.
>> 3) Execute the same operation in Lucene, first adding a transactionId
>> property (executed in RAM).
>> 4) The commit is executed asynchronously. After the commit, the
>> transaction log up to the last transaction id is deleted. (I don't know
>> how to insert a block after the commit when using a near-real-time reader
>> and SearcherManager.) I might introduce some logic into the way a commit
>> is done. The order is similar to a queue, so it follows the transactionId
>> order. Is there an example of how to commit a specific set of
>> uncommitted operations?
>> 5) I need the guarantee that after a CRUD operation the data is available
>> in memory for a possible imminent search, so I think I might execute
>> flush/refreshReader after every CUD operation.
>> If there is a failure, the transaction log will not be empty, but I can
>> re-execute the unexecuted operations after restart.
>> Maybe it could also be useful for fixing a corruption, but is it certain
>> that the corruption does not also touch segments already completely
>> committed in the past? Or, for a stable solution, should I save the data
>> in a secondary repository anyway?
>> In your opinion, will this solution be sufficient? Is it a good solution,
>> or am I forgetting some aspects?
>> PS Another interesting aspect might be to associate each segment with a
>> transaction. That way, if a segment is missing, I can apply it again
>> without rebuilding the whole index from scratch.
>> 2017-03-21 0:58 GMT+01:00 Michael McCandless <>:
>>> You can use Lucene's CheckIndex tool with the -exorcise option but this
>>> is quite brutal: it simply drops any segment that has corruption it detects.
>>> Mike McCandless
>>> On Mon, Mar 20, 2017 at 4:44 PM, Marco Reis <> wrote:
>>>> I'm afraid it's not possible to rebuild the index. It's important to
>>>> maintain a backup policy because of that.
>>>> On Mon, Mar 20, 2017 at 5:12 PM Cristian Lorenzetto <> wrote:
>>>> > Can Lucene rebuild the index using its internal info, and how? Or do I
>>>> > have to reinsert everything some other way?
>>>> >
>>>> --
>>>> Marco Reis
>>>> Software Architect
>>>> +55 61 9 81194620
