lucene-java-user mailing list archives

From Cristian Lorenzetto <>
Subject Re: how to rebuild a index corrupted?
Date Thu, 23 Mar 2017 13:56:04 GMT
Following on from my earlier train of thought, here is a clarification to
avoid misunderstanding. I use the transaction id not to introduce transactions
into Lucene (an asynchronous commit rules out a traditional transaction system)
but to tag segments with an external key (the transaction id). If, because of
index corruption, segment 5 cannot be found, then by looking at segments 4 and
6 I can work out the range of foreign keys (transaction ids) that must be
reloaded into Lucene. I can then re-add all the missing documents by reloading
them, for example, from a database.
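
The gap lookup described above (segment 5 is lost, segments 4 and 6 survive) can be sketched in plain Java. This is only an illustration under stated assumptions: `TxGapFinder` and `TxRange` are hypothetical names, not Lucene classes, and the sketch assumes that each surviving segment's minimum and maximum transaction id have already been read from the index by some means. It also assumes each segment covers a contiguous range of ids, which may not hold once segments are merged or documents are indexed from several threads.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical helper, not part of Lucene. */
final class TxGapFinder {

    /** Inclusive range of transaction ids covered by one surviving segment. */
    static final class TxRange {
        final long min;
        final long max;

        TxRange(long min, long max) {
            this.min = min;
            this.max = max;
        }

        @Override
        public String toString() {
            return "[" + min + ", " + max + "]";
        }
    }

    /**
     * Returns the transaction id ranges between firstTxId and lastTxId
     * (both inclusive) that no surviving segment covers.
     */
    static List<TxRange> missingRanges(List<TxRange> surviving, long firstTxId, long lastTxId) {
        List<TxRange> sorted = new ArrayList<>(surviving);
        sorted.sort(Comparator.comparingLong((TxRange r) -> r.min));

        List<TxRange> gaps = new ArrayList<>();
        long next = firstTxId; // lowest id not yet known to be indexed
        for (TxRange r : sorted) {
            if (r.min > next) {
                // Ids from next up to r.min - 1 are in no surviving segment.
                gaps.add(new TxRange(next, r.min - 1));
            }
            next = Math.max(next, r.max + 1);
        }
        if (next <= lastTxId) {
            gaps.add(new TxRange(next, lastTxId));
        }
        return gaps;
    }
}
```

With surviving ranges [301, 400] and [501, 600] and an overall span of 301 to 600, `missingRanges` returns the single gap [401, 500], which is the set of ids to reload from the database.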

2017-03-23 10:53 GMT+01:00 Cristian Lorenzetto <>:

> Errata corrige and additions to the questions in my previous post.
> I studied the relevant Lucene classes a little to understand them:
> 1) setCommitData is designed for versioning the index, not for carrying a
> transaction log. However, if the user data is different for every
> transaction id, it is equivalent.
> 2) NRT refreshes the searcher/reader automatically; it does not call
> commit. I based my NRT implementation on
> questions/17993960/lucene-4-4-0-new-controlledrealtimereopenthread-sample-usage.
> In that example commit is executed synchronously for every CRUD
> operation, but in general it is advised to use a batch thread because
> commit is a long operation. *So it is not clear how to do the commit in a
> near-real-time system with an index of unbounded size.*
>      2.a If the commit is synchronous, I can use the user data because it
> is set before a commit: every commit has different user data and I can
> trace the transaction changes. But in general a commit can take minutes
> to complete, so it does not seem a real solution for a near-real-time
> system.
>     2.b If the commit is asynchronous, it is executed every X amount of
> time (or better, when memory is full). The commit cannot be used to trace
> individual transactions, but I can pass a transaction id associated with
> the Lucene commit. I can add a mutex around the CRUD operations: while I
> am loading the uncommitted data, I am sure the last uncommitted index
> state is aligned with the last transaction id X, so there is no overlap,
> and the block on CRUD operations is very short when it happens. But how
> can I guarantee that the commit relates to the last commit index that I
> loaded? Maybe by introducing that mutex in a custom MergePolicy?
> Is what I have written so far correct? Is 2.b the best solution? In that
> case, how can I guarantee that the commit is based on the uncommitted
> data loaded in a specific commit index?
> 2017-03-22 15:32 GMT+01:00 Michael McCandless <>:
>> Hi, I think you forgot to CC the lucene user's list (
>> in your reply?  Can you resend?
>> Thanks.
>> Mike McCandless
>> On Wed, Mar 22, 2017 at 9:02 AM, Cristian Lorenzetto <
>>> wrote:
>>> Hi, I am thinking about what you told me in your previous message and
>>> how to solve the corruption problem and the problem of the commit
>>> operation being executed asynchronously.
>>> I am thinking of creating a simple transaction log in a file.
>>> I use an atomic long sequence as an orderable transaction id.
>>> When I perform a new operation, I:
>>> 1) generate a new incremental transaction id;
>>> 2) save the operation's abstract info in the transaction log, associated
>>> with the id:
>>>     2.a for insert and update, a serialized version of the object to
>>> save;
>>>     2.b for delete, the serialized query to which the delete applies;
>>> 3) execute the same operation in Lucene, first adding a transactionId
>>> property (executed in RAM);
>>> 4) execute the commit asynchronously. After the commit, the transaction
>>> log up to the last transaction id is deleted. (I do not know how to
>>> insert a block after the commit when using a near-real-time reader and
>>> SearcherManager.) I might introduce some logic into the way a commit is
>>> done. The order is similar to a queue, so it follows the transaction id
>>> order. Is there an example of committing a specific set of uncommitted
>>> operations?
>>> 5) I need the guarantee that after a CRUD operation the data is available
>>> in memory for a possible imminent search, so I think I might execute
>>> flush/refreshReader after every CUD operation.
>>> If there is a failure, the transaction log will not be empty, but after
>>> a restart I can re-execute the operations that were not executed.
>>> Maybe it could also be useful for fixing corruption, but is it certain
>>> that corruption cannot also affect segments that were already completely
>>> committed in the past? Or, for a stable solution, should I save the data
>>> in a secondary repository anyway?
>>> In your opinion, will this solution be sufficient? Is it a good
>>> solution, or am I forgetting some aspects?
>>> PS Another interesting aspect might be to associate each segment with a
>>> transaction. That way, if a segment is missing I can re-apply it
>>> without rebuilding the whole index from scratch.
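
The numbered steps above can be sketched as a small in-memory model. This is a minimal illustration, not the poster's implementation and not a Lucene API: `TxLogSketch` and its methods are hypothetical names, the log lives in a map rather than in a file, and the Lucene write and the asynchronous commit are left out. The caller is assumed to pass to `onCommitted` a transaction id that was captured before the commit started, with every operation up to that id already applied to the writer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical in-memory stand-in for the file-based transaction log. */
final class TxLogSketch {

    // Step 1: an atomic long sequence gives ordered transaction ids.
    private final AtomicLong sequence = new AtomicLong();

    // Logged operations that no commit has covered yet, ordered by id.
    private final ConcurrentSkipListMap<Long, String> pending = new ConcurrentSkipListMap<>();

    /** Step 2: record the serialized operation under a new transaction id. */
    long append(String serializedOp) {
        long id = sequence.incrementAndGet();
        pending.put(id, serializedOp);
        return id;
    }

    /** Step 4: after a commit, drop every entry up to and including the given id. */
    void onCommitted(long lastCommittedTxId) {
        pending.headMap(lastCommittedTxId, true).clear();
    }

    /** After a failure: the operations still in the log, in transaction id order. */
    List<String> toReplay() {
        return new ArrayList<>(pending.values());
    }
}
```

If three operations are appended and the commit covers ids up to 2, only the third remains to be replayed after a restart.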
>>> 2017-03-21 0:58 GMT+01:00 Michael McCandless <>:
>>>> You can use Lucene's CheckIndex tool with the -exorcise option but this
>>>> is quite brutal: it simply drops any segment that has corruption it detects.
>>>> Mike McCandless
>>>> On Mon, Mar 20, 2017 at 4:44 PM, Marco Reis <> wrote:
>>>>> I'm afraid it's not possible to rebuild the index. It's important to
>>>>> maintain a backup policy because of that.
>>>>> On Mon, Mar 20, 2017 at 5:12 PM Cristian Lorenzetto <
>>>>>> wrote:
>>>>> > Can Lucene rebuild the index using its internal info, and if so how?
>>>>> > Or do I have to reinsert everything some other way?
>>>>> >
>>>>> --
>>>>> Marco Reis
>>>>> Software Architect
>>>>> +55 61 9 81194620
