lucene-java-user mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: how to rebuild a index corrupted?
Date Thu, 23 Mar 2017 13:59:00 GMT
You should be able to use the sequence numbers returned by IndexWriter
operations to "know" which operations made it into the commit and which did
not, and then on disaster recovery replay only those operations that didn't
make it?
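
A minimal sketch of that bookkeeping, under stated assumptions: ReplayLog
and its methods are hypothetical application-side names, not part of Lucene.
The long values stand in for the sequence numbers that IndexWriter's
add/update/delete calls and commit() return in recent Lucene versions. A
real log would have to be durable (for example, an fsynced file) rather
than an in-memory map.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentSkipListMap;

/** Hypothetical application-side log of operations not yet covered by a commit. */
public class ReplayLog {
    // Sequence number -> serialized operation, kept sorted by sequence number.
    private final ConcurrentSkipListMap<Long, String> pending =
            new ConcurrentSkipListMap<>();

    /** Record an operation under the sequence number the writer returned for it. */
    public void record(long seqNo, String serializedOp) {
        pending.put(seqNo, serializedOp);
    }

    /**
     * Call after a successful commit with the sequence number it returned.
     * Assumption: every operation with seqNo <= commitSeqNo is in the commit,
     * so those entries no longer need to be kept.
     */
    public void onCommit(long commitSeqNo) {
        pending.headMap(commitSeqNo, true).clear();
    }

    /** Operations to replay on disaster recovery, in sequence order. */
    public List<String> toReplay() {
        return new ArrayList<>(pending.values());
    }
}
```

In use, the application would call record(...) with the value returned by
each writer operation, call onCommit(...) with the value returned by
commit(), and after a crash re-apply whatever toReplay() returns.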

Mike McCandless

http://blog.mikemccandless.com

On Thu, Mar 23, 2017 at 5:53 AM, Cristian Lorenzetto <
cristian.lorenzetto@gmail.com> wrote:

> Errata corrige and follow-up questions related to my previous post.
>
> I studied these Lucene classes a bit to understand them:
> 1) setCommitData is designed for versioning the index, not for passing a
> transaction log. However, if the user data is different for every
> transaction id, it is equivalent.
> 2) NRT refreshes the searcher/reader automatically; it does not call
> commit. I based my NRT implementation on
> http://stackoverflow.com/questions/17993960/lucene-4-4-0-new-controlledrealtimereopenthread-sample-usage.
> In this example commit is executed synchronously for every CRUD
> operation, but in general it is advised to use a batch thread because
> commit is a long operation. *So it is not clear how to do the commit in a
> near-real-time system with an index of indefinite size.*
>     2.a If the commit is synchronous, I can use the user data because it
> is used before a commit: every commit has different user data and I can
> trace the transaction changes. But in general a commit can take minutes
> to complete, so it does not seem a real solution for a near-real-time
> system.
>     2.b If the commit is async, it is executed at every interval X (or
> better, when memory is full), so the commit cannot be used for tracing
> the transactions, and I can pass a transaction id associated with a
> Lucene commit. I can add a mutex around the CRUD operation (while I am
> loading uncommitted data) so that I am sure the last uncommitted index is
> aligned with the last transaction id X; there is then no overlap, and the
> CRUD block is very fast when it happens. But how can I guarantee that the
> commit corresponds to the last commitIndex that I loaded? Maybe by
> introducing that mutex in a custom mergePolicy?
> Is what I wrote so far correct? Is 2.b the best solution? In that case,
> how can I guarantee that the commit is based on the uncommitted data
> loaded in a specific commitIndex?
>
>
>
>
>
> 2017-03-22 15:32 GMT+01:00 Michael McCandless <lucene@mikemccandless.com>:
>
>> Hi, I think you forgot to CC the lucene user's list (
>> java-user@lucene.apache.org) in your reply?  Can you resend?
>>
>> Thanks.
>>
>> Mike McCandless
>>
>> http://blog.mikemccandless.com
>>
>> On Wed, Mar 22, 2017 at 9:02 AM, Cristian Lorenzetto <
>> cristian.lorenzetto@gmail.com> wrote:
>>
>>> Hi, I am thinking about what you told me in your previous message, and
>>> how to solve the corruption problem and the problem of the commit
>>> operation being executed asynchronously.
>>>
>>> I am thinking of creating a simple transaction log in a file.
>>> I use an atomic long sequence to get an orderable transaction id.
>>>
>>> When I perform a new operation:
>>> 1) generate a new incremental transaction id
>>> 2) save the abstract operation info in the transaction log, associated
>>> with the id:
>>>     2.a for insert and update, a serialized version of the object to
>>> save
>>>     2.b for delete, the serialized query to which the delete applies
>>> 3) execute the same operation in Lucene, first adding a transactionId
>>> property (executed in RAM)
>>>
>>> 4) The commit is executed asynchronously. After the commit, the
>>> transaction log up to the last transaction id is deleted. (I don't know
>>> how to insert a block after commit when using a near-real-time reader
>>> and SearcherManager.) I might introduce some logic in the way a commit
>>> is done. The order is similar to a queue, so it follows the
>>> transactionId order. Is there an example of how to commit a specific
>>> set of uncommitted operations?
>>>
>>> 5) I need the guarantee that after a CRUD operation the data is
>>> available in memory for a possibly imminent search, so I think I might
>>> execute flush/refreshReader after every CUD operation.
>>>
>>> If there is a failure, the transaction log will not be empty, but I can
>>> re-execute the unexecuted operations after restart.
>>> Maybe it could also be useful for fixing a corruption, but is it certain
>>> that the corruption does not also touch segments already completely
>>> committed in the past? Or, for a stable solution, should I save the
>>> data in a secondary repository anyway?
>>>
>>>
>>>
>>> In your opinion, will this solution be sufficient? Is it a good
>>> solution, or am I forgetting some aspects?
>>>
>>> PS Another interesting aspect could be to associate each segment with a
>>> transaction. In this way, if a segment is missing, I can apply it again
>>> without rebuilding the whole index from scratch.
>>>
>>> 2017-03-21 0:58 GMT+01:00 Michael McCandless <lucene@mikemccandless.com>:
>>>
>>>> You can use Lucene's CheckIndex tool with the -exorcise option but this
>>>> is quite brutal: it simply drops any segment that has corruption it detects.
>>>>
>>>> Mike McCandless
>>>>
>>>> http://blog.mikemccandless.com
>>>>
>>>> On Mon, Mar 20, 2017 at 4:44 PM, Marco Reis <ma@marcoreis.net> wrote:
>>>>
>>>>> I'm afraid it's not possible to rebuild the index. It's important to
>>>>> maintain a backup policy because of that.
>>>>>
>>>>>
>>>>> On Mon, Mar 20, 2017 at 5:12 PM Cristian Lorenzetto <
>>>>> cristian.lorenzetto@gmail.com> wrote:
>>>>>
>>>>> > Can Lucene rebuild the index using its internal info, and how? Or do
>>>>> > I have to reinsert everything some other way?
>>>>> >
>>>>> --
>>>>> Marco Reis
>>>>> Software Architect
>>>>> http://marcoreis.net
>>>>> https://github.com/masreis
>>>>> +55 61 9 81194620
>>>>>
>>>>
>>>>
>>>
>>
>
