bookkeeper-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Enrico Olivelli <eolive...@gmail.com>
Subject Re: Write guarantees and the LAC Protocol
Date Tue, 04 Apr 2017 08:05:13 GMT
2017-04-04 9:47 GMT+02:00 Enrico Olivelli <eolivelli@gmail.com>:

> I have completed by debug session and I have an answer.
>
> The problematic flow is the following:
> - LedgerHandler#asyncAddEntry
> - in case of BKNotEnoughBookiesException I'm triggering an async callback
> (with a CompletableFuture#handleAsync) which in turn calls
> LedgerHandler#close
> - the LedgerHandler#close method truncates the ledger and some entry is
> lost (at the time of the close call I have local LastAddConfirmed ==
> LastAddPushed == 0 on the writer)
>
> Is this a possible explanation ?
> I wonder which is the best way to react to an error (in this case
> BKNotEnoughBookiesException) in asyncAddEntry, usually I'm calling
> LedgerHandler#close in order to the necessary clean up, but in this case it
> seems very dangerous
> (I am running BK 4.4.0)
>
> This is the issue on my project, where I have put logs
> https://github.com/diennea/herddb/issues/100
>
> Thanks
> -- Enrico
>
>
>

I'm sorry, during other test cases skipping the close() did not fix the
issue consistently, I will debug more


>
>
>
>
>
>
> 2017-03-17 16:14 GMT+01:00 Enrico Olivelli <eolivelli@gmail.com>:
>
>>
>>
>> 2017-03-17 7:48 GMT+01:00 Sijie Guo <guosijie@gmail.com>:
>>
>>>
>>>
>>> On Wed, Mar 15, 2017 at 9:17 AM, Enrico Olivelli <eolivelli@gmail.com>
>>> wrote:
>>>
>>>> Hi BookKeepers,
>>>> I'm trying to understand some basic aspect of BookKeeper....
>>>>
>>>> I have a strange case, and I would like to have some opinion from any
>>>> "expert".
>>>>
>>>> I am using BookKeeper as a transaction log for a database, this is an
>>>> example scenario:
>>>>
>>>> 1) the user inserts a record in a table
>>>> the "fact" is logged to the Write-Ahead-log (Bookkeeper ledger), and
>>>> the systems blocks until the asyncAddEntry completes (the callback is
>>>> called with rc=0)
>>>>
>>>> now I (the 'writer') assume that the entry is written to durable
>>>> storage and has been synched to disk
>>>> then the DB applies the change to the local memory as every application
>>>> with a WAL does
>>>>
>>>> 2) The writer 2  other entries for which the asyncAddEntry completes
>>>> (the callback is called with rc=0). I block until the callback as been
>>>> called
>>>>
>>>> 2) for tests I'm using an ensemble of 1 bookie, I "stop" the only bookie
>>>> the bookie stops "gracefully" (no strange errors)
>>>>
>>>> 3) The writer issues other asyncAddEntry, which fail due to
>>>> "notEnoughBookiesException" and then "closes" the LedgerHandle, inside the
>>>> "callback" cade
>>>>
>>>> 4) I restart the bookie (no strange errors on logs)
>>>>
>>>> 5) I restart my "writer", opening the ledger with "openLedger" (with
>>>> recovery and fencing...) in order to perform recovery from the WAL
>>>>
>>>> 6) now the LastAddConfirmed is 0 ! so my first entry is lost !
>>>>
>>>> Am I missing something ?
>>>> The LAC has not been advanced even with 3 successful writes to the
>>>> ledger
>>>>
>>>
>>> Are you using readLastAddConfirmed at step 5)? 0 is expected when you
>>> using #readLastAddConfirmed, since the call is retrieving LAC from bookies.
>>>
>>>
>> I have added a call to readLastAddConfirmed but it does not help, the LAC
>> is still 0
>>
>>
>> I have double checked my code and the "close" happens in another thread,
>> maybe I am hitting a race condition in my code, in fact the problem is not
>> reproducible at 100%
>>
>>
>> I will continue debugging
>>
>> Thanks
>>
>>
>>
>>
>>> At step 3), it already closed the ledger and update the metadata's last
>>> entry id with 2. so for tailing read, you might need to call
>>> #readLastAddConfirmed and then get the last add confirmed using
>>> #getLastAddConfirmed.
>>>
>>>
>>>
>>>>
>>>> From the "writer" point-of-view the fact that the callback as been
>>>> called is not a guarantee that the entry as been written to durable storage
>>>> ?
>>>>
>>>> I understand that of a "tailing" reader (a ledger opened with
>>>> openLedgerNoRecovery) it is very important not to be able to "read" data
>>>> before the LAC, but for the "writer" it is important that it can count on
>>>> the fact the entry will be recovered with success
>>>>
>>>>
>>>> Thanks
>>>> -- Enrico
>>>>
>>>>
>>>
>>
>

Mime
View raw message