bookkeeper-user mailing list archives

From Sijie Guo <si...@apache.org>
Subject Re: Ledgers failing to replicate
Date Thu, 12 Jan 2017 18:20:07 GMT
On Thu, Jan 12, 2017 at 9:12 AM, Sebastián Schepens <
sebastian.schepens@mercadolibre.com> wrote:

> Pulsar is creating the ledgers. These ledgers should get written when a
> topic receives messages, I guess, but these must be idle topics. Ledgers
> should be closed on errors or once every period of time to allow rotation.
>
> The default grace period for open ledgers is 30s, theoretically, but
> shouldn't clients close ledgers when a node disconnects?
>

Ideally, I think the Pulsar broker should close ledgers periodically on
rotation. I guess there are probably idle topics, so the ledgers used for
those topics are empty and were not closed by the Pulsar broker.


> Perhaps this is not happening because the ledgers aren't currently being
> written?
> Even so, ledgers in the grace period should be listed as underreplicated,
> shouldn't they? As I've said, before turning off another bookie I waited
> till all ledgers were replicated.
>

Here is the logic in auto recovery:

When it detects that a ledger is missing bookies, it marks the ledger as
under-replicated, and the replication worker (which is the auto recovery
daemon) starts replicating those under-replicated ledgers. If a ledger is
still open, the worker doesn't replicate it immediately. It defers the
action for the openLedgerRereplicationGracePeriod (the 30 seconds you
mentioned). After that period, it forces fencing of the ledger and releases
the lock for replicating it, so that the ledger can be replicated later by
any replication worker.
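
The decision flow described above could be sketched roughly as follows. This
is only an illustration, not BookKeeper's actual implementation: the `Ledger`
class, the `handle_underreplicated` function, and the `GRACE_PERIOD_S`
constant are hypothetical names, and the 30-second value is simply the
openLedgerRereplicationGracePeriod mentioned in this thread.

```python
from dataclasses import dataclass

# Assumed value: the 30-second openLedgerRereplicationGracePeriod from this thread.
GRACE_PERIOD_S = 30.0


@dataclass
class Ledger:
    """Hypothetical stand-in for a ledger's metadata; not a BookKeeper class."""
    ledger_id: int
    closed: bool
    fenced: bool = False


def handle_underreplicated(ledger: Ledger, waited_s: float) -> str:
    """Return the action a replication worker would take for this ledger.

    waited_s is how long the worker has already deferred the ledger.
    """
    if ledger.closed:
        # A closed ledger can be re-replicated right away.
        return "replicate"
    if waited_s < GRACE_PERIOD_S:
        # Open ledger inside the grace period: do not replicate yet.
        return "defer"
    # Grace period expired: fence the ledger and release the lock so that
    # any replication worker can replicate it later.
    ledger.fenced = True
    return "fence_and_release"
```

For example, an open ledger that has been deferred for 5 seconds yields
"defer", while the same ledger after 30 seconds is fenced and its lock is
released.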

In theory, if you waited until all ledgers were replicated (meaning no
ledgers were marked as under-replicated), those ledgers should already
have been re-replicated successfully.

There is one possibility that I can think of: some ledgers may have been
created after the auto recovery auditor audited all the existing ledgers.
What is your auditorPeriodicBookieCheckInterval?

- Sijie


>
> Thanks for the tip on the quorums!
>
> Sebastian
>
> On Thu, Jan 12, 2017 at 1:31 PM Sijie Guo <sijie@apache.org> wrote:
>
>> I see. Let me ask one more question: how do you create ledgers? And
>> when do you write to these ledgers, and when do you close them?
>>
>> I think they are probably just empty ledgers at the time you were
>> rolling. There is a setting in the recovery tool to force close the open
>> ledgers. I need to check and confirm that.
>>
>>
>>
>> On Jan 12, 2017 6:14 AM, "Sebastián Schepens" <sebastian.schepens@
>> mercadolibre.com> wrote:
>>
>> Sijie,
>> We were replacing all our nodes and testing how to do it best without
>> affecting the cluster.
>>
>> This same thing happened again yesterday. I have 4 underreplicated
>> ledgers, which are empty.
>> But this time, I turned off bookies one by one, waiting for all
>> underreplicated ledgers to replicate before turning off another bookie.
>> Even while doing this 'rolling' replace, I ended up with inconsistent
>> ledgers. How can this be possible?
>> One would expect that when there are no underreplicated ledgers, it would
>> be safe to lose a machine.
>>
>> What's the recommended quorum setup if I wanted to safely tolerate 2
>> machine failures?
>>
>>
>> If you want to tolerate 2 failures, you need (write quorum size - ack
>> quorum size) to be larger than or equal to 2.
>>
>>
>> Thanks,
>> Sebastian
>>
>>
>> On Wed, Jan 11, 2017 at 5:04 PM Sijie Guo <sijie@apache.org> wrote:
>>
>> On Wed, Jan 11, 2017 at 11:15 AM, Sebastián Schepens <sebastian.schepens@
>> mercadolibre.com> wrote:
>>
>> Hi guys,
>> I'm doing some tests and turned off 2 bookies almost simultaneously
>> hoping that all the ledgers would still be able to replicate since we have
>> ensemble and quorum size of 3.
>> Almost all ledgers managed to replicate using the autorecovery daemon
>> except for 5. What's curious about these 5 ledgers is that they are all
>> empty and the only node which contains data for them claims they do not
>> exist.
>>
>> Here's the ledger metadata for one of them:
>> ledgerID: 772
>> BookieMetadataFormatVersion 2
>> quorumSize: 3
>> ensembleSize: 3
>> length: 0
>> lastEntryId: -1
>> state: IN_RECOVERY
>> segment {
>>   ensembleMember: "10.64.103.57:3181"
>>   ensembleMember: "10.64.103.249:3181"
>>   ensembleMember: "10.64.102.95:3181"
>>   firstEntryId: 0
>> }
>> digestType: CRC32
>> password: ""
>> ackQuorumSize: 2
>>
>> Where all nodes except 10.64.103.249 are down.
>>
>> And that node contains these logs:
>> ERROR - [BookieReadThread-3181-10-1:ReadEntryProcessorV3@123] - No
>> ledger found while reading entry:-1 from ledger: 772
>>
>>
>> They seem to be empty ledgers with no entries.
>>
>>
>>
>> I don't understand how these ledgers ended up in this state. Is it
>> recoverable?
>>
>>
>> If the ledgers are closed and you lose two bookies, re-replication can
>> still replicate the data correctly. When a ledger is in the closed state,
>> its metadata contains the last entry id, and re-replication uses that
>> information to determine the state of the ledger and replicate the data
>> correctly.
>>
>> However, if the ledgers are open and you lose two bookies (which is the
>> majority of your quorum), the client can't decide what the last entry id
>> is based on the one remaining bookie, so it cannot close/seal the ledger
>> correctly.
>>
>> Can you explain more about your tests? It would help me understand more
>> about that.
>>
>>
>>
>> I could just delete the ledgers because they are empty, too. By the way,
>> bookkeeper shell should have a command for deleting ledgers.
>>
>>
>> Yeah, this is a good suggestion. Do you mind creating a jira for adding
>> the delete ledger command?
>>
>>
>>
>> Thanks,
>> Sebastian
>>
>>
>>
>>
