bookkeeper-user mailing list archives

From Sijie Guo <si...@apache.org>
Subject Re: Ledgers failing to replicate
Date Thu, 12 Jan 2017 20:04:27 GMT
On Thu, Jan 12, 2017 at 10:33 AM, Sebastián Schepens <
sebastian.schepens@mercadolibre.com> wrote:

> On Thu, Jan 12, 2017 at 3:20 PM Sijie Guo <sijie@apache.org> wrote:
>
>> On Thu, Jan 12, 2017 at 9:12 AM, Sebastián Schepens <sebastian.schepens@
>> mercadolibre.com> wrote:
>>
>> Pulsar is creating the ledgers. These ledgers should get written when a
>> topic receives messages, I guess, but these must be idle topics. Ledgers
>> should be closed on errors or once every period of time to allow rotation.
>>
>> The default grace period for open ledgers is 30s, theoretically, but
>> shouldn't clients close ledgers when a node disconnects?
>>
>>
>> Ideally I think pulsar broker should close ledgers periodically on
>> rotations. I guess probably there are idle topics, so the ledgers used for
>> those topics are empty and not closed by pulsar broker.
>>
>
> I'm gonna see if I can find this logic.
>
>
>>
>>
>> Perhaps this is not happening because the ledgers aren't currently being
>> written?
>> Even so, ledgers in the grace period should be listed as underreplicated,
>> shouldn't they? As I've said, before turning off another bookie I waited
>> till all ledgers were replicated.
>>
>>
>> So here is the logic in auto recovery:
>>
>> When it detects that a ledger is missing bookies, it marks the ledger as
>> under-replicated, and the replication worker (which is the auto recovery
>> daemon) starts replicating the under-replicated ledgers. If a ledger is
>> still open, the worker doesn't replicate it immediately. It defers the
>> action for openLedgerRereplicationGracePeriod (which is the 30 seconds).
>> After that period, it forcibly fences the ledger and releases the lock
>> for replicating it, so that the ledger can be replicated later by any
>> replication worker.
>>
>> In theory, if you waited until all ledgers were replicated (meaning no
>> ledgers are marked as under-replicated), those ledgers should already
>> have been re-replicated successfully.
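
The deferral logic described above can be sketched roughly as follows. This is only an illustration of the decision as stated in this thread, not BookKeeper's actual replication worker code; the class, field names, and return values are hypothetical, and the 30 second default is taken from the discussion.

```python
from dataclasses import dataclass

# Default grace period as quoted in this thread (assumption: seconds).
GRACE_PERIOD_SECONDS = 30.0


@dataclass
class UnderReplicatedLedger:
    """Hypothetical view of a ledger that was marked under-replicated."""
    ledger_id: int
    is_open: bool
    seconds_since_marked: float  # hypothetical: time since it was marked


def next_action(ledger: UnderReplicatedLedger,
                grace_period: float = GRACE_PERIOD_SECONDS) -> str:
    """Return the step a replication worker would take, per the thread."""
    if not ledger.is_open:
        # Closed ledgers can be replicated right away.
        return "replicate"
    if ledger.seconds_since_marked < grace_period:
        # Open ledger still inside the grace period: defer.
        return "defer"
    # Grace period elapsed: fence the ledger and release the lock so that
    # any replication worker can replicate it later.
    return "fence_and_release_lock"
```

For example, `next_action(UnderReplicatedLedger(772, True, 5.0))` returns `"defer"`, while the same open ledger after 31 seconds returns `"fence_and_release_lock"`.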
>>
>
> This is precisely what I thought.
>
>
>> There is one possibility that I can think of - there may be ledgers created
>> after the auditor of auto recovery audits all the existing ledgers. What is
>> your auditorPeriodicBookieCheckInterval?
>>
>
> But, wouldn't ledgers created after the audit exclude the failed bookie? I
> mean, the audit started because a node went down, new ledgers should
> exclude that node.
>

That is correct. One possibility is that the pulsar broker detects the failed
bookie later than the auditor does. One simple way to confirm whether this
is the case: can you do the rolling replacement when there is no
traffic?




> We have auditorPeriodicBookieCheckInterval at the default, which is 86400
> seconds. I understand that running this check very often could cause issues
> as it stresses zookeeper a lot.
>

Never mind this part. The auditor will start auditing when it detects that a
bookie is lost from zookeeper.



>
> Another question about quorums: say I have a write quorum of 3 and an ack
> quorum of 3. That would theoretically be able to handle a loss of 2 nodes
> as well, wouldn't it?
>

Ah, you are right. My comment in the previous email was wrong - the ack
quorum size should be larger than the number of failures.
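
As a quick check of that rule, here is a minimal sketch. It assumes only what is stated above: an acknowledged entry is stored on at least ack-quorum-many bookies, so at least one copy survives as long as fewer bookies than that fail. The function name is hypothetical, and the check says nothing about whether an open ledger can still be closed.

```python
def acked_entries_survive(ack_quorum: int, failed_bookies: int) -> bool:
    """True if every acknowledged entry keeps at least one live copy.

    An acknowledged entry was written to at least `ack_quorum` bookies,
    so it can only be lost if all of those bookies fail.
    """
    return ack_quorum > failed_bookies


# Write quorum 3, ack quorum 3: two failures still leave one copy.
print(acked_entries_survive(ack_quorum=3, failed_bookies=2))
# Write quorum 3, ack quorum 2: two failures may remove every copy.
print(acked_entries_survive(ack_quorum=2, failed_bookies=2))
```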


>
> Thanks,
> Sebastian
>
>
>> - Sijie
>>
>>
>>
>> Thanks for the tip on the quorums!
>>
>> Sebastian
>>
>> On Thu, Jan 12, 2017 at 1:31 PM Sijie Guo <sijie@apache.org> wrote:
>>
>> I see. Let me ask one more question - how do you create ledgers? And
>> when do you write these ledgers and when do you close them?
>>
>> I think they are probably just empty ledgers at the time you were
>> rolling. There is a setting in the recovery tool to force close the open
>> ledgers. I need to check and confirm that.
>>
>>
>>
>> On Jan 12, 2017 6:14 AM, "Sebastián Schepens" <sebastian.schepens@
>> mercadolibre.com> wrote:
>>
>> Sijie,
>> We were replacing all our nodes and testing how to do it best without
>> affecting the cluster.
>>
>> This same thing happened again yesterday. I have 4 underreplicated
>> ledgers, which are empty.
>> But this time, I turned off bookies one by one, waiting for all
>> underreplicated ledgers to replicate before turning off another bookie.
>> Even while doing this 'rolling' replace, I ended up with inconsistent
>> ledgers. How can this be possible?
>> One would expect that when there are no underreplicated ledgers, it would
>> be safe to lose a machine.
>>
>> What's the recommended quorum setup if I wanted to safely tolerate 2
>> machine failures?
>>
>>
>> If you want to tolerate 2 failures, you need write quorum size - ack
>> quorum size to be larger than or equal to 2.
>>
>>
>> Thanks,
>> Sebastian
>>
>>
>> On Wed, Jan 11, 2017 at 5:04 PM Sijie Guo <sijie@apache.org> wrote:
>>
>> On Wed, Jan 11, 2017 at 11:15 AM, Sebastián Schepens <sebastian.schepens@
>> mercadolibre.com> wrote:
>>
>> Hi guys,
>> I'm doing some tests and turned off 2 bookies almost simultaneously
>> hoping that all the ledgers would still be able to replicate since we have
>> ensemble and quorum size of 3.
>> Almost all ledgers managed to replicate using the autorecovery daemon
>> except for 5. What's curious about these 5 ledgers is that they are all
>> empty and the only node which contains data for them claims they do not
>> exist.
>>
>> Here's the ledger metadata for one of them:
>> ledgerID: 772
>> BookieMetadataFormatVersion 2
>> quorumSize: 3
>> ensembleSize: 3
>> length: 0
>> lastEntryId: -1
>> state: IN_RECOVERY
>> segment {
>>   ensembleMember: "10.64.103.57:3181"
>>   ensembleMember: "10.64.103.249:3181"
>>   ensembleMember: "10.64.102.95:3181"
>>   firstEntryId: 0
>> }
>> digestType: CRC32
>> password: ""
>> ackQuorumSize: 2
>>
>> Where all nodes except 10.64.103.249 are down.
>>
>> And that node contains these logs:
>> ERROR - [BookieReadThread-3181-10-1:ReadEntryProcessorV3@123] - No
>> ledger found while reading entry:-1 from ledger: 772
>>
>>
>> They seem to be empty ledgers with no entries.
>>
>>
>>
>> I don't understand how these ledgers ended in this state, is it
>> recoverable?
>>
>>
>> If the ledgers are closed and you lose two bookies, the re-replication
>> can replicate the data correctly. When a ledger is in the closed state,
>> its metadata contains the last entry id, and re-replication uses that
>> information to determine the state of the ledger and replicate the data
>> correctly.
>>
>> However, if the ledgers are open and you lose two bookies (which is the
>> majority of your quorum), the client can't decide what the last entry id
>> is based on the only remaining bookie, so it cannot close/seal the ledger
>> correctly.
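
The closed/open distinction above can be illustrated with a small sketch. The rule used for open ledgers is an assumption, a simplified reading of quorum coverage rather than something stated in this thread: the last entry id is treated as decidable only if fewer than ack-quorum-many bookies are unreachable, so that no acknowledged entry can live solely on unreachable bookies. The function and parameter names are hypothetical.

```python
def can_determine_last_entry(state: str, write_quorum: int,
                             ack_quorum: int, live_bookies: int) -> bool:
    """Sketch: can the last entry id of a ledger be decided?"""
    if state == "CLOSED":
        # A closed ledger records its last entry id in the metadata.
        return True
    # Assumed rule for ledgers that are not closed: the unreachable bookies
    # must not be able to form an ack quorum on their own.
    unreachable = write_quorum - live_bookies
    return unreachable < ack_quorum


# Ledger 772 from this thread: write quorum 3, ack quorum 2, one bookie left.
print(can_determine_last_entry("IN_RECOVERY", 3, 2, live_bookies=1))
# The same ledger, had it been closed before the bookies went down.
print(can_determine_last_entry("CLOSED", 3, 2, live_bookies=1))
```

Under this assumption, losing a single bookie (two still live) would leave the last entry id decidable, which matches the expectation that a one-at-a-time replacement should be safe.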
>>
>> Can you explain more about your tests? It would help me understand more
>> about that.
>>
>>
>>
>> I could just delete the ledgers because they are empty too. By the way,
>> bookkeeper shell should have a command for deleting ledgers.
>>
>>
>> Yeah, this is a good suggestion. Do you mind creating a jira for adding
>> the delete ledger command?
>>
>>
>>
>> Thanks,
>> Sebastian
>>
>>
>>
>>
