bookkeeper-user mailing list archives

From Sebastián Schepens <sebastian.schep...@mercadolibre.com>
Subject Re: Ledgers failing to replicate
Date Thu, 12 Jan 2017 18:33:35 GMT
On Thu, Jan 12, 2017 at 3:20 PM Sijie Guo <sijie@apache.org> wrote:

> On Thu, Jan 12, 2017 at 9:12 AM, Sebastián Schepens <
> sebastian.schepens@mercadolibre.com> wrote:
>
> Pulsar is creating the ledgers. These ledgers should get written when a
> topic receives messages, I guess, but these must be idle topics. Ledgers
> should be closed on errors or once every period of time to allow rotation.
>
> The default grace period for open ledgers is 30s, theoretically, but
> shouldn't clients close ledgers when a node disconnects?
>
>
> Ideally, I think the Pulsar broker should close ledgers periodically on
> rotation. I guess there are probably idle topics, so the ledgers used for
> those topics are empty and not closed by the Pulsar broker.
>
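A minimal sketch of the time-based rotation described above, assuming a hypothetical policy in which a ledger is closed and rolled over once it has been open longer than a maximum age, even if no entries were written. This is not Pulsar's managed-ledger code: `OpenLedger`, `ledgers_to_rotate`, ledger IDs 770 and 775, and the 4-hour limit are all illustrative.

```python
from dataclasses import dataclass


@dataclass
class OpenLedger:
    ledger_id: int
    opened_at: float  # seconds since an arbitrary epoch
    entries: int = 0
    closed: bool = False


def ledgers_to_rotate(ledgers, now, max_age_seconds):
    """Return the IDs of open ledgers old enough to be closed and rolled over.

    Hypothetical policy: age alone triggers rotation, so ledgers of idle
    topics (zero entries) get closed too instead of staying open.
    """
    return [
        ledger.ledger_id
        for ledger in ledgers
        if not ledger.closed and now - ledger.opened_at >= max_age_seconds
    ]


if __name__ == "__main__":
    ledgers = [
        OpenLedger(770, opened_at=0.0, entries=120),
        OpenLedger(772, opened_at=0.0),  # idle topic: no entries written
        OpenLedger(775, opened_at=10_000.0),
    ]
    print(ledgers_to_rotate(ledgers, now=14_400.0, max_age_seconds=4 * 3600))  # [770, 772]
```

Rotating on age rather than on writes is what keeps an idle topic from leaving an empty ledger open indefinitely.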

I'm gonna see if I can find this logic.


>
>
> Perhaps this is not happening because the ledgers aren't currently being
> written?
> Even so, ledgers in the grace period should be listed as underreplicated,
> shouldn't they? As I've said, before turning off another bookie I waited
> until all ledgers were replicated.
>
>
> There is some logic in auto recovery for this:
>
> When it detects that a ledger is missing bookies, it marks the ledger as
> under-replicated, and the replication worker (which is the auto recovery
> daemon) starts re-replicating the under-replicated ledgers. If a ledger is
> still open, the worker doesn't replicate it immediately. It defers the
> action for openLedgerRereplicationGracePeriod (30 seconds by default).
> After openLedgerRereplicationGracePeriod expires, it forces fencing of the
> ledger and releases the lock for replicating this ledger, so that the
> ledger can be replicated later by any replication worker.
>
> In theory, if you waited until all ledgers were replicated (meaning no
> ledgers are marked as under-replicated), those ledgers should already have
> been successfully re-replicated.
>
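The deferral described above can be sketched as a small decision function. This is a simplified model, not BookKeeper's replication worker code: `LedgerState`, `next_action`, the `first_seen` bookkeeping and the returned action strings are hypothetical; only the 30-second default for openLedgerRereplicationGracePeriod comes from the thread.

```python
from enum import Enum
from typing import Optional

# Default openLedgerRereplicationGracePeriod as stated in the thread (30 s).
GRACE_PERIOD_SECONDS = 30.0


class LedgerState(Enum):
    OPEN = "OPEN"
    IN_RECOVERY = "IN_RECOVERY"
    CLOSED = "CLOSED"


def next_action(state: LedgerState, first_seen: Optional[float], now: float,
                grace: float = GRACE_PERIOD_SECONDS) -> str:
    """Decide what a replication worker does with an under-replicated ledger.

    first_seen is when this worker first found the ledger still unsealed,
    or None if it has not seen it before.
    """
    if state is LedgerState.CLOSED:
        # The last entry id is fixed in the metadata, so copying can start now.
        return "replicate"
    if first_seen is None or now - first_seen < grace:
        # Give the writer a chance to close the ledger itself.
        return "defer"
    # Grace period over: fence the ledger (forcing it closed) and release the
    # replication lock so any worker can re-replicate it later.
    return "fence_and_release"


if __name__ == "__main__":
    print(next_action(LedgerState.OPEN, first_seen=None, now=0.0))    # defer
    print(next_action(LedgerState.OPEN, first_seen=0.0, now=31.0))    # fence_and_release
    print(next_action(LedgerState.CLOSED, first_seen=None, now=0.0))  # replicate
```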

This is precisely what I thought.


> There is one possibility I can think of: there are ledgers created after
> the auditor of auto recovery audited all the existing ledgers. What is
> your auditorPeriodicBookieCheckInterval?
>

But wouldn't ledgers created after the audit exclude the failed bookie? I
mean, the audit started because a node went down, so new ledgers should
exclude that node.
We have auditorPeriodicBookieCheckInterval at the default, which is 86400
seconds. I understand that running this check very often could cause issues,
as it stresses ZooKeeper a lot.

Another question about quorums: say I have a write quorum of 3 and an ack
quorum of 3. That would theoretically be able to handle the loss of 2 nodes
as well, wouldn't it?

Thanks,
Sebastian


> - Sijie
>
>
>
> Thanks for the tip on the quorums!
>
> Sebastian
>
> On Thu, Jan 12, 2017 at 1:31 PM Sijie Guo <sijie@apache.org> wrote:
>
> I see. Let me ask one more question: how do you create ledgers? And when
> do you write to these ledgers, and when do you close them?
>
> I think they are probably just empty ledgers at the time you were rolling.
> There is a setting in the recovery tool to force close the open ledgers. I
> need to check and confirm that.
>
>
>
> On Jan 12, 2017 6:14 AM, "Sebastián Schepens" <
> sebastian.schepens@mercadolibre.com> wrote:
>
> Sijie,
> We were replacing all our nodes and testing how best to do it without
> affecting the cluster.
>
> This same thing happened again yesterday. I have 4 underreplicated
> ledgers, which are empty.
> But this time, I turned off bookies one by one, waiting for all
> underreplicated ledgers to replicate before turning off another bookie.
> Even while doing this 'rolling' replacement, I ended up with inconsistent
> ledgers. How is this possible?
> One would expect that when there are no underreplicated ledgers, it would
> be safe to lose a machine.
>
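The rolling procedure described above can be written as a loop. `stop_bookie` and `count_underreplicated` are hypothetical callables standing in for whatever tooling stops a bookie and reports the number of under-replicated ledgers, and the settle, poll and timeout values are arbitrary. As the report above shows, reaching zero did not by itself protect the open, empty ledgers.

```python
import time
from typing import Callable, Iterable


def rolling_replace(bookies: Iterable[str],
                    stop_bookie: Callable[[str], None],
                    count_underreplicated: Callable[[], int],
                    settle_seconds: float = 60.0,
                    poll_seconds: float = 10.0,
                    timeout_seconds: float = 3600.0,
                    sleep: Callable[[float], None] = time.sleep) -> None:
    """Stop bookies one at a time, waiting for zero under-replicated ledgers
    before moving on. Raises TimeoutError if a step never drains."""
    for bookie in bookies:
        stop_bookie(bookie)
        # Give the auditor time to notice the missing bookie; checking
        # immediately could see a count of zero before anything is marked.
        sleep(settle_seconds)
        waited = 0.0
        while count_underreplicated() > 0:
            if waited >= timeout_seconds:
                raise TimeoutError(f"ledgers still under-replicated after stopping {bookie}")
            sleep(poll_seconds)
            waited += poll_seconds


if __name__ == "__main__":
    pending = [0]

    def fake_stop(bookie):
        print("stopping", bookie)
        pending[0] = 2  # pretend two ledgers become under-replicated

    def fake_count():
        pending[0] = max(0, pending[0] - 1)  # pretend one ledger is fixed per poll
        return pending[0]

    rolling_replace(["10.64.103.57:3181", "10.64.102.95:3181"],
                    fake_stop, fake_count, sleep=lambda seconds: None)
```

Injecting `sleep` keeps the loop testable without real waiting.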
> What's the recommended quorum setup if I wanted to safely tolerate 2
> machine failures?
>
>
> If you want to tolerate 2 failures, you need to write quorum size - ack
> quorum size to be larger than or equal to 2.
>
>
> Thanks,
> Sebastian
>
>
> On Wed, Jan 11, 2017 at 5:04 PM Sijie Guo <sijie@apache.org> wrote:
>
> On Wed, Jan 11, 2017 at 11:15 AM, Sebastián Schepens <
> sebastian.schepens@mercadolibre.com> wrote:
>
> Hi guys,
> I'm doing some tests and turned off 2 bookies almost simultaneously, hoping
> that all the ledgers would still be able to replicate since we have an
> ensemble and quorum size of 3.
> Almost all ledgers managed to replicate using the autorecovery daemon
> except for 5. What's curious about these 5 ledgers is that they are all
> empty, and the only node which contains data for them claims they do not exist.
>
> Here's the ledger metadata for one of them:
> ledgerID: 772
> BookieMetadataFormatVersion 2
> quorumSize: 3
> ensembleSize: 3
> length: 0
> lastEntryId: -1
> state: IN_RECOVERY
> segment {
>   ensembleMember: "10.64.103.57:3181"
>   ensembleMember: "10.64.103.249:3181"
>   ensembleMember: "10.64.102.95:3181"
>   firstEntryId: 0
> }
> digestType: CRC32
> password: ""
> ackQuorumSize: 2
>
> Where all nodes except 10.64.103.249 are down.
>
> And that node contains these logs:
> ERROR - [BookieReadThread-3181-10-1:ReadEntryProcessorV3@123] - No ledger found while reading entry:-1 from ledger: 772
>
>
> They seem to be empty ledgers with no entries.
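A small helper for spotting ledgers like 772. It reads the simple `key: value` lines from a metadata dump in the format shown above and flags ledgers that are not CLOSED and have no entries. The parsing is an assumption about that text format (nested `segment` blocks and lines without a colon are skipped). `parse_metadata` and `is_unsealed_empty` are hypothetical names, and `SAMPLE` is an abbreviated copy of ledger 772's metadata.

```python
def parse_metadata(text: str) -> dict:
    """Collect top-level 'key: value' lines from a ledger metadata dump.

    Lines inside nested blocks such as 'segment { ... }' and lines without a
    colon (e.g. the format-version line) are skipped.
    """
    meta = {}
    depth = 0
    for raw in text.splitlines():
        line = raw.strip()
        if line.endswith("{"):
            depth += 1
        elif line == "}":
            depth -= 1
        elif depth == 0 and ":" in line:
            key, _, value = line.partition(":")
            meta[key.strip()] = value.strip().strip('"')
    return meta


def is_unsealed_empty(meta: dict) -> bool:
    """True for a ledger that was never closed and never received an entry."""
    return (meta.get("state") != "CLOSED"
            and meta.get("lastEntryId") == "-1"
            and meta.get("length") == "0")


SAMPLE = """\
ledgerID: 772
BookieMetadataFormatVersion 2
quorumSize: 3
ensembleSize: 3
length: 0
lastEntryId: -1
state: IN_RECOVERY
segment {
  ensembleMember: "10.64.103.57:3181"
  firstEntryId: 0
}
digestType: CRC32
ackQuorumSize: 2
"""

if __name__ == "__main__":
    meta = parse_metadata(SAMPLE)
    print(meta["ledgerID"], meta["state"], is_unsealed_empty(meta))  # 772 IN_RECOVERY True
```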
>
>
>
> I don't understand how these ledgers ended in this state, is it
> recoverable?
>
>
> If the ledgers are closed and you lose two bookies, re-replication can
> still replicate the data correctly. A closed ledger's metadata contains the
> last entry id, and re-replication uses that information to determine the
> state of the ledger and replicate the data correctly.
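A simplified model of why a closed ledger is straightforward to re-replicate: the entry range to copy follows directly from lastEntryId in the metadata, whereas an unsealed ledger has no agreed end point yet. `entries_to_copy` is a hypothetical helper; real re-replication is more involved, working on fragments of the ensemble rather than a single flat range.

```python
from typing import Optional


def entries_to_copy(state: str, last_entry_id: int,
                    first_entry_id: int = 0) -> Optional[range]:
    """Entry ids a replicator could copy, or None if the end is not yet known.

    Only a CLOSED ledger has an authoritative last entry id; for an OPEN or
    IN_RECOVERY ledger the end must first be settled by ledger recovery.
    """
    if state != "CLOSED":
        return None
    # For an empty closed ledger (lastEntryId = -1) this is an empty range.
    return range(first_entry_id, last_entry_id + 1)


if __name__ == "__main__":
    print(list(entries_to_copy("CLOSED", 4)))     # [0, 1, 2, 3, 4]
    print(list(entries_to_copy("CLOSED", -1)))    # []
    print(entries_to_copy("IN_RECOVERY", -1))     # None
```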
>
> However, if the ledgers are open and you lose two bookies (which is the
> majority of your quorum), the client can't determine the last entry id
> from the single remaining bookie, so it cannot close/seal the ledger
> correctly.
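The point above can be expressed with a counting rule, which we assume here as the usual BookKeeper recovery condition: to recover an open ledger, the client needs responses from at least writeQuorum - ackQuorum + 1 bookies of the write quorum, so that the unreachable bookies alone could not have formed an ack quorum. The helper names are hypothetical; the numbers match ledger 772's metadata (write quorum 3, ack quorum 2).

```python
def min_responses_for_recovery(write_quorum: int, ack_quorum: int) -> int:
    """Bookies that must answer so the silent ones could not form an ack quorum."""
    return write_quorum - ack_quorum + 1


def can_recover_open_ledger(write_quorum: int, ack_quorum: int, surviving: int) -> bool:
    """True if enough bookies of the write quorum survive to seal the ledger."""
    return surviving >= min_responses_for_recovery(write_quorum, ack_quorum)


if __name__ == "__main__":
    # Ledger 772: write quorum 3, ack quorum 2.
    print(min_responses_for_recovery(3, 2))    # 2
    print(can_recover_open_ledger(3, 2, 1))    # False: only one bookie left
    print(can_recover_open_ledger(3, 2, 2))    # True: one bookie lost
```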
>
> Can you explain more about your tests? It would help me understand more
> about that.
>
>
>
> I could also just delete the ledgers, since they are empty. By the way,
> the bookkeeper shell should have a command for deleting ledgers.
>
>
> Yeah, this is a good suggestion. Do you mind creating a jira for adding
> the delete ledger command?
>
>
>
> Thanks,
> Sebastian
>
>
>
>
