From: Sebastián Schepens
Date: Thu, 12 Jan 2017 14:13:57 +0000
Subject: Re: Ledgers failing to replicate
To: user@bookkeeper.apache.org

Sijie,
We were replacing all our nodes and testing how to do it best without affecting the cluster.

This same thing happened again yesterday. I have 4 underreplicated ledgers, which are empty.
But this time, I turned off bookies one by one, waiting for all underreplicated ledgers to replicate before turning off another bookie.
Even while doing this 'rolling' replace, I ended up with inconsistent ledgers. How can this be possible?
One would expect that when there are no underreplicated ledgers, it would be safe to lose a machine.

What's the recommended quorum setup if I wanted to safely tolerate 2 machine failures?
Thanks,
Sebastian

On Wed, Jan 11, 2017 at 5:04 PM Sijie Guo <sijie@apache.org> wrote:
On Wed, Jan 11, 2017 at 11:15 AM, Sebastián Schepens <sebastian.schepens@mercadolibre.com> wrote:
Hi guys,
I'm doing some tests and turned off 2 bookies almost simultaneously, hoping that all the ledgers would still be able to replicate since we have an ensemble and quorum size of 3.
Almost all ledgers managed to replicate using the autorecovery daemon except for 5. What's curious about these 5 ledgers is that they are all empty, and the only node which contains data for them claims they do not exist.

Here's the ledger metadata for one of them:
ledgerID: 772
BookieMetadataFormatVersion 2
quorumSize: 3
ensembleSize: 3
length: 0
lastEntryId: -1
state: IN_RECOVERY
segment {
  ensembleMember: "10.64.103.57:3181"
  ensembleMember: "10.64.103.249:3181"
  ensembleMember: "10.64.102.95:3181"
  firstEntryId: 0
}
digestType: CRC32
password: ""
ackQuorumSize: 2

Where all nodes except 10.64.103.249 are down.
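
For illustration only, the situation described above can be reproduced with a small sketch that reads the metadata dump and lists which ensemble members are still reachable. The parser below is an assumption based purely on the text layout shown in this message (it is not a BookKeeper API), and the `down` set simply encodes the two bookies reported as turned off.

```python
import re

# Fields copied from the metadata dump in this message.
METADATA = """\
ledgerID: 772
quorumSize: 3
ensembleSize: 3
length: 0
lastEntryId: -1
state: IN_RECOVERY
segment {
  ensembleMember: "10.64.103.57:3181"
  ensembleMember: "10.64.103.249:3181"
  ensembleMember: "10.64.102.95:3181"
  firstEntryId: 0
}
ackQuorumSize: 2
"""

def parse_metadata(text):
    """Parse 'key: value' lines; repeated ensembleMember lines become a list."""
    fields, members = {}, []
    for line in text.splitlines():
        match = re.match(r'\s*(\w+):\s*"?([^"]*)"?\s*$', line)
        if not match:
            continue  # skips the 'segment {' and '}' lines
        key, value = match.groups()
        if key == "ensembleMember":
            members.append(value)
        else:
            fields[key] = value
    fields["ensemble"] = members
    return fields

meta = parse_metadata(METADATA)
# Hypothetical input: the two bookies that were turned off in the test.
down = {"10.64.103.57:3181", "10.64.102.95:3181"}
alive = [bookie for bookie in meta["ensemble"] if bookie not in down]

print(alive)  # ['10.64.103.249:3181']
print(len(alive), "of", meta["quorumSize"], "replicas reachable")
```

With only one of the three ensemble members reachable, fewer bookies remain than the ackQuorumSize of 2 recorded in the metadata.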

And that node contains these logs:
ERROR - [BookieReadThread-3181-10-1:ReadEntryProcessorV3@123] - No ledger found while reading entry:-1 from ledger: 772

They seem to be empty ledgers with no entries.

I don't understand how these ledgers ended up in this state. Is it recoverable?

If the ledgers are closed and you lose two bookies, re-replication can still replicate the data correctly. When a ledger is in the closed state, its metadata contains the last entry id, and re-replication uses that information to determine the state of the ledger and replicate the data correctly.

However, if the ledgers are open and you lose two bookies (which is the majority of your quorum), the client can't decide what the last entry id is based on only the one remaining bookie, so it cannot close/seal the ledger correctly.
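
The point about losing the majority of the quorum can be sketched as a small check. This is a simplified model, not BookKeeper code: it assumes the last entry id can only be decided once so many bookies of the write set have answered that the silent ones are too few to have acknowledged an entry on their own, that is, fewer than ackQuorumSize bookies are unreachable. The function name and its parameters are hypothetical.

```python
def can_decide_last_entry(write_quorum, ack_quorum, reachable):
    """Simplified model (an assumption, not the BookKeeper implementation):
    the last entry id is decidable only if the unreachable bookies are too
    few to have formed an ack quorum by themselves."""
    unreachable = write_quorum - reachable
    return unreachable < ack_quorum

# Setup from this thread: quorumSize 3, ackQuorumSize 2.
print(can_decide_last_entry(3, 2, reachable=2))  # True: one bookie down
print(can_decide_last_entry(3, 2, reachable=1))  # False: two bookies down
```

Under this model, an open ledger with the quorum sizes from this thread stays closable with one bookie down but not with two, which matches the behaviour described above.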

Can you explain more about your tests? It would help me understand more about that.

I could just delete the ledgers because they are empty too. By the way, bookkeeper shell should have a command for deleting ledgers.

Yeah, this is a good suggestion. Do you mind creating a jira for adding the delete ledger command?

Thanks,
Sebastian
