From: Sebastián Schepens
Date: Thu, 12 Jan 2017 17:12:11 +0000
Subject: Re: Ledgers failing to replicate
To: user@bookkeeper.apache.org
Reply-To: user@bookkeeper.apache.org

Pulsar is creating the ledgers. These ledgers should get written when a
topic receives messages, I guess, but these must be idle topics. Ledgers
should be closed on errors or once every period of time to allow rotation.

The default grace period for open ledgers is 30s, theoretically, but
shouldn't clients close ledgers when a node disconnects? Perhaps this is
not happening because the ledgers aren't currently being written?
Even so, ledgers in the grace period should be listed as underreplicated,
shouldn't they? As I've said, before turning off another bookie I waited
until all ledgers were replicated.

Thanks for the tip on the quorums!

Sebastian

On Thu, Jan 12, 2017 at 1:31 PM Sijie Guo <sijie@apache.org> wrote:

> I see. Let me ask one more question: how do you create ledgers? And when
> do you write these ledgers, and when do you close them?
>
> I think they are probably just empty ledgers at the time you were
> rolling. There is a setting in the recovery tool to force close the open
> ledgers. I need to check and confirm that.
>
> On Jan 12, 2017 6:14 AM, "Sebastián Schepens"
> <sebastian.schepens@mercadolibre.com> wrote:
>
> > Sijie,
> > We were replacing all our nodes and testing how best to do it without
> > affecting the cluster.
> >
> > This same thing happened again yesterday. I have 4 underreplicated
> > ledgers, which are empty.
> > But this time I turned off the bookies one by one, waiting for all
> > underreplicated ledgers to replicate before turning off another bookie.
> > Even while doing this 'rolling' replace, I ended up with inconsistent
> > ledgers. How can this be possible?
> > One would expect that when there are no underreplicated ledgers, it
> > would be safe to lose a machine.
> >
> > What's the recommended quorum setup if I wanted to safely tolerate 2
> > machine failures?
>
> If you want to tolerate 2 failures, you need the write quorum size minus
> the ack quorum size to be larger than or equal to 2.
>
> > Thanks,
> > Sebastian
> >
> > On Wed, Jan 11, 2017 at 5:04 PM Sijie Guo wrote:
> >
> > > On Wed, Jan 11, 2017 at 11:15 AM, Sebastián Schepens
> > > <sebastian.schepens@mercadolibre.com> wrote:
> > >
> > > > Hi guys,
> > > > I'm doing some tests and turned off 2 bookies almost
> > > > simultaneously, hoping that all the ledgers would still be able to
> > > > replicate since we have ensemble and quorum size of 3.
> > > > Almost all ledgers managed to replicate using the autorecovery
> > > > daemon, except for 5. What's curious about these 5 ledgers is that
> > > > they are all empty, and the only node that contains data for them
> > > > claims they do not exist.
> > > >
> > > > Here's the ledger metadata for one of them:
> > > > ledgerID: 772
> > > > BookieMetadataFormatVersion 2
> > > > quorumSize: 3
> > > > ensembleSize: 3
> > > > length: 0
> > > > lastEntryId: -1
> > > > state: IN_RECOVERY
> > > > segment {
> > > >   ensembleMember: "10.64.103.57:3181"
> > > >   ensembleMember: "10.64.103.249:3181"
> > > >   ensembleMember: "10.64.102.95:3181"
> > > >   firstEntryId: 0
> > > > }
> > > > digestType: CRC32
> > > > password: ""
> > > > ackQuorumSize: 2
> > > >
> > > > Where all nodes except 10.64.103.249 are down.
> > > >
> > > > And that node contains these logs:
> > > > ERROR - [BookieReadThread-3181-10-1:ReadEntryProcessorV3@123] - No ledger found while reading entry:-1 from ledger: 772
> > >
> > > They seem to be empty ledgers with no entries.
> > >
> > > > I don't understand how these ledgers ended up in this state. Is it
> > > > recoverable?
> > >
> > > If the ledgers are closed and you lose two bookies, re-replication
> > > can still replicate the data correctly. When a ledger is in the
> > > closed state, its metadata contains the last entry id, and
> > > re-replication uses that information to determine the state of the
> > > ledger and replicate the data correctly.
> > >
> > > However, if the ledgers are open and you lose two bookies (which is
> > > the majority of your quorum), the client can't decide what the last
> > > entry id is based on only the one remaining bookie, so it cannot
> > > close/seal the ledger correctly.
> > >
> > > Can you explain more about your tests? It would help me understand
> > > more about that.
> > >
> > > > I could just delete the ledgers, since they are empty too. By the
> > > > way, bookkeeper shell should have a command for deleting ledgers.
> > >
> > > Yeah, this is a good suggestion. Do you mind creating a jira for
> > > adding the delete ledger command?
> > >
> > > > Thanks,
> > > > Sebastian
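The quorum arithmetic discussed in this thread can be sketched as follows. This is a minimal sketch in Python, not BookKeeper code: the function names `tolerated_failures` and `can_decide_last_entry` are hypothetical. The first encodes the rule of thumb quoted in the thread (write quorum size minus ack quorum size). The second models the "majority of your quorum" explanation as a simple majority check, which is an assumption about how to read that sentence rather than the client's actual recovery logic.

```python
def tolerated_failures(write_quorum: int, ack_quorum: int) -> int:
    """Rule of thumb quoted in the thread: write quorum size - ack quorum size."""
    if not 1 <= ack_quorum <= write_quorum:
        raise ValueError("expected 1 <= ack_quorum <= write_quorum")
    return write_quorum - ack_quorum


def can_decide_last_entry(write_quorum: int, bookies_alive: int) -> bool:
    """Assumption (simplified reading of the thread): an open ledger can only
    be sealed while a majority of its write quorum is still reachable."""
    if not 0 <= bookies_alive <= write_quorum:
        raise ValueError("expected 0 <= bookies_alive <= write_quorum")
    return bookies_alive > write_quorum // 2


if __name__ == "__main__":
    # Ledger 772 from the thread: quorumSize 3, ackQuorumSize 2.
    print(tolerated_failures(3, 2))      # 1
    # Two of the three ensemble members down, one bookie left.
    print(can_decide_last_entry(3, 1))   # False
    # Hypothetical setup satisfying the "difference >= 2" rule.
    print(tolerated_failures(4, 2))      # 2
```

With the metadata shown for ledger 772 (quorumSize 3, ackQuorumSize 2), the rule gives 1 tolerated failure, which is consistent with the open, empty ledgers being left unrecoverable once two bookies were stopped at nearly the same time.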