Mailing-List: contact user-help@bookkeeper.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@bookkeeper.apache.org
MIME-Version: 1.0
References: <CAMx78x6mXHaemJWmMV6foy7dZkGE1yLzFOTbVpzMkPKDH41ejA@mail.gmail.com>
 <CAO2yDyZk-kmr0bhBBv=+7=YSvDuPPP0awN5YjYZDZTV2rn9MNA@mail.gmail.com>
In-Reply-To: <CAO2yDyZk-kmr0bhBBv=+7=YSvDuPPP0awN5YjYZDZTV2rn9MNA@mail.gmail.com>
From: =?UTF-8?Q?Sebasti=C3=A1n_Schepens?= <sebastian.schepens@mercadolibre.com>
Date: Thu, 12 Jan 2017 14:13:57 +0000
Message-ID: <CAMx78x6DJPnpk_4BmCu2Xf+1PpPQzNQZgKtHWd8vBop5Az9UfA@mail.gmail.com>
Subject: Re: Ledgers failing to replicate
To: user@bookkeeper.apache.org
Content-Type: multipart/alternative; boundary=94eb2c19d6769ebcf10545e6547f
archived-at: Thu, 12 Jan 2017 14:14:16 -0000

--94eb2c19d6769ebcf10545e6547f
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

Sijie,
We were replacing all our nodes and testing how to do it best without
affecting the cluster.

This same thing happened again yesterday. I have 4 underreplicated ledgers,
which are empty.
But this time I turned off bookies one by one, waiting for all
underreplicated ledgers to replicate before turning off another bookie.
Even while doing this 'rolling' replace, I ended up with inconsistent
ledgers. How can this be possible?
One would expect that when there are no underreplicated ledgers, it would
be safe to lose a machine.

What's the recommended quorum setup if I want to safely tolerate the
failure of 2 machines?

Thanks,
Sebastian

On Wed, Jan 11, 2017 at 5:04 PM Sijie Guo <sijie@apache.org> wrote:

> On Wed, Jan 11, 2017 at 11:15 AM, Sebastián Schepens <
> sebastian.schepens@mercadolibre.com> wrote:
>
> > Hi guys,
> > I'm doing some tests and turned off 2 bookies almost simultaneously,
> > hoping that all the ledgers would still be able to replicate since we
> > have an ensemble and quorum size of 3.
> > Almost all ledgers managed to replicate using the autorecovery daemon
> > except for 5. What's curious about these 5 ledgers is that they are
> > all empty, and the only node which contains data for them claims they
> > do not exist.
> >
> > Here's the ledger metadata for one of them:
> > ledgerID: 772
> > BookieMetadataFormatVersion 2
> > quorumSize: 3
> > ensembleSize: 3
> > length: 0
> > lastEntryId: -1
> > state: IN_RECOVERY
> > segment {
> >   ensembleMember: "10.64.103.57:3181"
> >   ensembleMember: "10.64.103.249:3181"
> >   ensembleMember: "10.64.102.95:3181"
> >   firstEntryId: 0
> > }
> > digestType: CRC32
> > password: ""
> > ackQuorumSize: 2
> >
> > Where all nodes except 10.64.103.249 are down.
> >
> > And that node contains these logs:
> > ERROR - [BookieReadThread-3181-10-1:ReadEntryProcessorV3@123] - No ledger
> > found while reading entry:-1 from ledger: 772
>
> They seem to be empty ledgers with no entries.
>
> > I don't understand how these ledgers ended up in this state. Is it
> > recoverable?
>
> If the ledgers are closed and you lose two bookies, re-replication can
> still replicate the data correctly. When a ledger is in the closed
> state, its metadata contains the last entry id, and re-replication uses
> that information to determine the state of the ledger and replicate the
> data correctly.
>
> However, if the ledgers are open and you lose two bookies (which is the
> majority of your quorum), the client can't decide what the last entry id
> is based on the one remaining bookie, so it cannot close/seal the ledger
> correctly.
>
> Can you explain more about your tests? It would help me understand more
> about what happened.
>
> > I could just delete the ledgers because they are empty too. By the
> > way, the bookkeeper shell should have a command for deleting ledgers.
>
> Yeah, this is a good suggestion. Do you mind creating a JIRA for adding
> the delete ledger command?
>
> > Thanks,
> > Sebastian
>
>
>
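
For reference, a minimal sketch of the quorum arithmetic behind Sijie's
explanation of open ledgers. It is not BookKeeper code. It assumes, as my
reading of the BookKeeper protocol rather than anything stated in this
thread, that recovering an open ledger needs responses from at least
(write quorum - ack quorum + 1) bookies of a write quorum, so that at
least one responder belongs to every possible ack quorum. The function
names are hypothetical.

```python
def responses_needed_for_recovery(write_quorum: int, ack_quorum: int) -> int:
    """Responses needed so every possible ack quorum has a responder.

    Assumption (not from the thread): the rule is Qw - Qa + 1.
    """
    if not 1 <= ack_quorum <= write_quorum:
        raise ValueError("need 1 <= ack_quorum <= write_quorum")
    return write_quorum - ack_quorum + 1


def can_recover_open_ledger(write_quorum: int, ack_quorum: int,
                            bookies_down: int) -> bool:
    """True if enough bookies of the write quorum are still reachable."""
    if not 0 <= bookies_down <= write_quorum:
        raise ValueError("need 0 <= bookies_down <= write_quorum")
    alive = write_quorum - bookies_down
    return alive >= responses_needed_for_recovery(write_quorum, ack_quorum)


# Setup from this thread: quorumSize 3, ackQuorumSize 2.
print(can_recover_open_ledger(3, 2, bookies_down=2))  # False
print(can_recover_open_ledger(3, 2, bookies_down=1))  # True
# Ack quorum of 3: one surviving bookie is enough under this rule.
print(can_recover_open_ledger(3, 3, bookies_down=2))  # True
```

Under these assumptions the setup above (3/3/2) survives one lost bookie
per open ledger but not two, which matches the IN_RECOVERY ledgers seen
here. An ack quorum of 3 would survive two, at the cost of every add
waiting for acknowledgements from all three bookies.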

--94eb2c19d6769ebcf10545e6547f--