Subject: Re: Question about redistributing tablets on failure of a tserver.
From: Dan Burkert
To: user@kudu.apache.org
Date: Sat, 20 May 2017 08:02:12 -0700

Hey Jason,

What effect did you see with that patch applied? I've had mixed results with it in my failover tests - it hasn't resolved some of the issues I expected it to, so I'm still looking into it. Any feedback you have on it would be appreciated.

- Dan

On Fri, May 19, 2017 at 10:07 PM, Jason Heo wrote:

> Thanks, @dan @Todd
>
> This issue has been resolved via https://gerrit.cloudera.org/#/c/6925/
>
> Regards,
>
> Jason
>
> 2017-05-09 4:55 GMT+09:00 Todd Lipcon :
>
>> Hey Jason,
>>
>> Sorry for the delayed response here. It looks from your ksck output like
>> the copying is ongoing but hasn't finished yet.
>>
>> FWIW, Will B is working on adding more informative output to ksck to help
>> diagnose cases like this:
>> https://gerrit.cloudera.org/#/c/6772/
>>
>> -Todd
>>
>> On Thu, Apr 13, 2017 at 11:35 PM, Jason Heo wrote:
>>
>>> @Dan
>>>
>>> I monitored with `kudu ksck` while the re-replication was occurring, but
>>> I'm not sure whether this output means my cluster has a problem. (It seems
>>> to just indicate that one tserver stopped.)
>>>
>>> Would you please check it?
>>>
>>> Thanks,
>>>
>>> Jason
>>>
>>> ```
>>> ...
>>> ...
>>> Tablet 0e29XXXXXXXXXXXXXXX1e1e3168a4d81 of table 'impala::tbl1' is
>>> under-replicated: 1 replica(s) not RUNNING
>>>   a7ca07f9bXXXXXXXXXXXXXXXbbb21cfb (hostname.com:7050): RUNNING
>>>   a97644XXXXXXXXXXXXXXXdb074d4380f (hostname.com:7050): RUNNING [LEADER]
>>>   401b6XXXXXXXXXXXXXXX5feda1de212b (hostname.com:7050): missing
>>>
>>> Tablet 550XXXXXXXXXXXXXXX08f5fc94126927 of table 'impala::tbl1' is
>>> under-replicated: 1 replica(s) not RUNNING
>>>   aec55b4XXXXXXXXXXXXXXXdb469427cf (hostname.com:7050): RUNNING [LEADER]
>>>   a7ca07f9b3d94XXXXXXXXXXXXXXX1cfb (hostname.com:7050): RUNNING
>>>   31461XXXXXXXXXXXXXXX3dbe060807a6 (hostname.com:7050): bad state
>>>     State:       NOT_STARTED
>>>     Data state:  TABLET_DATA_READY
>>>     Last status: Tablet initializing...
>>>
>>> Tablet 4a1490fcXXXXXXXXXXXXXXX7a2c637e3 of table 'impala::tbl1' is
>>> under-replicated: 1 replica(s) not RUNNING
>>>   a7ca07f9b3d94414XXXXXXXXXXXXXXXb (hostname.com:7050): RUNNING
>>>   40XXXXXXXXXXXXXXXd5b5feda1de212b (hostname.com:7050): RUNNING [LEADER]
>>>   aec55b4e2acXXXXXXXXXXXXXXX9427cf (hostname.com:7050): bad state
>>>     State:       NOT_STARTED
>>>     Data state:  TABLET_DATA_COPYING
>>>     Last status: TabletCopy: Downloading block 0000000005162382 (277/581)
>>> ...
>>> ...
>>> ==================
>>> Errors:
>>> ==================
>>> table consistency check error: Corruption: 52 table(s) are bad
>>>
>>> FAILED
>>> Runtime error: ksck discovered errors
>>> ```
>>>
>>> 2017-04-13 3:47 GMT+09:00 Dan Burkert :
>>>
>>>> Hi Jason, answers inline:
>>>>
>>>> On Wed, Apr 12, 2017 at 5:53 AM, Jason Heo wrote:
>>>>
>>>>> Q1. Can I disable redistributing tablets on failure of a tserver? The
>>>>> reason why I need this is described in Background.
>>>>
>>>> We don't have any kind of built-in maintenance mode that would prevent
>>>> this, but it can be achieved by setting a flag on each of the tablet
>>>> servers. The goal is not to disable re-replicating tablets, but instead
>>>> to avoid kicking the failed replica out of the tablet groups to begin
>>>> with. There is a config flag to control exactly that:
>>>> 'evict_failed_followers'. This isn't considered a stable or supported
>>>> flag, but it should have the effect you are looking for if you set it to
>>>> false on each of the tablet servers, by running:
>>>>
>>>>     kudu tserver set-flag <tserver-addr> evict_failed_followers false --force
>>>>
>>>> for each tablet server. When you are done, set it back to the default
>>>> 'true' value. This isn't something we routinely test (especially setting
>>>> it without restarting the server), so please test before trying this on
>>>> a production cluster.
>>>>
>>>>> Q2. Redistribution continues even after the failed tserver reconnects to
>>>>> the cluster. In my test cluster, it took 2 hours to redistribute when a
>>>>> tserver holding 3TB of data was killed.
>>>>
>>>> This seems slow. What's the speed of your network? How many nodes? How
>>>> many tablet replicas were on the failed tserver, and were the replica
>>>> sizes evenly balanced? Next time this happens, you might try monitoring
>>>> with 'kudu ksck' to ensure there aren't additional problems in the
>>>> cluster (see the admin guide on the ksck tool).
>>>>
>>>>> Q3. Can `--follower_unavailable_considered_failed_sec` be changed
>>>>> without restarting the cluster?
>>>>
>>>> The flag can be changed, but it comes with the same caveats as above:
>>>>
>>>>     kudu tserver set-flag <tserver-addr> follower_unavailable_considered_failed_sec 900 --force
>>>>
>>>> - Dan
>>>
>>
>> --
>> Todd Lipcon
>> Software Engineer, Cloudera
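
Putting the commands from the thread together, here is a minimal sketch of the maintenance workflow Dan describes. The hostnames and addresses below are hypothetical placeholders, and as noted above these are unsupported flags, so verify the behavior on a test cluster first.

```sh
#!/bin/sh
# Hypothetical tablet server and master addresses; substitute your own.
TSERVERS="ts1.example.com:7050 ts2.example.com:7050 ts3.example.com:7050"
MASTER="master1.example.com:7051"

# Before taking a tserver down, stop failed followers from being evicted
# (unsupported flag; per the thread, test on a non-production cluster first).
for ts in $TSERVERS; do
  kudu tserver set-flag "$ts" evict_failed_followers false --force
done

# Alternatively, lengthen the failure timeout instead (same caveats apply):
#   kudu tserver set-flag "$ts" follower_unavailable_considered_failed_sec 900 --force

# ... perform maintenance and bring the tserver back ...

# Check cluster health; under-replicated tablets should become healthy again
# once the restarted tserver's replicas catch up.
kudu cluster ksck "$MASTER"

# Restore the default once maintenance is complete.
for ts in $TSERVERS; do
  kudu tserver set-flag "$ts" evict_failed_followers true --force
done
```

Note that flags changed with `set-flag` are not persisted, so a tablet server that restarts will come back with the default value.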