kudu-user mailing list archives

From Todd Lipcon <t...@cloudera.com>
Subject Re: kudu-tserver died suddenly
Date Tue, 06 Jun 2017 16:17:34 GMT
On Tue, Jun 6, 2017 at 6:01 AM, Jason Heo <jason.heo.sde@gmail.com> wrote:

> How can I avoid this known bug?
>
> a. downgrade to Kudu 1.2 and upgrade after it is fixed
>

I think this bug has actually been present for a long time, but is more
likely to occur in 1.4 because of the patch
40aa4c3c271c9df20a17a1d353ce582ee3fda742, which generally gives the
maintenance manager higher throughput. You could consider reverting that
patch locally if this problem is frequent.

Alternatively, I think it would be safe to change the FATAL log message
there to a LOG(WARNING). In that case, it might perform a less effective
compaction, but it would avoid the crash.
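
To judge how frequent the crash is before choosing between these options,
a small script can count occurrences of the FATAL message in the tablet
server logs. This is only a minimal sketch: it assumes the logs use the
standard glog line prefix seen in the excerpt quoted below, and the
function and constant names are hypothetical, not part of Kudu.

```python
import re
from typing import Iterable, List, Tuple

# glog prefix: severity letter, MMDD, HH:MM:SS.micros, thread id, file:line]
GLOG_PREFIX = re.compile(
    r"^([IWEF])(\d{4}) (\d{2}:\d{2}:\d{2}\.\d+)\s+(\d+) ([^:\s]+):(\d+)\]"
)

# Message text taken from the FATAL line reported in this thread.
CRASH_TEXT = "Was unable to find all rowsets selected for compaction"


def find_compaction_crashes(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (MMDD, time) for each FATAL line carrying the crash message."""
    hits = []
    for line in lines:
        match = GLOG_PREFIX.match(line)
        if match and match.group(1) == "F" and CRASH_TEXT in line:
            hits.append((match.group(2), match.group(3)))
    return hits
```

Passing an open log file to find_compaction_crashes() works because the
function accepts any iterable of lines; each log line must be unwrapped
(one entry per line), unlike the mail-wrapped excerpt below.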


> b. decrease mm num threads (I currently have it set to 8)
>

That would also decrease the likelihood of a problem.
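
For reference, a sketch of what lowering that setting could look like,
assuming "mm num threads" refers to the tablet server's
--maintenance_manager_num_threads flag and that flags are supplied through
a gflagfile. The value 4 is only an illustrative choice below the 8
mentioned above, not a recommendation.

```
# tablet server gflagfile (hypothetical excerpt)
--maintenance_manager_num_threads=4
```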


>
> I have data which was loaded by Kudu 1.4, and I'm using CDH 5.11.0. I'm
> wondering whether it is safe to downgrade to Kudu 1.2 without reinstalling
> or dropping all the data.
>
>
Downgrade is not something we test regularly. It's possible that it would
work between these versions, but I would test in a dev cluster before doing
so in production.

-Todd


> Thanks.
>
> 2017-06-06 15:13 GMT+09:00 Jason Heo <jason.heo.sde@gmail.com>:
>
>> Hi Todd,
>>
>> Thank you for your reply.
>>
>> Ok, I got it. I should have googled it before mailing ;)
>>
>> Regards,
>>
>> Jason
>>
>> 2017-06-06 15:03 GMT+09:00 Todd Lipcon <todd@cloudera.com>:
>>
>>> Hi Jason,
>>>
>>> It sounds like you hit https://issues.apache.org/jira/browse/KUDU-1956
>>> -- it's a known bug that we haven't gotten around to fixing yet. I hadn't
>>> seen it "in the wild" before, but I'll add a note to the JIRA that you hit
>>> it, and try to prioritize a fix soon (e.g. for 1.4.1).
>>>
>>> -Todd
>>>
>>> On Mon, Jun 5, 2017 at 6:38 PM, Jason Heo <jason.heo.sde@gmail.com>
>>> wrote:
>>>
>>>> Hello.
>>>>
>>>> I'm using this patch https://gerrit.cloudera.org/#/c/6925/
>>>>
>>>> One of the tservers died suddenly. Here are the ERROR and FATAL logs.
>>>>
>>>> E0605 15:04:33.376554 138642 tablet.cc:1219] T
>>>> 3cca831acf744e1daee72582b8e16dc4 P 125dbd2ffb8a401bb7e4fd982995ccf8:
>>>> Rowset selected for compaction but not available anymore: RowSet(150)
>>>>
>>>> E0605 15:04:33.376605 138642 tablet.cc:1219] T
>>>> 3cca831acf744e1daee72582b8e16dc4 P 125dbd2ffb8a401bb7e4fd982995ccf8:
>>>> Rowset selected for compaction but not available anymore: RowSet(59)
>>>>
>>>> E0605 15:04:33.376615 138642 tablet.cc:1219] T
>>>> 3cca831acf744e1daee72582b8e16dc4 P 125dbd2ffb8a401bb7e4fd982995ccf8:
>>>> Rowset selected for compaction but not available anymore: RowSet(60)
>>>>
>>>> F0605 15:04:33.377100 138642 tablet.cc:1222] T
>>>> 3cca831acf744e1daee72582b8e16dc4 P 125dbd2ffb8a401bb7e4fd982995ccf8:
>>>> Was unable to find all rowsets selected for compaction
>>>>
>>>>
>>>> <End of Log>
>>>>
>>>>
>>>> Could you tell me what the problem is? Feel free to ask for any
>>>> information needed to resolve it.
>>>>
>>>>
>>>> Thanks,
>>>>
>>>>
>>>> Jason
>>>>
>>>
>>>
>>>
>>> --
>>> Todd Lipcon
>>> Software Engineer, Cloudera
>>>
>>
>>
>


-- 
Todd Lipcon
Software Engineer, Cloudera
