kudu-user mailing list archives

From William Berkeley <wdberke...@cloudera.com>
Subject Re: Segmentation Fault when running kudu ksck
Date Mon, 20 Aug 2018 18:35:39 GMT
Sorry, I meant taking a 1.6 Kudu tool and running it against your 1.5
cluster.

-Will

On Mon, Aug 20, 2018 at 11:19 AM William Berkeley <wdberkeley@gmail.com>
wrote:

> That looks like KUDU-2113, which was fixed in 1.6.0.
>
> It happens if the tablet servers report peers in their config that are not
> known to the master. Probably, you have removed servers from the cluster
> and some of the tablets are in a bad state as a result. These sorts of
> problems were unfortunately common on earlier Kudu releases. Every new
> version since 5.12 has made significant improvements to prevent these sorts
> of situations. I'd recommend upgrading to 1.5, or at least taking a 1.5
> kudu tool and running it against the 1.4 cluster to see what the issues are.
>
> -Will
>
> On Mon, Aug 20, 2018 at 10:57 AM, Vincent Kooijman <
> vincent.kooijman@onmarc.nl> wrote:
>
>> Hi all,
>>
>>
>>
>> We're running into a few Kudu issues with the first being the Kudu
>> cluster check utility (sudo -u kudu
>> /opt/cloudera/parcels/CDH/lib/kudu/bin-debug/kudu cluster ksck) showing:
>>
>> Connected to the Master
>> Fetched info from all 10 Tablet Servers
>>
>> Tablet 41bf41e4127a46c69242f707298cf4ba of table 'xxx' is
>> under-replicated: 1 replica(s) not RUNNING
>>   1b3d49dd6ce64acda32f97a89d7de193: TS unavailable
>>   1a05af887edf4ba7b5c1731ce3508b19 (pdn05:7050): RUNNING [LEADER]
>>   4028533287964369928034c3616a0a16 (pdn01:7050): RUNNING
>>
>> 2 replicas' active configs differ from the master's.
>>   All the peers reported by the master and tablet servers are:
>>   A = 1a05af887edf4ba7b5c1731ce3508b19
>>   B = 1b3d49dd6ce64acda32f97a89d7de193
>>   C = 4028533287964369928034c3616a0a16
>>
>> *The consensus matrix is:*
>> *Segmentation fault*
>>
>> There is some mention of segmentation fault in combination with ksck in
>> the Kudu release notes for 1.4.0, but we are running 1.5.0 on a CDH cluster.
>>
>>
>>
>> Some notes:
>>
>>
>>
>>    - All masters (we have 3) are up with one leader being elected
>>    - All tablet servers (10) are live and visible in the master web UI
>>    - We've run kudu fs check ... -repair on all servers (master & tablet)
>>    - Master logs are filled with errors like:
>>
>>    Previously reported cstate for tablet
>>    5977f01cea44448a908bb56f97b46d9e (table 'xxx'
>>    [id=bb359f4b89dd46e797e2e24f9efac971]) gave a different leader for term
>>    2007 than the current cstate. Previous cstate: current_term: 2007
>>    leader_uuid: ""
>>
>>    - And tablet server logs contain a lot of:
>>
>>    Couldn't send request to peer 228515616baf44a99561c2b72dfb3bab for
>>    tablet 138854a04f804f4ebf42df657c22b995. Error code: TABLET_NOT_RUNNING
>>    (12). Status: Illegal state: Tablet not RUNNING: INITIALIZED. Retrying in
>>    the next heartbeat period. Already tried 12813 times.
>>
>>
>>
>> We're a bit lost as to where to look next.
>>
>>
>>
>> If anyone can point us in the right direction, that would be great!
>>
>>
>> Thanks,
>>
>>
>>
>> Vincent
>>
>
>
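
The condition described in the reply above (tablet servers reporting peers
in their config that are not known to the master) can be sketched in a few
lines of Python. This is only an illustration, not Kudu's code or API: the
function name unknown_peers and the dictionary layout are hypothetical, and
the assumption that the master knows peers A and C but not B is made up for
the example, reusing the UUIDs from the ksck output in the original report.

```python
# Illustrative sketch only; nothing here calls into Kudu.
# Peer UUIDs are taken from the ksck output quoted in this thread.
A = "1a05af887edf4ba7b5c1731ce3508b19"
B = "1b3d49dd6ce64acda32f97a89d7de193"
C = "4028533287964369928034c3616a0a16"


def unknown_peers(master_known, reported_configs):
    """Map each peer the master does not know to the replicas reporting it.

    master_known: set of tablet server UUIDs registered with the master.
    reported_configs: dict of reporter UUID -> list of peer UUIDs in the
        config that reporter sent back (hypothetical layout).
    """
    unknown = {}
    for reporter, peers in reported_configs.items():
        for peer in peers:
            if peer not in master_known:
                unknown.setdefault(peer, []).append(reporter)
    # Sort reporter lists so the result does not depend on dict ordering.
    return {peer: sorted(reporters) for peer, reporters in unknown.items()}


# Assumption for the example: the master only knows A and C, while both
# running replicas still list B in their config.
example = unknown_peers({A, C}, {A: [A, B, C], C: [A, B, C]})
print(example)
```

A non-empty result corresponds to the situation that, according to the reply,
triggers KUDU-2113 in tools older than 1.6.0.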
