kudu-user mailing list archives

From Vincent Kooijman <vincent.kooij...@onmarc.nl>
Subject Segmentation Fault when running kudu ksck
Date Mon, 20 Aug 2018 17:57:25 GMT
Hi all,

We're running into a few Kudu issues. The first is that the Kudu cluster check utility (sudo
-u kudu /opt/cloudera/parcels/CDH/lib/kudu/bin-debug/kudu cluster ksck) crashes after showing:

Connected to the Master
Fetched info from all 10 Tablet Servers

Tablet 41bf41e4127a46c69242f707298cf4ba of table 'xxx' is under-replicated: 1 replica(s) not
  1b3d49dd6ce64acda32f97a89d7de193: TS unavailable
  1a05af887edf4ba7b5c1731ce3508b19 (pdn05:7050): RUNNING [LEADER]
  4028533287964369928034c3616a0a16 (pdn01:7050): RUNNING

2 replicas' active configs differ from the master's.
  All the peers reported by the master and tablet servers are:
  A = 1a05af887edf4ba7b5c1731ce3508b19
  B = 1b3d49dd6ce64acda32f97a89d7de193
  C = 4028533287964369928034c3616a0a16

The consensus matrix is:
Segmentation fault
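
Since ksck crashes before it prints the consensus matrix, the replica lines it does print are the only per-tablet state available from this run. The following is a minimal sketch, assuming the replica lines always have the shape shown above (a 32-character UUID, an optional host:port in parentheses, a state, and an optional [LEADER] tag), of how they could be pulled out of captured ksck output. The function names parse_replicas and not_running are hypothetical and are not part of Kudu.

```python
import re

# Matches lines such as:
#   1a05af887edf4ba7b5c1731ce3508b19 (pdn05:7050): RUNNING [LEADER]
#   1b3d49dd6ce64acda32f97a89d7de193: TS unavailable
# The shape is assumed from the ksck output quoted above.
REPLICA_LINE = re.compile(
    r"^[ \t]*(?P<uuid>[0-9a-f]{32})"
    r"(?:[ \t]+\((?P<addr>[^)]+)\))?"
    r":[ \t]+(?P<state>.+?)"
    r"(?P<leader>[ \t]+\[LEADER\])?[ \t]*$",
    re.MULTILINE,
)

def parse_replicas(ksck_output):
    """Return one dict per replica line found in the captured output."""
    replicas = []
    for m in REPLICA_LINE.finditer(ksck_output):
        replicas.append({
            "uuid": m.group("uuid"),
            "addr": m.group("addr"),  # None when no host:port was printed
            "state": m.group("state"),
            "leader": m.group("leader") is not None,
        })
    return replicas

def not_running(replicas):
    """Return the replicas whose reported state is anything but RUNNING."""
    return [r for r in replicas if r["state"] != "RUNNING"]

if __name__ == "__main__":
    captured = """\
  1b3d49dd6ce64acda32f97a89d7de193: TS unavailable
  1a05af887edf4ba7b5c1731ce3508b19 (pdn05:7050): RUNNING [LEADER]
  4028533287964369928034c3616a0a16 (pdn01:7050): RUNNING
"""
    for replica in not_running(parse_replicas(captured)):
        print(replica["uuid"], replica["state"])
```

Run against the excerpt above, this would single out replica 1b3d49dd6ce64acda32f97a89d7de193, the one ksck reports as "TS unavailable".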

The Kudu release notes for 1.4.0 mention a segmentation fault in combination with ksck,
but we are running 1.5.0 on a CDH cluster.

Some notes:

  *   All 3 masters are up, with one elected as leader
  *   All tablet servers (10) are live and visible in the master web UI
  *   We've run kudu fs check ... -repair on all servers (master & tablet)
  *   Master logs are filled with errors like:

Previously reported cstate for tablet 5977f01cea44448a908bb56f97b46d9e (table 'xxx' [id=bb359f4b89dd46e797e2e24f9efac971])
gave a different leader for term 2007 than the current cstate. Previous cstate: current_term:
2007 leader_uuid: ""

  *   And tablet server logs contain a lot of:

Couldn't send request to peer 228515616baf44a99561c2b72dfb3bab for tablet 138854a04f804f4ebf42df657c22b995.
Error code: TABLET_NOT_RUNNING (12). Status: Illegal state: Tablet not RUNNING: INITIALIZED.
Retrying in the next heartbeat period. Already tried 12813 times.
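
To see whether these retries cluster around a few peers or tablets, the messages could be tallied. The following is a minimal sketch, assuming the message text has exactly the shape quoted above. The name tally_peer_errors is hypothetical, and any timestamp or thread prefix on real log lines is simply ignored by the pattern.

```python
import re
from collections import Counter

# Pattern built only from the message text quoted above. Whitespace is
# matched loosely so that wrapped copies of the message also match.
PEER_ERROR = re.compile(
    r"Couldn't\s+send\s+request\s+to\s+peer\s+(?P<peer>[0-9a-f]{32})\s+"
    r"for\s+tablet\s+(?P<tablet>[0-9a-f]{32})\.\s+"
    r"Error\s+code:\s+(?P<code>\w+)\s+\(\d+\)"
)

def tally_peer_errors(log_text):
    """Count matching messages per (peer UUID, tablet ID, error code)."""
    counts = Counter()
    for m in PEER_ERROR.finditer(log_text):
        counts[(m.group("peer"), m.group("tablet"), m.group("code"))] += 1
    return counts

if __name__ == "__main__":
    sample = (
        "Couldn't send request to peer 228515616baf44a99561c2b72dfb3bab "
        "for tablet 138854a04f804f4ebf42df657c22b995.\n"
        "Error code: TABLET_NOT_RUNNING (12). Status: Illegal state: "
        "Tablet not RUNNING: INITIALIZED.\n"
    )
    # Most frequent (peer, tablet, code) combinations first.
    for (peer, tablet, code), n in tally_peer_errors(sample).most_common():
        print(n, peer, tablet, code)
```

Feeding it the contents of a tablet server log would show which peer and tablet pairs account for most of the retries, which may help narrow down the replicas stuck in INITIALIZED.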

We're a bit lost as to where to look next.

If anyone can point us in the right direction, that would be great!

