kudu-user mailing list archives

From "Cahill, Trey" <trey.cah...@siemens.com>
Subject Kudu 1.0.0 Tablet Server not Starting After Replacing Failed Drive
Date Wed, 02 Nov 2016 20:47:08 GMT
Hi All,

While running Kudu 1.0.0 with 9 tablet servers and a single master in a CDH 5.4.10 cluster,
a drive failed for one of the tablet servers.  The drive has since been replaced, but the
tablet server will not restart.
Below is the error from kudu-tserver.FATAL:
"Log file created at: 2016/11/02 19:27:17
Running on machine: i-d6d75566.intra.omneo.com
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
F1102 19:27:17.451611 21593 tablet_server_main.cc:55] Check failed: _s.ok() Bad status: Already present: Could not create new FS layout: FSManager root is not empty: /data/0/kudu/tserver"

The WARN and ERROR logs contain the same message.
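
For anyone who wants to pull the severity, source file, and message out of these logs, here is a minimal Python sketch that parses one line according to the "Log line format" header quoted above. The names `GLOG_LINE`, `GlogRecord`, and `parse_glog_line` are made up for illustration and are not part of Kudu or glog; the sketch assumes each log entry has been joined onto a single line.

```python
import re
from typing import NamedTuple, Optional

# Pattern derived from the header in the log itself:
#   [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
GLOG_LINE = re.compile(
    r"^(?P<level>[IWEF])(?P<month>\d{2})(?P<day>\d{2}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+"
    r"(?P<thread>\d+) (?P<file>[^:\s]+):(?P<line>\d+)\] (?P<msg>.*)$"
)


class GlogRecord(NamedTuple):
    """Hypothetical container for the fields of one log line."""
    level: str
    month: int
    day: int
    time: str
    thread: int
    file: str
    line: int
    msg: str


def parse_glog_line(text: str) -> Optional[GlogRecord]:
    """Return the parsed fields, or None if the line does not match the format."""
    m = GLOG_LINE.match(text)
    if m is None:
        return None
    return GlogRecord(
        level=m["level"],
        month=int(m["month"]),
        day=int(m["day"]),
        time=m["time"],
        thread=int(m["thread"]),
        file=m["file"],
        line=int(m["line"]),
        msg=m["msg"],
    )
```

Filtering the parsed records on `level == "F"` would isolate the fatal check failure shown above.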

The INFO log has the following output:
"Log file created at: 2016/11/02 19:27:17
Running on machine: i-d6d75566.intra.omneo.com
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
I1102 19:27:17.448385 21593 mem_tracker.cc:140] MemTracker: hard memory limit is 4.000000 GB
I1102 19:27:17.448578 21593 mem_tracker.cc:142] MemTracker: soft memory limit is 2.400000 GB
I1102 19:27:17.449854 21593 tablet_server_main.cc:54] Initializing tablet server...
I1102 19:27:17.450325 21593 hybrid_clock.cc:177] HybridClock initialized. Resolution in nanos?: 1 Wait times tolerance adjustment: 1.0005 Current error: 143827
I1102 19:27:17.451561 21593 server_base.cc:168] Could not load existing FS layout: Not found: /data/0/kudu/tserver-wal/instance: No such file or directory (error 2)
I1102 19:27:17.451573 21593 server_base.cc:169] Creating new FS layout
F1102 19:27:17.451611 21593 tablet_server_main.cc:55] Check failed: _s.ok() Bad status: Already present: Could not create new FS layout: FSManager root is not empty: /data/0/kudu/tserver"


fs_wal_dir is set to "/data/0/kudu/tserver" and fs_data_dirs is set to "/data/0/kudu/tserver,
/data/1/kudu/tserver, /data/2/kudu/tserver, /data/3/kudu/tserver" for every tablet server.
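
To make the state of each directory easier to report, here is a small read-only Python sketch that classifies every configured root. It relies only on what the logs show: the server looks for a file named `instance` in a root, and refuses to create a new layout when a root is not empty. The function name `describe_roots` and the status labels are hypothetical, and the script does not modify anything or attempt a recovery. The list also includes /data/0/kudu/tserver-wal, because that is the path the INFO log reports as missing its `instance` file.

```python
import os
from typing import Dict, List


def describe_roots(roots: List[str]) -> Dict[str, str]:
    """Classify each root directory without changing anything on disk.

    Labels (hypothetical, for illustration only):
      missing                - the path is not a directory
      empty                  - the directory exists and has no entries
      has-instance           - the directory contains a file named 'instance'
      non-empty-no-instance  - the directory has entries but no 'instance' file
    """
    report: Dict[str, str] = {}
    for root in roots:
        if not os.path.isdir(root):
            report[root] = "missing"
        elif not os.listdir(root):
            report[root] = "empty"
        elif os.path.isfile(os.path.join(root, "instance")):
            report[root] = "has-instance"
        else:
            report[root] = "non-empty-no-instance"
    return report


if __name__ == "__main__":
    # Paths taken from the configuration and log output in this message.
    configured = [
        "/data/0/kudu/tserver-wal",
        "/data/0/kudu/tserver",
        "/data/1/kudu/tserver",
        "/data/2/kudu/tserver",
        "/data/3/kudu/tserver",
    ]
    for path, status in describe_roots(configured).items():
        print(f"{status:24s} {path}")
```

Run on the affected tablet server, this would show at a glance which roots still hold an `instance` file and which one sits on the replaced drive.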

I searched, but could not find a way to recover or start the tablet server.

Any thoughts?

Let me know if you need more information.

Thanks,

Trey
