kudu-user mailing list archives

From Jean-Daniel Cryans <jdcry...@apache.org>
Subject Re: Kudu 1.0.0 Tablet Server not Starting After Replacing Failed Drive
Date Wed, 02 Nov 2016 20:53:08 GMT
Hi Trey,

Kudu currently requires removing all of the Kudu data folders on a machine
when one disk fails. This is because Kudu effectively stripes its data
across all the data disks, so the surviving directories are not usable on
their own. Assuming you're not running with replication=1, your data should
already be re-replicated on your other nodes.
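
As a rough sketch of what that cleanup could look like, here is a small
Python script that empties the directories named in fs_wal_dir and
fs_data_dirs. This is not an official Kudu tool: the function names are
made up for illustration, the paths are taken from the message below,
including the WAL directory is my assumption, and it presumes the tablet
server is stopped. It defaults to a dry run that only lists what it would
delete.

```python
import shutil
from pathlib import Path


def parse_dirs(flag_value):
    """Split a comma-separated flag value (e.g. fs_data_dirs) into paths."""
    return [Path(p.strip()) for p in flag_value.split(",") if p.strip()]


def dirs_to_wipe(fs_wal_dir, fs_data_dirs):
    """Return the WAL dir plus all data dirs, de-duplicated, in order."""
    seen, result = set(), []
    for path in [Path(fs_wal_dir.strip()), *parse_dirs(fs_data_dirs)]:
        if path not in seen:
            seen.add(path)
            result.append(path)
    return result


def wipe(paths, dry_run=True):
    """Empty each directory; with dry_run=True only report the entries."""
    affected = []
    for root in paths:
        if not root.is_dir():
            continue  # e.g. the replaced disk is not mounted yet
        for child in sorted(root.iterdir()):
            affected.append(child)
            if dry_run:
                continue
            if child.is_dir() and not child.is_symlink():
                shutil.rmtree(child)
            else:
                child.unlink()
    return affected


if __name__ == "__main__":
    # Values assumed from the configuration quoted in this thread.
    targets = dirs_to_wipe(
        "/data/0/kudu/tserver",
        "/data/0/kudu/tserver, /data/1/kudu/tserver, "
        "/data/2/kudu/tserver, /data/3/kudu/tserver",
    )
    for entry in wipe(targets, dry_run=True):
        print("would remove", entry)
```

Only switch to dry_run=False after checking the listed paths, since the
deletion cannot be undone.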

Hope this helps,

J-D

On Wed, Nov 2, 2016 at 1:47 PM, Cahill, Trey <trey.cahill@siemens.com>
wrote:

> Hi All,
>
> While running Kudu 1.0.0 with 9 tablet servers and a single master in a
> CDH 5.4.10 cluster, a drive failed for one of the tablet servers.  The
> drive has since been replaced, but the tablet server will not restart.
>
> Below is the error from kudu-tserver.FATAL:
>
> “Log file created at: 2016/11/02 19:27:17
> Running on machine: i-d6d75566.intra.omneo.com
> Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
> F1102 19:27:17.451611 21593 tablet_server_main.cc:55] Check failed:
> _s.ok() Bad status: Already present: Could not create new FS layout:
> FSManager root is not empty: /data/0/kudu/tserver”
>
> The WARN and ERROR logs contain the same message.
>
> The INFO log has the following output:
>
> “Log file created at: 2016/11/02 19:27:17
> Running on machine: i-d6d75566.intra.omneo.com
> Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
> I1102 19:27:17.448385 21593 mem_tracker.cc:140] MemTracker: hard memory
> limit is 4.000000 GB
> I1102 19:27:17.448578 21593 mem_tracker.cc:142] MemTracker: soft memory
> limit is 2.400000 GB
> I1102 19:27:17.449854 21593 tablet_server_main.cc:54] Initializing tablet
> server...
> I1102 19:27:17.450325 21593 hybrid_clock.cc:177] HybridClock initialized.
> Resolution in nanos?: 1 Wait times tolerance adjustment: 1.0005 Current
> error: 143827
> I1102 19:27:17.451561 21593 server_base.cc:168] Could not load existing FS
> layout: Not found: /data/0/kudu/tserver-wal/instance: No such file or
> directory (error 2)
> I1102 19:27:17.451573 21593 server_base.cc:169] Creating new FS layout
> F1102 19:27:17.451611 21593 tablet_server_main.cc:55] Check failed:
> _s.ok() Bad status: Already present: Could not create new FS layout:
> FSManager root is not empty: /data/0/kudu/tserver”
>
> fs_wal_dir is set to “/data/0/kudu/tserver” and fs_data_dirs is set to
> “/data/0/kudu/tserver, /data/1/kudu/tserver, /data/2/kudu/tserver,
> /data/3/kudu/tserver” for every tablet server.
>
> I searched, but could not seem to find a way to recover/start the tablet
> server.
>
> Any thoughts?
>
> Let me know if you need more information.
>
> Thanks,
>
> Trey
>
