kudu-user mailing list archives

From Darren Hoo <darren....@gmail.com>
Subject Re: kudu master crashes
Date Tue, 11 Oct 2016 18:24:55 GMT
Hi Todd,

Thanks for the info.

kudu master will refuse to start if the clock is out of sync, but will it
also exit abruptly if the clock drifts while the master is running?
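
To see whether the kernel briefly reports the clock as unsynchronized while
the master is running, the ntptime output could be checked periodically. Below
is a minimal sketch, not Kudu's own code. The flag values are the STA_*
constants from Linux's <sys/timex.h>; the function names decode_status and
parse_ntptime are made up for illustration.

```python
import re

# Kernel clock status bits (STA_* constants from Linux <sys/timex.h>).
STA_FLAGS = {
    0x0001: "PLL",
    0x0002: "PPSFREQ",
    0x0004: "PPSTIME",
    0x0008: "FLL",
    0x0010: "INS",
    0x0020: "DEL",
    0x0040: "UNSYNC",
    0x0080: "FREQHOLD",
    0x0100: "PPSSIGNAL",
    0x0200: "PPSJITTER",
    0x0400: "PPSWANDER",
    0x0800: "PPSERROR",
    0x1000: "CLOCKERR",
    0x2000: "NANO",
    0x4000: "MODE",
    0x8000: "CLK",
}


def decode_status(status):
    """Return the names of the status bits that are set, lowest bit first."""
    return [name for bit, name in sorted(STA_FLAGS.items()) if status & bit]


def parse_ntptime(text):
    """Extract return codes, status flags and maximum error from ntptime output."""
    codes = [int(c) for c in re.findall(r"returns code (\d+)", text)]
    status = int(re.search(r"status (0x[0-9a-fA-F]+)", text).group(1), 16)
    max_error_us = int(re.search(r"maximum error (\d+) us", text).group(1))
    flags = decode_status(status)
    return {
        "codes": codes,
        "flags": flags,
        "max_error_us": max_error_us,
        # Return code 5 is TIME_ERROR, reported when the clock is unsynchronized.
        "synchronized": 5 not in codes and "UNSYNC" not in flags,
    }
```

Feeding it the ntptime output quoted further down gives codes [0, 0] and flags
PLL and NANO, so the clock counts as synchronized at that moment. Logging the
result every minute or so might show whether UNSYNC or code 5 appears around
the time of a crash.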

We have only one NTP server running, and all other nodes in the cluster
are synchronized to it. I will check the NTP manuals and set up multiple
NTP servers.
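
A quick way to confirm how many upstream sources a node is configured with is
to scan its ntp.conf. The sketch below assumes the usual server/pool/peer
directives; the function name upstream_servers is hypothetical. Addresses of
the form 127.127.t.u are skipped because ntpd uses them for local reference
clock drivers rather than network servers.

```python
def upstream_servers(conf_text):
    """Return the hosts named by server/pool/peer lines in ntp.conf text."""
    hosts = []
    for line in conf_text.splitlines():
        # Drop trailing comments, then split the directive into fields.
        fields = line.split("#", 1)[0].split()
        if len(fields) < 2 or fields[0] not in ("server", "pool", "peer"):
            continue
        # 127.127.t.u addresses are reference clock drivers, not network servers.
        if fields[1].startswith("127.127."):
            continue
        hosts.append(fields[1])
    return hosts
```

For example, running it over the contents of /etc/ntp.conf on each node and
checking that the returned list has more than one entry would flag nodes that
still depend on a single server.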

On Wed, Oct 12, 2016 at 12:12 AM, Todd Lipcon <todd@cloudera.com> wrote:

> Hi Darren,
>
> It sounds like the server must have briefly lost NTP synchronization. As
> far as I know, Cloudera Manager's alert doesn't check for the status as
> reported by ntptime, but rather checks that the agent and the CM master
> have relatively close clocks. Even if the kudu server lost sync for a
> couple minutes, the clock probably didn't drift enough to trigger CM's
> warning.
>
> Do you already have multiple NTP servers configured in your ntp
> configuration? That's usually helpful for better redundancy.
>
> -Todd
>
>
>
> On Tue, Oct 11, 2016 at 1:01 AM, Darren Hoo <darren.hoo@gmail.com> wrote:
>
>> It seems that it's caused by this:
>>
>> + exec /opt/cloudera/parcels/KUDU-1.0.0-1.kudu1.0.0.p0.6/lib/kudu/sbin/kudu-master
>> --master_addresses=nm-new,snm-new --flagfile=/var/run/cloudera-scm-agent/process/11172-kudu-KUDU_MASTER/gflagfile
>>
>> F1011 14:35:27.318984 32076 hybrid_clock.cc:227] Couldn't get the current
>> time: Clock unsynchronized. Status: Service unavailable: Error reading
>> clock. Clock considered unsynchronized
>>
>> *** Check failure stack trace: ***
>>     @           0x7e6a2d  google::LogMessage::Fail()
>>     @           0x7e892d  google::LogMessage::SendToLog()
>>     @           0x7e6569  google::LogMessage::Flush()
>>     @           0x7e93cf  google::LogMessageFatal::~LogMessageFatal()
>>     @           0xa1a56e  kudu::server::HybridClock::NowWithError()
>>     @           0xa1b973  kudu::server::HybridClock::NowForMetrics()
>>     @           0x85f8f4  kudu::FunctionGauge<>::WriteValue()
>>     @          0x1916700  kudu::Gauge::WriteAsJson()
>>     @          0x1917d65  kudu::MetricEntity::WriteAsJson()
>>     @          0x1919271  kudu::MetricRegistry::WriteAsJson()
>>     @           0x995721  (unknown)
>>     @           0x98e5f6  kudu::Webserver::RunPathHandler()
>>     @           0x98f171  kudu::Webserver::BeginRequestCallbackStatic()
>>     @           0x9b2f6e  (unknown)
>>     @           0x9b586e  (unknown)
>>     @           0x9b5f0c  (unknown)
>>     @     0x7f4813ea1aa1  start_thread
>>     @     0x7f4812c12aad  clone
>>     @              (nil)  (unknown)
>>
>>
>>
>> *but ntptime shows OK:*
>>
>>
>> ntp_gettime() returns code 0 (OK)
>>   time dba7194c.f8dbce6c  Tue, Oct 11 2016 15:54:52.972, (.972104188),
>>   maximum error 471276 us, estimated error 11 us, TAI offset 0
>> ntp_adjtime() returns code 0 (OK)
>>   modes 0x0 (),
>>   offset -10.130 us, frequency 44.000 ppm, interval 1 s,
>>   maximum error 471276 us, estimated error 11 us,
>>   status 0x2001 (PLL,NANO),
>>   time constant 7, precision 0.001 us, tolerance 500 ppm
>>
>>
>>
>> *And there are no NTP unsynchronized warnings in Cloudera Manager.*
>>
>>
>>
>> On Tue, Oct 11, 2016 at 3:29 PM, Darren Hoo <darren.hoo@gmail.com> wrote:
>>
>>> kudu master seldom crashed before, but starting yesterday, one of our
>>> two kudu masters has been crashing very often.
>>>
>>> Can anyone help to see what's going on?
>>>
>>> You can get the core file here: http://167.88.124.211:8000/core.22459.xz
>>>
>>>
>>>
>>
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
>
