hadoop-mapreduce-user mailing list archives

From Or Sher <or.sh...@gmail.com>
Subject Re: Re: Stopping ntpd signals SIGTERM, then causes namenode exit
Date Wed, 11 Feb 2015 07:36:45 GMT
I'm not sure it's related but I encountered a similar issue a few months
ago.
In my case, it was an "at" command sending a kill signal to the at daemon
with its correct pid.
Somehow, once in a while this signal reached the Cassandra process (Java as
well) and killed it.
After some time investigating, I assumed this had to be a kernel bug or
something, and I opened a ticket for CentOS -
http://bugs.centos.org/view.php?id=7539 which nobody is really looking at
:)
You can read there how I tried to tackle it.
Bottom line, we've changed the at scheduler to a different implementation
and we don't get this issue any more.
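
If you want to rule out the simpler explanation first (a stale or reused
PID being signalled), here is a minimal sketch of a guard. It is not what
fixed my case, and safe_term is a hypothetical helper name, not part of
any init script. It only sends SIGTERM when the PID still belongs to the
command you expect.

```shell
#!/bin/bash
# safe_term PID NAME: hypothetical helper (illustration only).
# Sends SIGTERM to PID only if that PID currently runs command NAME.
safe_term() {
    local pid="$1" expected="$2" actual
    if [ -r "/proc/$pid/comm" ]; then
        # Linux: the kernel exposes the command name here.
        actual="$(cat "/proc/$pid/comm" 2>/dev/null)"
    else
        # Fallback: ask ps for the command name (empty if no such PID).
        actual="$(ps -p "$pid" -o comm= 2>/dev/null)"
    fi
    if [ "$actual" != "$expected" ]; then
        echo "refusing to signal PID $pid: expected '$expected', found '${actual:-none}'" >&2
        return 1
    fi
    kill -TERM "$pid"
}
```

Usage would look like safe_term "$(cat /var/run/ntpd.pid)" ntpd, where the
pid file path is an assumption about your setup. The check and the kill
are not atomic, so this narrows the window for hitting the wrong process
but does not close it.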

HTH,
Or.


On Wed, Feb 11, 2015 at 3:39 AM, David chen <c77_cn@163.com> wrote:

> The command 'service ntpd stop' could have been triggered around 14:00,
> because the crontab was set as follows:
> 0 * * * * sh sync.sh
> The script contains the following commands:
> #!/bin/bash
> service ntpd stop
> ntpdate 192.168.0.1 #it's a valid ntpd server in LAN
> service ntpd start
> chkconfig ntpd on
>
> Found the following fragment in /var/log/messages:
> Jan  7 14:00:01 host1 ntpd[32101]: ntpd exiting on signal 15
> Jan  7 13:59:59 host1 ntpd[44764]: ntpd 4.2.4p8@1.1612-o Fri Feb 22
> 11:23:27 UTC 2013 (1)
> Jan  7 13:59:59 host1 ntpd[44765]: precision = 0.143 usec
> Jan  7 13:59:59 host1 ntpd[44765]: Listening on interface #0 wildcard,
> 0.0.0.0#123 Disabled
> Jan  7 13:59:59 host1 ntpd[44765]: Listening on interface #1 wildcard,
> ::#123 Disabled
> Jan  7 13:59:59 host1 ntpd[44765]: Listening on interface #2 lo, ::1#123
> Enabled
> Jan  7 13:59:59 host1 ntpd[44765]: Listening on interface #3 em2,
> fe80::ca1f:66ff:fee1:eed#123 Enabled
> Jan  7 13:59:59 host1 ntpd[44765]: Listening on interface #4 lo,
> 127.0.0.1#123 Enabled
> Jan  7 13:59:59 host1 ntpd[44765]: Listening on interface #5 em2,
> 192.168.1.151#123 Enabled
> Jan  7 13:59:59 host1 ntpd[44765]: Listening on routing socket on fd #22
> for interface updates
> Jan  7 13:59:59 host1 ntpd[44765]: kernel time sync status 2040
> Jan  7 13:59:59 host1 ntpd[44765]: frequency initialized 499.399 PPM from
> /var/lib/ntp/drift
> Jan  7 14:00:01 host1 ntpd_initres[32103]: parent died before we finished,
> exiting
> Jan  7 14:04:17 host1 ntpd[44765]: synchronized to 192.168.0.191, stratum 2
> Jan  7 14:04:17 host1 ntpd[44765]: kernel time sync status change 2001
> Jan  7 14:26:02 host1 snmpd[4842]: Received TERM or STOP signal...
>  shutting down...
> Jan  7 14:26:02 host1 kernel: netlink: 12 bytes leftover after parsing
> attributes.
> Jan  7 14:26:02 host1 snmpd[45667]: NET-SNMP version 5.5
> Jan  7 14:52:48 host1 ntpd[44765]: no servers reachable
>
> It looks likely that the command 'service ntpd stop' sent the SIGTERM signal.
> The above clue 'ntpd[32101]' indicates that the ntpd process PID was 32101.
> Inspecting the NameNode log, I found that the NameNode process PID was not
> the same as ntpd's.
> So I wonder why the NameNode process received the signal?
>



-- 
Or Sher
