hadoop-hdfs-user mailing list archives

From Han-Cheol Cho <hancheol....@nhn-playart.com>
Subject A problem with Hadoop PID files
Date Wed, 22 Oct 2014 04:47:26 GMT
Hi all,
 
I am using Monit to monitor Hadoop processes and restart them automatically when they fail.
 
From time to time, however, a Hadoop process (e.g., the NameNode) is running with one PID, say 1111,
while its pid file (/var/run/hadoop-hdfs/hadoop-hdfs-namenode.pid) contains a different value,
say 1222.
Monit then assumes that the service is not running and tries to start it again with the configured command
"/sbin/service hadoop-hdfs-namenode start".
The problem is that the NameNode is already running (with a PID different from the one in the pid file).
The service command therefore fails, but it still rewrites the pid file, so the number in this
file just keeps growing with every attempt...
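
To make the mismatch concrete, here is a minimal Python sketch of the kind of check a monitor performs against a pid file. It is only an illustration under my own assumptions (Linux, and the helper names pid_is_alive, read_pid_file and pid_file_is_stale are hypothetical); it is not how Monit or the Hadoop init scripts are actually implemented.

```python
import os


def pid_is_alive(pid: int) -> bool:
    """Return True if a process with this PID exists (signal 0 only probes)."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        # The process exists but belongs to another user.
        return True
    return True


def read_pid_file(path: str):
    """Return the PID recorded in the file, or None if missing or malformed."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (FileNotFoundError, ValueError):
        return None


def pid_file_is_stale(path: str) -> bool:
    """A pid file is stale if it records no PID or a PID that is not running."""
    pid = read_pid_file(path)
    return pid is None or pid <= 0 or not pid_is_alive(pid)
```

With the situation described above, pid_file_is_stale would return True for the NameNode pid file even though the NameNode itself is still running under another PID, which is why the monitor keeps trying to start it.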
 
My guess is that Monit, after finding that the NameNode is not running, launches it several
times in quick succession; the first instance comes up, but the second launch overwrites the pid file.
The launch script also does not seem to have any locking to protect the pid file.
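
The sort of lock routine I have in mind could look roughly like the following Python sketch. It is only an assumption about what such protection might be, not the actual Hadoop launch script (which is a shell script); the function names claim_pid_file and pid_is_alive are hypothetical, and it relies on fcntl.flock, so it is Linux/Unix only.

```python
import fcntl
import os


def pid_is_alive(pid: int) -> bool:
    """Return True if a process with this PID exists (signal 0 only probes)."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # exists, but owned by another user
    return True


def claim_pid_file(path: str, pid: int) -> bool:
    """Record pid in path unless another live process already owns the file.

    Returns True if the PID was written, False if the file was left untouched.
    """
    # Open without truncating so the old PID can still be read under the lock.
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    with os.fdopen(fd, "r+") as f:
        # Exclusive lock: concurrent launchers wait here; released on close.
        fcntl.flock(f, fcntl.LOCK_EX)
        text = f.read().strip()
        old = int(text) if text.isdigit() else 0
        if old > 0 and old != pid and pid_is_alive(old):
            return False  # a running process owns the file; do not overwrite
        f.seek(0)
        f.truncate()
        f.write(f"{pid}\n")
        return True
```

A second launcher calling claim_pid_file while the first instance is alive would get False back and leave the recorded PID alone, instead of overwriting it with its own.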
 
Has anyone experienced a similar problem?
For now, my workaround is to stop the process manually (kill -15 pid), since "service ...
stop" does not work either.
 
Best wishes,



 
 趙漢哲 (CHO, Han-Cheol, Ph.D.)
Data Science Lab. / Data scientist
TEL: 03-5155-1160 (department main line)   FAX: 03-5155-3307

Shibuya Hikarie 27F, 2-21-1 Shibuya, Shibuya-ku, Tokyo 150-8510, Japan
Email  hancheol.cho@nhn-playart.com



 