hadoop-common-issues mailing list archives

From "Steve Loughran (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-14855) Hadoop scripts may errantly believe a daemon is still running, preventing it from starting
Date Sat, 09 Sep 2017 17:40:02 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-14855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16160022#comment-16160022
] 

Steve Loughran commented on HADOOP-14855:
-----------------------------------------

You could always check whether it's a Java process, which is resilient to any issues with the
process name. How do you check that? jstack will do it, though its exit code 1 covers both
"no process" and "process not listening":
{code}
bash-3.2$ time jstack 470
470: Unable to open socket file: target process not responding or HotSpot VM not loaded
The -F option can be used when the target process is not responding

real	0m5.439s
user	0m0.127s
sys	0m0.038s
bash-3.2$ echo $?
1
{code}

If the process is a Java one, you get the stack trace and an exit code of 0.

I could imagine a sequence of file -> pid -> kill -0 -> jstack, so the jstack check
is only done if the process is known to be running. 
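
A minimal sketch of that file -> pid -> kill -0 -> jstack sequence, written as a portable shell function. This is an illustration only, not code from the Hadoop scripts: the function name `daemon_is_live_jvm` and the `JSTACK` override variable are hypothetical, and it assumes jstack is on the PATH and is run as the same user as the daemon. Checking kill -0 first avoids paying the several-second jstack timeout when the pid is simply gone.

```shell
# Hypothetical helper, not part of the Hadoop scripts.
# Usage: daemon_is_live_jvm /path/to/daemon.pid
# Returns 0 only if the pid file names a running process that jstack
# can attach to; returns non-zero in every other case.
daemon_is_live_jvm() {
  pidfile=$1

  # Step 1: file -> pid. No pid file means no daemon was recorded.
  [ -f "${pidfile}" ] || return 1
  pid=$(cat "${pidfile}")

  # Reject empty or non-numeric content, and pid 0 (which kill would
  # interpret as "the current process group").
  case "${pid}" in
    ''|*[!0-9]*) return 1 ;;
  esac
  [ "${pid}" -gt 0 ] || return 1

  # Step 2: kill -0 delivers no signal; it only reports whether the pid
  # exists and we may signal it. It also fails for a live process owned
  # by another user, which this sketch treats as "not our daemon".
  kill -0 "${pid}" 2>/dev/null || return 1

  # Step 3: the pid is alive, so ask jstack whether it is a JVM.
  # JSTACK is an assumed override hook so the command can be swapped out.
  "${JSTACK:-jstack}" "${pid}" >/dev/null 2>&1
}
```

A start script could call this before printing the "Stop it first" message and treat a non-zero return as a stale pid file. It still cannot tell one Java daemon from another, so it narrows the window for a false positive rather than closing it.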

> Hadoop scripts may errantly believe a daemon is still running, preventing it from starting
> ------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-14855
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14855
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: scripts
>    Affects Versions: 3.0.0-alpha4
>            Reporter: Aaron T. Myers
>
> I encountered a case recently where the NN wouldn't start, with the error message "namenode
> is running as process 16769.  Stop it first." In fact the NN was not running at all, but rather
> another long-running process was running with this pid.
> It looks to me like our scripts just check to see if _any_ process is running with the
> pid that the NN (or any Hadoop daemon) most recently ran with. This is clearly not a fool-proof
> way of checking to see if a particular type of daemon is now running, as some other process
> could start running with the same pid since the daemon in question was previously shut down.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org

