hadoop-common-issues mailing list archives

From "Weiwei Yang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-13837) Process check bug in hadoop_stop_daemon of hadoop-functions.sh
Date Tue, 29 Nov 2016 03:43:58 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-13837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15704069#comment-15704069 ]

Weiwei Yang commented on HADOOP-13837:
--------------------------------------

Hello [~aw]

bq. The proposed patch assumes that the process will actually end

The hadoop_status_daemon_wrapper waits at most 5 seconds. If the process doesn't reach the expected
state (started or stopped) within that time, it stops waiting and returns error code 1, so it
can't become an infinite loop.

The problem with a plain sleep is that you don't know how long to sleep. In some cases the process
doesn't stop, and then we should wait until the timeout; in other cases the process stops within
1 or 2 seconds, and then we only need to wait 1 or 2 seconds.
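To make the timeout argument concrete, here is a minimal bash sketch of a bounded poll. It is not the hadoop_status_daemon_wrapper from the patch: the helper names `proc_alive` and `wait_for_proc_state`, the one-second poll interval, and the 5-second default are assumptions for illustration. It checks the PID once per second and returns 0 as soon as the expected state is reached, or 1 once the timeout has passed, so it neither loops forever nor waits longer than needed.

```bash
# Hypothetical helpers for illustration only; not the patch's
# hadoop_status_daemon_wrapper.

# Succeeds if PID $1 exists and is not a zombie (ps state "Z").
proc_alive() {
  local state
  state=$(ps -o stat= -p "$1" 2>/dev/null) || return 1
  state=${state//[[:space:]]/}
  [[ -n $state && $state != Z* ]]
}

# wait_for_proc_state PID started|stopped [TIMEOUT_SECS]
# Checks about once per second. Returns 0 as soon as the state is reached,
# 1 once TIMEOUT_SECS (default 5) have elapsed, 2 on an unknown state name.
wait_for_proc_state() {
  local pid=$1 want=$2 timeout=${3:-5} waited=0
  while :; do
    case $want in
      stopped) proc_alive "$pid" || return 0 ;;
      started) proc_alive "$pid" && return 0 ;;
      *) return 2 ;;
    esac
    (( waited >= timeout )) && return 1
    sleep 1
    waited=$((waited + 1))
  done
}
```

A process that exits after one second is detected on the next check, while one that never stops costs at most the timeout before the caller gets error code 1.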

> Process check bug in hadoop_stop_daemon of hadoop-functions.sh
> --------------------------------------------------------------
>
>                 Key: HADOOP-13837
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13837
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: scripts
>            Reporter: Weiwei Yang
>            Assignee: Weiwei Yang
>         Attachments: HADOOP-13837.01.patch, HADOOP-13837.02.patch, check_proc.sh
>
>
> I always get {{ERROR: Unable to kill ...}} after {{Trying to kill with kill -9}}; see the following
output of stop-yarn.sh:
> {code}
> <NM_HOST>: WARNING: nodemanager did not stop gracefully after 5 seconds: Trying to kill with kill -9
> <NM_HOST>: ERROR: Unable to kill 18097
> {code}
> hadoop_stop_daemon doesn't check process liveness correctly; this bug is easy to reproduce
with the script. kill -9 needs some time to take effect, so checking for the process immediately
afterwards will usually fail.
> {code}
> function hadoop_stop_daemon
> {
>     ...
>       kill -9 "${pid}" >/dev/null 2>&1
>     fi
>     if ps -p "${pid}" > /dev/null 2>&1; then
>       hadoop_error "ERROR: Unable to kill ${pid}"
>     else
>       ...
> }
> {code}
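The following is a hypothetical bash sketch of how the tail of a stop routine could poll after `kill -9` instead of calling `ps -p` exactly once. The function names `pid_alive`, `wait_gone`, and `stop_pid` and the default timeouts are assumptions, and this is not the code that was committed for HADOOP-13837.

```bash
# Hypothetical sketch for illustration; not the committed HADOOP-13837 fix.

# Succeeds if PID $1 exists and is not a zombie (ps state "Z").
pid_alive() {
  local state
  state=$(ps -o stat= -p "$1" 2>/dev/null) || return 1
  state=${state//[[:space:]]/}
  [[ -n $state && $state != Z* ]]
}

# Returns 0 once PID $1 is gone, checking about once per second for up to
# $2 seconds; returns 1 if it is still alive after that.
wait_gone() {
  local pid=$1 timeout=$2 waited=0
  while pid_alive "$pid"; do
    (( waited >= timeout )) && return 1
    sleep 1
    waited=$((waited + 1))
  done
  return 0
}

# stop_pid PID [GRACE_SECS] [KILL_WAIT_SECS]
# SIGTERM first, SIGKILL after GRACE_SECS, then poll instead of checking once.
stop_pid() {
  local pid=$1 grace=${2:-5} kill_wait=${3:-5}
  kill "$pid" >/dev/null 2>&1
  if wait_gone "$pid" "$grace"; then
    return 0
  fi
  echo "WARNING: ${pid} did not stop gracefully after ${grace} seconds: Trying to kill with kill -9" 1>&2
  kill -9 "$pid" >/dev/null 2>&1
  # SIGKILL is not instantaneous: the process may still be listed for a
  # moment, so poll for up to KILL_WAIT_SECS rather than checking once.
  if wait_gone "$pid" "$kill_wait"; then
    return 0
  fi
  echo "ERROR: Unable to kill ${pid}" 1>&2
  return 1
}
```

Against a process that ignores SIGTERM, stop_pid prints the WARNING line and then returns 0 once SIGKILL has taken effect, rather than reporting ERROR: Unable to kill right away.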



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


