hadoop-common-issues mailing list archives

From "Allen Wittenauer (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HADOOP-13632) Daemonization does not check process liveness before renicing
Date Thu, 22 Sep 2016 19:53:20 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-13632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15514317#comment-15514317 ]

Allen Wittenauer edited comment on HADOOP-13632 at 9/22/16 7:52 PM:
--------------------------------------------------------------------

We're basically racing against the process's startup and its subsequent failure. We might pass
that ps check but still fail the renice, the disown, or the subsequent ps check.  That said, it
wouldn't hurt to put another ps check after the timer and before the pid file write; that should
catch a good chunk of the early failures.
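
As a rough illustration of that ordering (start, wait, check liveness, renice, check again, write
the pid file), here is a minimal bash sketch. It is not the actual Hadoop daemonization code:
`process_alive` and `start_checked` are hypothetical names, the one-second timer is an assumed
value, and the liveness probe uses `kill -0` rather than the ps check discussed above, purely to
keep the sketch short and fast.

```bash
#!/usr/bin/env bash
# Hypothetical sketch only; these are NOT Hadoop's real shell functions.

# Return 0 if a signal could be delivered to the pid, i.e. it still exists.
# (Assumption: kill -0 stands in for the ps-based check in the real script.)
process_alive() {
  kill -0 "$1" > /dev/null 2>&1
}

# Usage: start_checked PIDFILE NICENESS COMMAND [ARGS...]
start_checked() {
  local pidfile=$1
  local niceness=$2
  shift 2

  # Launch the command in the background and remember its pid.
  "$@" > /dev/null 2>&1 &
  local pid=$!

  # The "timer": give a misconfigured process a moment to fail.
  sleep 1

  # Liveness check before renicing, so an early death is reported as a
  # failed start rather than as a priority error.
  if ! process_alive "${pid}"; then
    echo "ERROR: process ${pid} exited shortly after start" 1>&2
    return 1
  fi

  # Still racy: the process can die between the check above and the renice.
  if ! renice -n "${niceness}" -p "${pid}" > /dev/null 2>&1; then
    echo "WARNING: cannot set priority of process ${pid}" 1>&2
  fi

  # Second check, right before the pid file write.
  if ! process_alive "${pid}"; then
    echo "ERROR: process ${pid} died before the pid file was written" 1>&2
    return 1
  fi

  echo "${pid}" > "${pidfile}"
}
```

Something like `start_checked /tmp/example.pid 0 sleep 30` would write the pid file, while
`start_checked /tmp/example.pid 0 false` would report a failed start and leave no pid file behind.
The extra checks only narrow the window; they do not close the race.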

The out file may or may not be the correct file to look at, BTW. For example, fs.defaultFS
pointing to file://// will leave the out file empty.

Two side notes:

* I wonder why this code doesn't use hadoop_status_daemon.  I'm sure there is a good reason,
including that it was probably written before that function existed.  It probably should use
it, though, so that we take advantage of whatever features someone adds if they replace it.
On the flip side, this code is extremely time critical (racy!), so the faster we complete,
the better.

* This is some of my least favorite code that I've written.  Handling pid files outside of
a daemon is full of fragility even outside of the edge cases. :(


> Daemonization does not check process liveness before renicing
> -------------------------------------------------------------
>
>                 Key: HADOOP-13632
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13632
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: scripts
>    Affects Versions: 3.0.0-alpha1
>            Reporter: Andrew Wang
>
> If you try to daemonize a process that is incorrectly configured, it will die quite quickly.
> However, the daemonization function will still try to renice it even if it's down, leading
> to something like this for my namenode:
> {noformat}
> -> % bin/hdfs --daemon start namenode
> ERROR: Cannot set priority of namenode process 12036
> {noformat}
> It'd be more user-friendly if, instead of this renice error, we said that the process couldn't
> be started.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org

