hadoop-yarn-dev mailing list archives

From "Tianyin Xu (JIRA)" <j...@apache.org>
Subject [jira] [Created] (YARN-2832) Wrong Logic of Checking Causes Latent Errors
Date Sat, 08 Nov 2014 03:34:34 GMT
Tianyin Xu created YARN-2832:
--------------------------------

             Summary: Wrong Logic of Checking Causes Latent Errors
                 Key: YARN-2832
                 URL: https://issues.apache.org/jira/browse/YARN-2832
             Project: Hadoop YARN
          Issue Type: Bug
          Components: nodemanager
    Affects Versions: 2.5.1, 2.4.1
         Environment: Any environment
            Reporter: Tianyin Xu


NodeManager lets users specify the health-checker script that the health-checker service
invokes, via the configuration parameter "_yarn.nodemanager.health-checker.script.path_".


During _serviceInit()_ of the health-checker service, NM uses _shouldRun()_ to check whether
the parameter is set correctly, as follows:

{code:title=/* NodeHealthCheckerService.java */|borderStyle=solid}
  protected void serviceInit(Configuration conf) throws Exception {
    if (NodeHealthScriptRunner.shouldRun(conf)) {
      nodeHealthScriptRunner = new NodeHealthScriptRunner();
      addService(nodeHealthScriptRunner);
    }
    addService(dirsHandler);
    super.serviceInit(conf);
  }
{code}

The problem is that if the parameter is misconfigured (e.g., a permission problem or a wrong path),
NM logs no message to inform the user. This can cause latent errors or mysterious
problems (e.g., "why doesn't my script work?").

I see that the checking and logging logic is placed in the _serviceStart()_ function in _NodeHealthScriptRunner.java_.
However, this placement is wrong. For an incorrect parameter that fails the "shouldRun"
check, _serviceStart()_ is never called, because the _NodeHealthScriptRunner_ instance
is never created (see the code snippet above), so the log message below is never printed.

{code:title=/* NodeHealthScriptRunner.java */|borderStyle=solid}
  protected void serviceStart() throws Exception {
    // if health script path is not configured don't start the thread.
    if (!shouldRun(conf)) {
      LOG.info("Not starting node health monitor");
      return;
    }
    ... 
  }  
{code}
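
To make the control flow concrete, here is a minimal, self-contained sketch of the two snippets above. It does not use the Hadoop classes: _LatentCheckDemo_, _Runner_, _Checker_ and the boolean standing in for the result of _shouldRun(conf)_ are hypothetical stand-ins, and the "Started" message is invented for the demo.

```java
import java.util.ArrayList;
import java.util.List;

class LatentCheckDemo {
    /** Messages "logged" during one run, collected so they can be inspected. */
    static final List<String> log = new ArrayList<>();

    /** Hypothetical stand-in for NodeHealthScriptRunner. */
    static class Runner {
        final boolean shouldRun;

        Runner(boolean shouldRun) {
            this.shouldRun = shouldRun;
        }

        void serviceStart() {
            if (!shouldRun) {
                // Same message as in the real snippet, but see run(false) below.
                log.add("Not starting node health monitor");
                return;
            }
            // Invented message, only here to make the good case visible.
            log.add("Started node health monitor");
        }
    }

    /** Hypothetical stand-in for NodeHealthCheckerService. */
    static class Checker {
        final List<Runner> children = new ArrayList<>();

        void serviceInit(boolean shouldRun) {
            // Mirrors the real serviceInit(): the runner is only created
            // and registered when the check passes.
            if (shouldRun) {
                children.add(new Runner(shouldRun));
            }
        }

        void serviceStart() {
            // Only registered children are ever started.
            for (Runner r : children) {
                r.serviceStart();
            }
        }
    }

    /** Runs init + start and returns everything that was logged. */
    static List<String> run(boolean shouldRun) {
        log.clear();
        Checker checker = new Checker();
        checker.serviceInit(shouldRun);
        checker.serviceStart();
        return new ArrayList<>(log);
    }

    public static void main(String[] args) {
        System.out.println("valid path:   " + run(true));
        System.out.println("invalid path: " + run(false));
    }
}
```

In this sketch the misconfigured case logs nothing at all: the "Not starting node health monitor" branch exists but can never be reached through the init path.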

Basically, I think the checking and logging logic should be moved to _serviceInit()_ in _NodeHealthCheckerService_
instead of _serviceStart()_ in _NodeHealthScriptRunner_.

See the attachment for the simple patch.
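
The attached patch is not reproduced here. As a rough illustration of the idea only, the sketch below does the check at init time and reports why the script will not run. It is plain Java with no Hadoop dependency; _HealthScriptCheck_, _whyNotRunnable()_, _initHealthChecker()_ and the message texts are hypothetical names, and the three conditions (unset path, missing file, non-executable file) are an assumption based on the misconfiguration examples given above.

```java
import java.io.File;
import java.util.Optional;

class HealthScriptCheck {

    /**
     * Returns a human-readable reason why the script cannot run, or
     * Optional.empty() if the path looks usable. The conditions checked
     * here are assumptions, not the actual shouldRun() implementation.
     */
    static Optional<String> whyNotRunnable(String scriptPath) {
        if (scriptPath == null || scriptPath.trim().isEmpty()) {
            return Optional.of("health script path is not configured");
        }
        File script = new File(scriptPath);
        if (!script.exists()) {
            return Optional.of("health script does not exist: " + scriptPath);
        }
        if (!script.canExecute()) {
            return Optional.of("health script is not executable: " + scriptPath);
        }
        return Optional.empty();
    }

    /**
     * Hypothetical stand-in for serviceInit(): returns true if the runner
     * would be created, and otherwise prints the reason before returning
     * false. System.err stands in for the real logger.
     */
    static boolean initHealthChecker(String scriptPath) {
        Optional<String> problem = whyNotRunnable(scriptPath);
        if (problem.isPresent()) {
            System.err.println("WARN Not starting node health monitor: " + problem.get());
            return false;
        }
        return true;
    }

    public static void main(String[] args) {
        initHealthChecker(null);
        initHealthChecker("/no/such/dir/health-check.sh");
    }
}
```

With the check done at this point, a misconfigured path produces a message at startup instead of silently disabling the health script.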






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)