hadoop-yarn-issues mailing list archives

From "Daryn Sharp (JIRA)" <j...@apache.org>
Subject [jira] [Reopened] (YARN-3760) Log aggregation failures
Date Wed, 29 Mar 2017 16:03:41 GMT

     [ https://issues.apache.org/jira/browse/YARN-3760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daryn Sharp reopened YARN-3760:
-------------------------------

Line numbers are from an old release but the error is evident.
{code}
java.lang.IllegalStateException: Cannot close TFile in the middle of key-value insertion.
        at org.apache.hadoop.io.file.tfile.TFile$Writer.close(TFile.java:310)
        at org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogWriter.close(AggregatedLogFormat.java:456)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.uploadLogsForContainers(AppLogAggregatorImpl.java:326)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.doAppLogAggregation(AppLogAggregatorImpl.java:429)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.run(AppLogAggregatorImpl.java:388)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService$2.run(LogAggregationService.java:387)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:722)
{code}

_AggregatedLogFormat.LogWriter_
{code}
    public void close() {
      try {
        this.writer.close();
      } catch (IOException e) {
        LOG.warn("Exception closing writer", e);
      }
      IOUtils.closeStream(fsDataOStream);
    }
{code}
The TFile writer's {{close()}} may throw an {{IllegalStateException}} if the underlying fs data stream
failed.  Unfortunately {{LogWriter.close()}} only catches {{IOException}}, so the ISE propagates without closing the fs data stream.
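
One possible shape for a safer {{close()}} is sketched below. It is not the committed fix. To stay self-contained it uses hypothetical stand-ins ({{TrackingStream}}, {{FailingWriter}}, {{closeBoth}}) rather than the real {{TFile.Writer}} and {{FSDataOutputStream}}. The point is only that the stream is closed in a {{finally}} block, so an unchecked exception from the writer cannot skip it.

```java
import java.io.Closeable;
import java.io.IOException;

public class SafeCloseSketch {
    /** Hypothetical stand-in for the fs data stream; records whether close() ran. */
    static class TrackingStream implements Closeable {
        boolean closed = false;

        @Override
        public void close() {
            closed = true;
        }
    }

    /** Hypothetical stand-in for a writer whose close() fails with an unchecked ISE. */
    static class FailingWriter implements Closeable {
        @Override
        public void close() throws IOException {
            throw new IllegalStateException(
                "Cannot close TFile in the middle of key-value insertion.");
        }
    }

    /** Closes the writer, then closes the stream no matter how the writer's close ended. */
    static void closeBoth(Closeable writer, Closeable stream) {
        try {
            writer.close();
        } catch (Exception e) {
            // Catches IOException as well as runtime exceptions such as ISE.
            System.err.println("Exception closing writer: " + e);
        } finally {
            try {
                stream.close();
            } catch (IOException e) {
                System.err.println("Exception closing stream: " + e);
            }
        }
    }

    public static void main(String[] args) {
        TrackingStream stream = new TrackingStream();
        closeBoth(new FailingWriter(), stream);
        System.out.println("stream closed: " + stream.closed);
    }
}
```

Whether the ISE should be swallowed and logged, as here, or rethrown after the stream is closed is a separate design decision that this sketch does not settle.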

Additionally, the ctor creates the fs data stream and then a TFile.Writer without a try/catch.  If
the TFile.Writer ctor throws an exception, the caller never receives a LogWriter instance, so it's impossible to close the stream.
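
A minimal sketch of guarding the constructor follows, again using hypothetical stand-ins ({{SafeCtorSketch}}, {{TrackingStream}}, and a {{Writer}} with a {{fail}} flag to simulate the failure) instead of the real Hadoop classes. If building the writer fails, the already-opened stream is closed before the exception is rethrown.

```java
import java.io.Closeable;
import java.io.IOException;

public class SafeCtorSketch implements Closeable {
    /** Hypothetical stand-in for the fs data stream; records whether close() ran. */
    static class TrackingStream implements Closeable {
        boolean closed = false;

        @Override
        public void close() {
            closed = true;
        }
    }

    /** Hypothetical stand-in for the writer; its constructor can be told to fail. */
    static class Writer implements Closeable {
        Writer(TrackingStream out, boolean fail) throws IOException {
            if (fail) {
                throw new IOException("simulated writer construction failure");
            }
        }

        @Override
        public void close() {
            // Nothing to release in this stand-in.
        }
    }

    private final TrackingStream stream;
    private final Writer writer;

    SafeCtorSketch(TrackingStream stream, boolean failWriter) throws IOException {
        this.stream = stream;
        try {
            this.writer = new Writer(stream, failWriter);
        } catch (IOException | RuntimeException e) {
            // The caller never gets a reference to this object, so clean up here.
            stream.close();
            throw e;
        }
    }

    @Override
    public void close() {
        try {
            writer.close();
        } finally {
            stream.close();
        }
    }

    public static void main(String[] args) {
        TrackingStream stream = new TrackingStream();
        try {
            new SafeCtorSketch(stream, true);
        } catch (IOException e) {
            System.out.println("ctor failed, stream closed: " + stream.closed);
        }
    }
}
```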

I haven't checked if there are further issues with closing the writer higher up the stack.

> Log aggregation failures 
> -------------------------
>
>                 Key: YARN-3760
>                 URL: https://issues.apache.org/jira/browse/YARN-3760
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 2.4.0
>            Reporter: Daryn Sharp
>            Priority: Critical
>
> The aggregated log file does not appear to be properly closed when writes fail.  This
> leaves a lease renewer active in the NM that spams the NN with lease renewals.  If the token
> is marked not to be cancelled, the renewals appear to continue until the token expires.  If
> the token is cancelled, the periodic renew spam turns into a flood of failed connections until
> the lease renewer gives up.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
