hadoop-yarn-issues mailing list archives

From "Adam Antal (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-9667) Container-executor.c duplicates messages to stdout
Date Wed, 31 Jul 2019 13:16:06 GMT

    [ https://issues.apache.org/jira/browse/YARN-9667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16897167#comment-16897167 ]

Adam Antal commented on YARN-9667:

During some downstream testing we ran into problems with the container executor where
extra logging would have been quite helpful when local files and directories could not be created.

If we plan to modify the fprintf/fflush pairs as [~pbacsko] suggested, could we also add this extra
logging to the container-executor?
[~snemeth], what exact error message would you suggest?
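
To make the question concrete, here is a minimal sketch of what a combined print-and-flush call with extra logging on directory creation failure might look like. The names {{log_flushed}} and {{create_dir_logged}} and the message text are hypothetical and are not taken from container-executor.c; the real helper and wording are still open.

```c
#include <errno.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical helper (not from container-executor.c): formats a message
 * and flushes the stream in the same call, so the flush cannot be forgotten.
 * Returns the number of bytes formatted, or a negative value on error. */
static int log_flushed(FILE *stream, const char *fmt, ...) {
    va_list args;
    va_start(args, fmt);
    int written = vfprintf(stream, fmt, args);
    va_end(args);
    if (fflush(stream) != 0) {
        return -1;
    }
    return written;
}

/* Hypothetical wrapper: creates a directory and logs the reason on failure.
 * An already existing path is treated as success to keep the sketch short. */
static int create_dir_logged(FILE *log, const char *path, mode_t mode) {
    if (mkdir(path, mode) != 0 && errno != EEXIST) {
        int saved_errno = errno; /* keep errno before any further calls */
        log_flushed(log, "Could not create directory %s: %s\n",
                    path, strerror(saved_errno));
        return -1;
    }
    return 0;
}
```

A call such as {{create_dir_logged(stdout, dir, 0750)}} would then report the failing path and the errno text immediately, independent of any later fork.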

> Container-executor.c duplicates messages to stdout
> --------------------------------------------------
>                 Key: YARN-9667
>                 URL: https://issues.apache.org/jira/browse/YARN-9667
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: nodemanager, yarn
>    Affects Versions: 3.2.0
>            Reporter: Adam Antal
>            Assignee: Peter Bacsko
>            Priority: Major
> When a container is killed by its AM, we get an error message similar to this:
> {noformat}
> 2019-06-30 12:09:04,412 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor: Shell execution returned exit code: 143. Privileged Execution Operation Stderr:
> Stdout: main : command provided 1
> main : run as user is systest
> main : requested yarn user is systest
> Getting exit code file...
> Creating script paths...
> Writing pid file...
> Writing to tmp file /yarn/nm/nmPrivate/application_1561921629886_0001/container_e84_1561921629886_0001_01_000019/container_e84_1561921629886_0001_01_000019.pid.tmp
> Writing to cgroup task files...
> Creating local dirs...
> Launching container...
> Getting exit code file...
> Creating script paths...
> {noformat}
> In container-executor.c the fork point is right after the "Creating script paths..." part, yet the Stdout log clearly shows that message twice. After consulting with [~pbacsko], it seems that a flush is missing in container-executor.c before the fork, and that causes the duplication.
> I suggest adding a flush there so that the output is not duplicated: it is a bit misleading that the child process writes out "Getting exit code file" and "Creating script paths" even though it is clearly not doing that.
> A more appealing solution could be to revisit the fprintf-fflush pairs in the code and change them to a single call, so that the fflush calls cannot be forgotten accidentally. (A forgotten flush can cause problems wherever the pair is used.)
> Note: this issue probably affects every occurrence of fork(), not just the one from {{launch_container_as_user}} in {{main.c}}.
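
The duplication described in the quoted issue can be reproduced outside of the container executor. The sketch below is a stand-alone illustration, not code from container-executor.c: it assumes a fully buffered stream (a temporary file stands in for the redirected stdout), writes one message before fork(), and counts how often that message ends up in the output. The function name {{count_prefork_messages}} is made up for this example.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns how many times the message written before fork() appears in the
 * output, or -1 on error. With flush_before_fork == 0 the unflushed stdio
 * buffer is copied into the child, so both processes eventually write it. */
static int count_prefork_messages(int flush_before_fork) {
    const char *msg = "Creating script paths...\n";
    FILE *out = tmpfile(); /* fully buffered, like stdout sent to a pipe */
    if (out == NULL) {
        return -1;
    }

    fprintf(out, "%s", msg);
    if (flush_before_fork) {
        fflush(out); /* empty the buffer so the child inherits nothing */
    }

    pid_t pid = fork();
    if (pid < 0) {
        fclose(out);
        return -1;
    }
    if (pid == 0) {
        /* Child: its own message; the flush also emits any inherited data. */
        fprintf(out, "Launching container...\n");
        fflush(out);
        _exit(0); /* skip atexit handlers and other stdio buffers */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0) {
        fclose(out);
        return -1;
    }

    fflush(out); /* parent writes whatever is still in its own buffer */
    rewind(out);

    int count = 0;
    char line[128];
    while (fgets(line, sizeof line, out) != NULL) {
        if (strcmp(line, msg) == 0) {
            count++;
        }
    }
    fclose(out);
    return count;
}
```

On a typical Linux system, {{count_prefork_messages(0)}} should report the message twice and {{count_prefork_messages(1)}} once, which matches the repeated "Creating script paths..." line in the log above.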

This message was sent by Atlassian JIRA
