hadoop-yarn-issues mailing list archives

From "Adam Antal (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-9667) Container-executor.c duplicates messages to stdout
Date Mon, 15 Jul 2019 15:14:00 GMT

    [ https://issues.apache.org/jira/browse/YARN-9667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16885318#comment-16885318 ]

Adam Antal commented on YARN-9667:

The error message was reproduced only intermittently in our downstream setup, but I will try to provide more information soon.

The suggestions you mentioned look good to me, [~pbacsko].

> Container-executor.c duplicates messages to stdout
> --------------------------------------------------
>                 Key: YARN-9667
>                 URL: https://issues.apache.org/jira/browse/YARN-9667
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: nodemanager, yarn
>    Affects Versions: 3.2.0
>            Reporter: Adam Antal
>            Priority: Major
> When a container is killed by its AM, we get an error message similar to this:
> {noformat}
> 2019-06-30 12:09:04,412 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor: Shell execution returned exit code: 143. Privileged Execution Operation Stderr:
> Stdout: main : command provided 1
> main : run as user is systest
> main : requested yarn user is systest
> Getting exit code file...
> Creating script paths...
> Writing pid file...
> Writing to tmp file /yarn/nm/nmPrivate/application_1561921629886_0001/container_e84_1561921629886_0001_01_000019/container_e84_1561921629886_0001_01_000019.pid.tmp
> Writing to cgroup task files...
> Creating local dirs...
> Launching container...
> Getting exit code file...
> Creating script paths...
> {noformat}
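> In container-executor.c the fork point is right after the "Creating script paths..." message, yet the Stdout log clearly shows both "Getting exit code file..." and "Creating script paths..." written twice. After consulting with [~pbacsko], it seems that a missing flush before the fork in container-executor.c causes the duplication: stdout is fully buffered when it is not a terminal (here the NodeManager captures it), so fork() copies the unflushed buffer into the child and both processes eventually write it out.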
> I suggest adding a flush there so that the output is not duplicated: it is misleading that the child process writes out "Getting exit code file" and "Creating script paths" even though it is not doing either.
> A more appealing solution could be to revisit the fprintf-fflush pairs in the code and replace each pair with a single call, so that an fflush cannot be forgotten accidentally. (A missing fflush can cause this problem wherever fork() is used.)
> Note: this issue probably affects every call to fork(), not just the one from {{launch_container_as_user}} in {{main.c}}.

This message was sent by Atlassian JIRA
