hadoop-yarn-issues mailing list archives

From "Eric Badger (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-8274) Docker command error during container relaunch
Date Fri, 11 May 2018 21:17:00 GMT

    [ https://issues.apache.org/jira/browse/YARN-8274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472679#comment-16472679 ]

Eric Badger commented on YARN-8274:

bq. With 3.1.1 code freeze on Saturday, it is easy to make mistakes, and I like to get YARN-7654
committed before end of today. YARN-7654 and YARN-8207 are probably left uncommitted for too long.

I understand that you want to get these patches into 3.1.1, but I don't believe we should
rush to get features into releases and in the process compromise on quality. Rushed patches/reviews
lead to bugs like this happening at an elevated rate. I'm also not particularly compelled
by the argument that YARN-7654 and YARN-8207 have been uncommitted for too long. YARN-8207
ended up being a 127 kB patch of entirely C code, which is incredibly time-consuming to review,
while YARN-7654 is now on patch number 23. It's not like these aren't getting reviewed; they
are just going through a normal process of comprehensive review. I think that YARN-8207 getting
committed in 2 weeks is a semi-miracle given the size, complexity, and possible ramifications
of the changes. Reviewing that much C code (especially in a setuid binary) throughout 10 different
patches is basically a full-time job. [~jlowe] has spent countless more hours/days than I
think should be reasonably expected and is still working in an attempt to get these patches
into 3.1.1. If anything, he should be commended and thanked for his yeoman’s effort here
regardless of whether YARN-7654 makes it into 3.1.1.

So, while I understand that deadlines exist and that we should strive to meet them, I don't
believe that we should rush patches in solely because of a deadline. That destabilizes the
project and causes more work for everyone. If a patch/feature isn't fully ready, we should
step back and get it into the next release rather than cut time on reviews and possibly miss
something. At the end of the day, if we are introducing bugs like this consistently, which
recently we have been, then we are clearly iterating too quickly and need to spend more time
on reviewing each patch instead of rushing it to commit.

> Docker command error during container relaunch
> ----------------------------------------------
>                 Key: YARN-8274
>                 URL: https://issues.apache.org/jira/browse/YARN-8274
>             Project: Hadoop YARN
>          Issue Type: Task
>            Reporter: Billie Rinaldi
>            Assignee: Jason Lowe
>            Priority: Critical
>             Fix For: 3.2.0, 3.1.1
>         Attachments: YARN-8274.001.patch, YARN-8274.002.patch
> I initiated container relaunch with a "sleep 60; exit 1" launch command and saw a "not
> a docker command" error on relaunch. Haven't figured out why this is happening, but it seems
> like it has been introduced recently to trunk/branch-3.1. cc [~shanekumpf@gmail.com] [~ebadger]
> {noformat}
> org.apache.hadoop.yarn.server.nodemanager.containermanager.runtime.ContainerExecutionException: Relaunch container failed
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DockerLinuxContainerRuntime.relaunchContainer(DockerLinuxContainerRuntime.java:954)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.relaunchContainer(DelegatingLinuxContainerRuntime.java:150)
>         at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.handleLaunchForLaunchType(LinuxContainerExecutor.java:562)
>         at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.relaunchContainer(LinuxContainerExecutor.java:486)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.relaunchContainer(ContainerLaunch.java:504)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerRelaunch.call(ContainerRelaunch.java:111)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerRelaunch.call(ContainerRelaunch.java:47)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)
> 2018-05-09 21:41:46,631 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exception from container-launch.
> 2018-05-09 21:41:46,631 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Container id: container_1525897486447_0003_01_000002
> 2018-05-09 21:41:46,631 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exit code: 7
> 2018-05-09 21:41:46,631 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exception message: Relaunch container failed
> 2018-05-09 21:41:46,631 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Shell error output: docker: 'container_1525897486447_0003_01_000002' is not a docker command.
> {noformat}
