hadoop-yarn-issues mailing list archives

From "Konstantinos Karanasos (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-8346) Upgrading to 3.1 kills running containers with error "Opportunistic container queue is full"
Date Wed, 23 May 2018 18:28:00 GMT

    [ https://issues.apache.org/jira/browse/YARN-8346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16487803#comment-16487803 ]

Konstantinos Karanasos commented on YARN-8346:
----------------------------------------------

Thanks for the patch, [~jlowe].

Indeed, you are right: the problem is the missing execution type. The queue size should
remain 0, given that opportunistic containers are disabled in this case.

+1 for the patch.
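
To illustrate the point about the missing execution type, here is a minimal, self-contained sketch. It is not the content of YARN-8346.001.patch and it does not use Hadoop classes: the class RecoveredContainerSketch, its enum, and the methods resolve and admit are hypothetical names made up for this example. It assumes that a container recovered from state written by an older NM carries no execution type (modelled as null), and that anything not marked GUARANTEED is checked against the opportunistic queue limit.

```java
class RecoveredContainerSketch {

    /** Simplified stand-in for YARN's execution types (not the Hadoop enum). */
    enum ExecutionType { GUARANTEED, OPPORTUNISTIC }

    /**
     * Hypothetical helper: state saved by an older NodeManager may carry no
     * execution type (modelled here as null). Assume GUARANTEED in that case.
     */
    static ExecutionType resolve(ExecutionType recovered) {
        return recovered == null ? ExecutionType.GUARANTEED : recovered;
    }

    /**
     * Simplified admission check (an assumption, not the real scheduler logic):
     * a GUARANTEED container is always admitted; anything else must fit in
     * the opportunistic queue.
     */
    static boolean admit(ExecutionType type, int queued, int maxQueueLength) {
        if (type == ExecutionType.GUARANTEED) {
            return true;
        }
        return queued < maxQueueLength;
    }

    public static void main(String[] args) {
        int maxQueueLength = 0;          // opportunistic containers disabled
        ExecutionType recovered = null;  // no execution type in recovered state

        // Without a default, the container falls through to the queue check.
        System.out.println("without default: " + admit(recovered, 0, maxQueueLength));
        // With a GUARANTEED default, the queue limit is never consulted.
        System.out.println("with default:    " + admit(resolve(recovered), 0, maxQueueLength));
    }
}
```

In this sketch the first call is rejected because the queue length is 0, while the second is admitted, so the queue limit itself can stay at 0.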

> Upgrading to 3.1 kills running containers with error "Opportunistic container queue is full"
> --------------------------------------------------------------------------------------------
>
>                 Key: YARN-8346
>                 URL: https://issues.apache.org/jira/browse/YARN-8346
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>    Affects Versions: 3.1.0, 3.0.2
>            Reporter: Rohith Sharma K S
>            Assignee: Jason Lowe
>            Priority: Blocker
>         Attachments: YARN-8346.001.patch
>
>
> During a rolling upgrade from 2.8.4 to the 3.1 release, all running containers are killed
> and a second attempt is launched for the application. The diagnostics message, which gives
> the reason the containers were killed, is "Opportunistic container queue is full".
> In the NM log, I see the following entry after a container is recovered.
> {noformat}
> 2018-05-23 17:18:50,655 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.scheduler.ContainerScheduler: Opportunistic container [container_e06_1527075664705_0001_01_000001] will not be queued at the NMsince max queue length [0] has been reached
> {noformat}
> The following steps were executed for the rolling upgrade:
> # Install a 2.8.4 cluster and launch an MR job with distributed cache enabled.
> # Stop the 2.8.4 RM. Start the 3.1.0 RM with the same configuration.
> # Stop the 2.8.4 NMs batch by batch. Start the 3.1.0 NMs batch by batch.




