hadoop-yarn-issues mailing list archives

From "Rohith (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-3535) ResourceRequest should be restored back to scheduler when RMContainer is killed at ALLOCATED
Date Wed, 27 May 2015 11:07:18 GMT

    [ https://issues.apache.org/jira/browse/YARN-3535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14560774#comment-14560774 ]

Rohith commented on YARN-3535:
------------------------------

Thanks [~peng.zhang] for working on this issue.
Some comments:
# I think the method {{recoverResourceRequestForContainer}} should be synchronized. Any thoughts?
# Why do we require the {{RMContextImpl.java}} changes? I think we can avoid them; they are not
necessarily required.
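
To illustrate the first point, here is a minimal sketch of what a synchronized recovery method could look like. It is not the patch code: the class {{SchedulerSketch}}, the map of pending counts, and {{getPendingCount}} are hypothetical names invented for this example, and only the method name {{recoverResourceRequestForContainer}} comes from the discussion above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a scheduler; NOT the actual YARN class or patch code.
class SchedulerSketch {
    // Pending request counts keyed by resource name (illustrative only).
    private final Map<String, Integer> pendingRequests = new HashMap<>();

    // synchronized: the read-modify-write on the shared map runs under the
    // instance lock, so concurrent callers cannot lose updates.
    public synchronized void recoverResourceRequestForContainer(String resourceName) {
        pendingRequests.merge(resourceName, 1, Integer::sum);
    }

    // Reads take the same lock so they see a consistent view of the map.
    public synchronized int getPendingCount(String resourceName) {
        return pendingRequests.getOrDefault(resourceName, 0);
    }

    public static void main(String[] args) throws InterruptedException {
        SchedulerSketch scheduler = new SchedulerSketch();
        Thread[] workers = new Thread[4];
        // Four threads restore 1000 requests each against the same scheduler.
        for (int i = 0; i < workers.length; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < 1000; j++) {
                    scheduler.recoverResourceRequestForContainer("*");
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            worker.join();
        }
        System.out.println("pending = " + scheduler.getPendingCount("*"));
    }
}
```

Without the {{synchronized}} keyword, the unsynchronized {{HashMap}} update could drop increments when several threads restore requests at once, which is the kind of race the first comment is asking about.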

Tests:
# Is there any specific reason for changing {{TestAMRestart.java}}?
# IIUC, this issue can occur in all the schedulers, given that the AM-RM heartbeat interval is
shorter than the NM-RM heartbeat interval. So can the patch include an FT test case that is
applicable to both CS and FS? Maybe you can add the test in a class that extends
{{ParameterizedSchedulerTestBase}}, i.e. {{TestAbstractYarnScheduler}}.


>  ResourceRequest should be restored back to scheduler when RMContainer is killed at ALLOCATED
> ---------------------------------------------------------------------------------------------
>
>                 Key: YARN-3535
>                 URL: https://issues.apache.org/jira/browse/YARN-3535
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 2.6.0
>            Reporter: Peng Zhang
>            Assignee: Peng Zhang
>              Labels: BB2015-05-TBR
>         Attachments: YARN-3535-001.patch, YARN-3535-002.patch, syslog.tgz, yarn-app.log
>
>
> During a rolling update of the NM, the AM failed to start a container on the NM.
> The job then hung.
> AM logs are attached.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
