hadoop-yarn-issues mailing list archives

From "Sandy Ryza (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-417) Add a poller that allows the AM to receive notifications when it is assigned containers
Date Thu, 28 Feb 2013 18:47:13 GMT

    [ https://issues.apache.org/jira/browse/YARN-417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13589785#comment-13589785 ]

Sandy Ryza commented on YARN-417:

bq. I think if ContainerExitCodes needs to be added then it should be its own jira 
Will move the container exit codes into a separate JIRA.

bq. The helper function would have helped because containers contain information set by 2
The issue is that there is not a ton of information for a helper function to interpret.  From
what I can tell, the framework only defines two special exit codes, and does not distinguish
between OOMs and other kinds of container failures, or between killing a container because
it was preempted or because the RM lost track of it.  These exit codes are platform independent,
and any other exit codes can be both application and platform dependent, so the AMRMClientAsync
wouldn't know how to interpret them.  As ContainerStatuses coming from the RM are only in
the context of container completions, ContainerState provides no extra information. Additional
information can sometimes be found in the diagnostics strings, but if the reasons that containers
die are to be codified, I don't think it should be done by interpreting strings at the API
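
To illustrate how little a helper could infer from the exit code alone, here is a minimal sketch. The class, enum, and method names are hypothetical, and the two constant values are assumptions to be checked against the Hadoop version in use; everything that is not one of the two framework codes falls into a single bucket the library cannot interpret further.

```java
// Hypothetical helper, not part of YARN.
final class ExitCodeSketch {
    // Assumed values for the two framework-defined exit codes.
    static final int ABORTED = -100;
    static final int INVALID = -1000;

    enum Kind { ABORTED_BY_FRAMEWORK, INVALID_OR_UNSET, APPLICATION_DEFINED }

    // Only the two framework codes can be classified; any other value is
    // application and platform dependent, so it is passed through as-is.
    static Kind classify(int exitStatus) {
        switch (exitStatus) {
            case ABORTED:
                return Kind.ABORTED_BY_FRAMEWORK;
            case INVALID:
                return Kind.INVALID_OR_UNSET;
            default:
                return Kind.APPLICATION_DEFINED;
        }
    }
}
```

Note that an OOM kill and an ordinary application failure would both land in APPLICATION_DEFINED here, which is the limitation described above.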

bq. Why is client.start() being called in init? client.stop() is being called in stop().
registerApplicationMaster needs to be called after setting up the RM proxy, which occurs in
AMRMClient#start, but before starting the heartbeater, which occurs in AMRMClientAsync#start.
 Another way to accomplish this would be to move the code in AMRMClientImpl#start to AMRMClientImpl#init,
which also seems reasonable to me.  A third way would be to call registerApplicationMaster
from AMRMClientAsync#start.
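
The ordering constraint can be sketched as follows. This is a toy model with hypothetical names, not the real AMRMClient code: startClient() stands in for AMRMClient#start (which creates the RM proxy), and start() shows the third option, where registration happens inside the async client's start.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the lifecycle ordering; no real YARN calls are made.
class LifecycleSketch {
    final List<String> events = new ArrayList<>();

    // Stands in for AMRMClient#start, which sets up the RM proxy.
    void startClient() {
        events.add("proxy-created");
    }

    // Stands in for registerApplicationMaster; it needs the proxy.
    void register() {
        if (!events.contains("proxy-created")) {
            throw new IllegalStateException("RM proxy not set up yet");
        }
        events.add("registered");
    }

    // Stands in for starting the heartbeat thread; it needs a registered AM.
    void startHeartbeater() {
        if (!events.contains("registered")) {
            throw new IllegalStateException("AM not registered yet");
        }
        events.add("heartbeating");
    }

    // Third option: do all three steps, in order, from the async start.
    void start() {
        startClient();
        register();
        startHeartbeater();
    }
}
```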

bq. I am wary of calling back on the heartbeat thread itself.
Will add a handling thread.
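
Roughly, the handling thread could look like the sketch below. All names are hypothetical and this is not the patch itself: the heartbeat thread only enqueues responses, and a separate thread drains the queue and runs user callbacks, so slow or blocking callbacks cannot delay heartbeats.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

// Hypothetical dispatcher: decouples the heartbeat thread from user callbacks.
class CallbackDispatcher<T> {
    private final BlockingQueue<T> queue = new LinkedBlockingQueue<>();
    private final Thread handler;

    CallbackDispatcher(Consumer<T> callback) {
        handler = new Thread(() -> {
            try {
                while (true) {
                    // Blocks until the heartbeat thread publishes a response.
                    callback.accept(queue.take());
                }
            } catch (InterruptedException e) {
                // Interrupted by stop(): fall through and let the thread exit.
            }
        }, "callback-handler");
        handler.setDaemon(true);
    }

    void start() {
        handler.start();
    }

    // Called from the heartbeat thread; the queue is unbounded, so this
    // never waits on user code.
    void publish(T response) {
        queue.add(response);
    }

    void stop() {
        handler.interrupt();
        try {
            handler.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```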

bq. Not waiting for the thread to join()? Why interrupt()? Thread needs to be stopped first
so that it stops calling into the client. or else it can call into a client that has already
Good point. My reason was that I've seen this as a convention in other places in YARN (see NodeStatusUpdaterImpl,
for example), and that it would allow stop to be called from onContainerCompleted without
deadlock, but with the handling thread, the latter shouldn't be a problem, so I'll change
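
A minimal sketch of the stop ordering under discussion, using hypothetical names and a flag in place of the real client: the heartbeat thread is told to finish, interrupted so it wakes from its sleep, and joined before the client is stopped, so it can never call into a client that has already been stopped.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical heartbeater; clientStopped stands in for the wrapped client.
class HeartbeaterSketch {
    private final AtomicBoolean keepRunning = new AtomicBoolean(true);
    private final AtomicBoolean clientStopped = new AtomicBoolean(false);
    private final AtomicBoolean calledAfterStop = new AtomicBoolean(false);

    private final Thread heartbeater = new Thread(() -> {
        while (keepRunning.get()) {
            // A real heartbeat would call into the client here.
            if (clientStopped.get()) {
                calledAfterStop.set(true);
            }
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                // Woken by stop(); the loop condition is re-checked.
            }
        }
    }, "heartbeater");

    void start() {
        heartbeater.start();
    }

    void stop() {
        keepRunning.set(false);
        heartbeater.interrupt();      // cut the sleep short
        try {
            heartbeater.join();       // wait until no more client calls can happen
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        clientStopped.set(true);      // only now "stop" the client
    }

    boolean heartbeaterAlive() {
        return heartbeater.isAlive();
    }

    boolean calledIntoStoppedClient() {
        return calledAfterStop.get();
    }
}
```

Because callbacks would run on the handling thread rather than on the heartbeater, calling stop() from a callback joins a different thread and does not self-deadlock.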

> Add a poller that allows the AM to receive notifications when it is assigned containers
> ---------------------------------------------------------------------------------------
>                 Key: YARN-417
>                 URL: https://issues.apache.org/jira/browse/YARN-417
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: api, applications
>    Affects Versions: 2.0.3-alpha
>            Reporter: Sandy Ryza
>            Assignee: Sandy Ryza
>         Attachments: AMRMClientAsync-1.java, AMRMClientAsync.java, YARN-417-1.patch, YARN-417.patch, YarnAppMaster.java, YarnAppMasterListener.java
> Writing AMs would be easier for some if they did not have to handle heartbeating to the RM on their own.

