hadoop-yarn-issues mailing list archives

From "Bikas Saha (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-417) Add a poller that allows the AM to receive notifications when it is assigned containers
Date Thu, 28 Feb 2013 07:33:25 GMT

    [ https://issues.apache.org/jira/browse/YARN-417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13589296#comment-13589296 ]

Bikas Saha commented on YARN-417:
---------------------------------

I think that if ContainerExitCodes needs to be added, it should be its own jira, because it's
an addition to the YARN API and should be kept distinct from this jira. This jira could be
marked as dependent on that one. From what I see in the patch, it's also missing out of
memory and preemption.

ContainerRequest is tightly coupled with the AMRMClient, which is why I put it inside
AMRMClient. It's expected to be used in other places, and that's why it's public.

The helper function would have helped because a container carries information set by 2 entities
- the RM & the NM. And its "status" is a combination of containerState and containerExitCode.
E.g. the state could be running, in which case exit codes don't matter. The state could be
completed, in which case the exit code can tell us whether it was killed or not. The exit code
may not be enough, because the RM could preempt a container before it's launched, and hence
there may not be a real exit code. Exit codes are also not portable across platforms (e.g.
Windows and Linux). The helper function lets the library hide all this and present a single
status value for the user to look at: whether the container is allocated, running,
completed_with_success, killed, preempted, out of memory etc. At some point this could move
into YARN, but while it evolves, the library might be a good place to house it. Does that help
clarify its utility?
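
To make the helper idea concrete, here is a minimal sketch of what such a function might look
like. Every type and name in it (State, ExitReason, Status, resolve) is hypothetical and is not
part of the YARN API or of the patch; FAILED is an extra value assumed here for non-zero exits.
The exit code is nullable to model a container that was preempted before launch.

```java
// Hypothetical sketch only: none of these types exist in YARN or in the patch.
final class ContainerStatusSketch {

    /** Lifecycle state as seen by the library (illustrative values). */
    enum State { ALLOCATED, RUNNING, COMPLETED }

    /** Why a completed container ended, as reported by the RM or the NM (illustrative). */
    enum ExitReason { NONE, KILLED, PREEMPTED, OUT_OF_MEMORY }

    /** The single value presented to the user. FAILED is an assumption of this sketch. */
    enum Status {
        ALLOCATED, RUNNING, COMPLETED_WITH_SUCCESS, FAILED, KILLED, PREEMPTED, OUT_OF_MEMORY
    }

    private ContainerStatusSketch() { }

    /**
     * Combines state, reason and exit code into one status.
     * exitCode may be null, e.g. when the container never launched.
     */
    static Status resolve(State state, ExitReason reason, Integer exitCode) {
        if (state == State.ALLOCATED) {
            return Status.ALLOCATED;
        }
        if (state == State.RUNNING) {
            // Exit codes do not matter while the container is running.
            return Status.RUNNING;
        }
        // COMPLETED: the reported reason takes precedence over the raw exit code,
        // because the exit code may be missing or platform specific.
        switch (reason) {
            case PREEMPTED:
                return Status.PREEMPTED;
            case OUT_OF_MEMORY:
                return Status.OUT_OF_MEMORY;
            case KILLED:
                return Status.KILLED;
            default:
                break;
        }
        boolean success = exitCode != null && exitCode == 0;
        return success ? Status.COMPLETED_WITH_SUCCESS : Status.FAILED;
    }
}
```

With something like this, user code would switch on one Status value instead of interpreting
containerState and containerExitCode separately.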

Why is client.start() being called in init()? client.stop() is being called in stop().
{code}
+  @Override
+  public void init(Configuration conf) {
+    super.init(conf);
+    client.init(conf);
+    client.start();
+  }
{code}

Why is stop() not waiting for the thread to join()? Why interrupt()? The thread needs to be
stopped first so that it stops calling into the client. Otherwise it can call into a client
that has already stopped.
{code}
+  @Override
+  public void stop() {
+    client.stop();
+    keepRunning = false;
+    thread.interrupt();
+  }
{code}
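
For illustration, one possible shape of the lifecycle I am suggesting, as a rough sketch rather
than a proposed patch. FakeClient is a hypothetical stand-in for the wrapped client, and the
class and method names are assumptions. The client is started in start(), and stop() first
stops and joins the heartbeat thread and only then stops the client. Here interrupt() is used
only to cut the sleep short.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: FakeClient stands in for the wrapped client.
final class PollerLifecycleSketch {

    /** Records lifecycle calls and flags any heartbeat that arrives after stop(). */
    static final class FakeClient {
        final List<String> calls = Collections.synchronizedList(new ArrayList<String>());
        volatile boolean stopped;
        volatile boolean heartbeatAfterStop;

        void init() { calls.add("init"); }
        void start() { calls.add("start"); }
        void heartbeat() { if (stopped) { heartbeatAfterStop = true; } }
        void stop() { stopped = true; calls.add("stop"); }
    }

    private final FakeClient client;
    private final Thread thread;
    private volatile boolean keepRunning = true;

    PollerLifecycleSketch(FakeClient client, long intervalMs) {
        this.client = client;
        this.thread = new Thread(() -> {
            while (keepRunning) {
                try {
                    client.heartbeat();
                    Thread.sleep(intervalMs);
                } catch (InterruptedException e) {
                    // Woken up early by stop(); the loop condition is re-checked.
                }
            }
        }, "heartbeat-poller");
    }

    void init() {
        client.init();
    }

    void start() {
        client.start();   // started here, mirroring client.stop() in stop()
        thread.start();
    }

    void stop() {
        keepRunning = false;
        thread.interrupt();          // only to cut the sleep short
        try {
            thread.join();           // wait until no more calls can reach the client
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        client.stop();               // safe now: the heartbeat thread has exited
    }

    boolean isPollerAlive() {
        return thread.isAlive();
    }
}
```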

I am wary of calling back on the heartbeat thread itself. If you look at the interface patch
I had uploaded, I had left some comments on moving this to its own thread. This is important
because the callback code can be arbitrary and may not complete in time for our heartbeat,
especially with thousands of containers. We cannot let our heartbeat rate depend on app
code performance.
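
A minimal sketch of the separation I have in mind, with hypothetical names throughout
(CallbackDispatchSketch, publish) and an unbounded queue chosen only for brevity. The heartbeat
thread just enqueues what it received and returns, and a dedicated handler thread runs the
app's callback.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical sketch: the names here are illustrative, not from the patch.
final class CallbackDispatchSketch<E> {

    private final BlockingQueue<E> events = new LinkedBlockingQueue<E>();
    private final Thread handlerThread;
    private volatile boolean keepRunning = true;

    CallbackDispatchSketch(Consumer<E> callback) {
        this.handlerThread = new Thread(() -> {
            // Keep going until stop() was requested and the queue is drained.
            while (keepRunning || !events.isEmpty()) {
                try {
                    E event = events.poll(100, TimeUnit.MILLISECONDS);
                    if (event != null) {
                        callback.accept(event);   // arbitrary app code runs here
                    }
                } catch (InterruptedException e) {
                    // Fall through and re-check the loop condition.
                }
            }
        }, "callback-handler");
    }

    void start() {
        handlerThread.start();
    }

    /** Called from the heartbeat thread; never blocks on app code. */
    void publish(E event) {
        events.add(event);
    }

    /** Delivers already-published events, then stops the handler thread. */
    void stop() {
        keepRunning = false;
        try {
            handlerThread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

A slow callback then only delays later callbacks, not the heartbeat. A real implementation
would also need to decide what to do when the callback throws and whether the queue should be
bounded.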
                
> Add a poller that allows the AM to receive notifications when it is assigned containers
> ---------------------------------------------------------------------------------------
>
>                 Key: YARN-417
>                 URL: https://issues.apache.org/jira/browse/YARN-417
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: api, applications
>    Affects Versions: 2.0.3-alpha
>            Reporter: Sandy Ryza
>            Assignee: Sandy Ryza
>         Attachments: AMRMClientAsync-1.java, AMRMClientAsync.java, YARN-417-1.patch,
> YARN-417.patch, YarnAppMaster.java, YarnAppMasterListener.java
>
>
> Writing AMs would be easier for some if they did not have to handle heartbeating to the
> RM on their own.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
