tez-issues mailing list archives

From "Bikas Saha (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TEZ-341) Allow DAG to be submitted to an AM after it has started
Date Tue, 20 Aug 2013 01:57:51 GMT

    [ https://issues.apache.org/jira/browse/TEZ-341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13744615#comment-13744615 ]

Bikas Saha commented on TEZ-341:
--------------------------------

Looks good overall. I suspect this code path will be under active development, so we can
fix bugs and improve things as we stabilize.

Now that this includes the session concept, it might be better for this name to refer to the session:
{code}
public static final String TEZ_AM_DAG_OVER_RPC_ENV = "AM_DAG_OVER_RPC";
{code}

Once we run more than one DAG in the AM, we should decouple AM state from DAG state.
{code}
-      switch(dag.getState()) {
+      switch(currentDAG.getState()) {
       case SUCCEEDED:
{code}

Should we check the current DAG state? There is no point killing it if it's already completed.
This is just looking ahead to the multiple-DAG case.
{code}
+    public synchronized void shutdownAM() {
+      LOG.info("Received message to shutdown AM");
+      if (currentDAG != null) {
+        //send a DAG_KILL message
+        LOG.info("Sending a kill event to the current DAG"
+            + ", dagId=" + currentDAG.getID());
{code}
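
The state check suggested above could look roughly like the following. This is only a minimal sketch, not code from the patch: DagState, its state names, and shouldSendKill() are hypothetical stand-ins for the real DAG state enum and AM logic.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical stand-in for the DAG state enum; the state names are assumptions.
enum DagState { NEW, RUNNING, SUCCEEDED, FAILED, KILLED }

final class ShutdownCheck {
    // States in which sending a kill event would be pointless (assumed terminal states).
    private static final Set<DagState> TERMINAL =
        EnumSet.of(DagState.SUCCEEDED, DagState.FAILED, DagState.KILLED);

    private ShutdownCheck() {}

    // Returns true only when a DAG exists and has not already completed.
    static boolean shouldSendKill(DagState current) {
        return current != null && !TERMINAL.contains(current);
    }
}
```

In shutdownAM(), the kill event would then be sent only when a check like shouldSendKill(...) on the current DAG's state returns true.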

TEZ-71 is supposed to remove references to TEZ_STAGING_DIR (in config or literal). Let's avoid
using it any further. It should be easy to create a unique staging directory.
{code}
+    Path stagingDir =
+        new Path(tezConf.get(TezConfiguration.TEZ_AM_STAGING_DIR,
+            TezConfiguration.TEZ_AM_STAGING_DIR_DEFAULT)
{code}
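
One possible way to get a unique staging directory is to derive it from the application ID. The sketch below is an illustration under assumptions, not the patch's approach: StagingDirs and uniqueStagingDir() are hypothetical names, the application ID is assumed to be unique per submission, and java.nio.file.Path is used in place of the Hadoop Path class so the example stays self-contained.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

final class StagingDirs {
    private StagingDirs() {}

    // Builds <baseDir>/<appId>. The appId is assumed to be unique per
    // submitted application, which makes the resulting directory unique too.
    static Path uniqueStagingDir(String baseDir, String appId) {
        if (baseDir == null || baseDir.isEmpty()) {
            throw new IllegalArgumentException("baseDir must be non-empty");
        }
        if (appId == null || appId.isEmpty()) {
            throw new IllegalArgumentException("appId must be non-empty");
        }
        return Paths.get(baseDir).resolve(appId);
    }
}
```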

The AM may not be running at this time. YARN may have just allocated/launched the AM container.
It's probably OK to skip this test for now and add it when we have "real" sessions.
{code}
+        YarnApplicationState appState = appReport.getYarnApplicationState();
+        if (!sentKillSession) {
+          if (appState == YarnApplicationState.RUNNING) {
+            tezClient.closeSession(tezSession);
+            sentKillSession = true;
{code}

We may choose to remove getApplicationId() from TezSession and DAGClient for now and add it
later if needed. Currently there are no users, and we may be better off hiding that link. At
least mark them @Private.

We cannot assume a valid host and port exist until the AM actually starts running (even though
YARN marks the state as RUNNING). The DAGClient code currently checks for this case. There is a
YARN JIRA open to provide this info clearly.
{code}
+  static DAGClientAMProtocolBlockingPB getAMProxy(Configuration conf,
+      String amHost, int amRpcPort) throws IOException {
+    InetSocketAddress addr = new InetSocketAddress(amHost,
+        amRpcPort);
{code}
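
A guard along the following lines could sit in front of the proxy creation. It is a sketch under assumptions: AmAddress and resolve() are hypothetical names, and the placeholder values it rejects (a null, empty, or "N/A" host, or a non-positive port) are an assumption about what gets reported before the AM has registered, not something confirmed by the patch. It also uses InetSocketAddress.createUnresolved() instead of the constructor so that no DNS lookup happens in the example.

```java
import java.net.InetSocketAddress;
import java.util.Optional;

final class AmAddress {
    private AmAddress() {}

    // Returns an address only if host and port look usable.
    // Assumption: an AM that has not registered yet is reported with a
    // null/empty/"N/A" host or a non-positive RPC port.
    static Optional<InetSocketAddress> resolve(String amHost, int amRpcPort) {
        if (amHost == null || amHost.isEmpty() || "N/A".equals(amHost)) {
            return Optional.empty();
        }
        if (amRpcPort <= 0 || amRpcPort > 65535) {
            return Optional.empty();
        }
        return Optional.of(InetSocketAddress.createUnresolved(amHost, amRpcPort));
    }
}
```

A caller would treat an empty result as "AM not ready yet" and try again later rather than failing.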

In general, I didn't quite follow how getAMProxies() works. It may throw an exception if
the AM is not running, but I did not see that exception being handled, and it may
always occur due to race conditions. What happens to the stored proxy when the app is retried
by YARN? For now, it may be simpler not to store the proxy objects and to create them as needed.
We can revisit if that turns out to be an issue. closeSession() does not seem like a perf-sensitive
operation as of now.
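
Creating the proxy on demand, while tolerating the race where the AM is not up yet, might be sketched as below. This is an illustration only: OnDemandProxy, ProxyFactory, and createWithRetry() are hypothetical names, and the retry count and sleep interval are arbitrary parameters rather than values taken from the patch.

```java
import java.io.IOException;

final class OnDemandProxy {
    private OnDemandProxy() {}

    // Hypothetical factory: creates a fresh proxy, failing with IOException
    // when the AM cannot be reached yet.
    interface ProxyFactory<T> {
        T create() throws IOException;
    }

    // Creates a new proxy for each call instead of caching one, retrying
    // up to maxAttempts times when creation fails with an IOException.
    static <T> T createWithRetry(ProxyFactory<T> factory, int maxAttempts,
                                 long sleepMillis) throws IOException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return factory.create();
            } catch (IOException e) {
                last = e; // remember the failure and retry if attempts remain
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(sleepMillis);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw new IOException("interrupted while waiting for AM", ie);
                }
            }
        }
        throw last;
    }
}
```

Because nothing is cached, a retried application attempt simply gets a new proxy on the next call, at the cost of one extra connection setup per operation.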

                
> Allow DAG to be submitted to an AM after it has started
> -------------------------------------------------------
>
>                 Key: TEZ-341
>                 URL: https://issues.apache.org/jira/browse/TEZ-341
>             Project: Apache Tez
>          Issue Type: Sub-task
>            Reporter: Bikas Saha
>            Assignee: Hitesh Shah
>              Labels: TEZ-0.2.0
>             Fix For: 0.2.0
>
>         Attachments: TEZ-341.1.patch, TEZ-341.3.patch, TEZ-341.wip.2.patch
>
>
> Allow the DAG to be submitted over RPC after the AM has started. AM runs a single DAG
> and exits. The DAG is run in the context of the user who submitted the AM.

