tez-issues mailing list archives

From "Jonathan Turner Eagles (Jira)" <j...@apache.org>
Subject [jira] [Commented] (TEZ-4067) Tez Speculation decision is calculated on each update by the dispatcher
Date Tue, 19 Nov 2019 21:58:00 GMT

    [ https://issues.apache.org/jira/browse/TEZ-4067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16977859#comment-16977859 ]

Jonathan Turner Eagles commented on TEZ-4067:
---------------------------------------------

Closer, as the DAGAppMaster no longer has knowledge about the LegacySpeculator. There are
still a few things to fix to get full encapsulation.
* All references to speculators need to be abstracted away.

{code}
// Stop speculators if any
stopSpeculators(currentDAG);
{code}

Should be something like this
{code}
// Stop dependent services
stopDependentServices(currentDAG);
{code}

Similarly, the following code should refer to dependent services rather than speculators:
{code}
+        // If we reach here, then we have recoverable DAG and we need to reinitialize the speculators.
+        // start speculators of the recovered DAG
+        startSpeculators(currentDAG);
{code}

We need to avoid calling isSpeculationEnabled(), getSpeculator() and startSpeculator().
Instead, the vertex should expose something like List<AbstractService> getDependentServices(),
and it can include the speculator in the dependent services if speculation is enabled.
Do we need to call startSpeculator at all? As a dependent service, startService will be called
automatically. Similarly, do we need a launch function at all? I'm a little worried that launch
will start a thread, and then startService will be called and launch another thread. Perhaps
the state of the service will prevent this. Could you explain the reasoning for calling launch
manually instead of relying on startServices to be called automatically?
{code}
+  private void startSpeculators(DAG dag) {
+    for (Vertex v : dag.getVertices().values()) {
+      if (!v.isSpeculationEnabled()) {
+        continue;
+      }
+      if (v.startSpeculator()) {
+        addIfService(v.getSpeculator(), false);
+      }
+    }
+  }
+
+  private Exception stopSpeculators(DAG dag) {
+    Exception firstException = null;
+    for (Vertex v : dag.getVertices().values()) {
+      if (!v.isSpeculationEnabled()) {
+        continue;
+      }
+
+      Exception ex = v.stopSpeculator();
+      if (ex != null && firstException == null) {
+        firstException = ex;
+        continue;
+      }
+      // remove the speculator service from the list of services
+      services.remove(v.getSpeculator());
+    }
+    return firstException;
+  }
{code}
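
To make the suggestion above concrete, here is a rough sketch of what the dependent-services shape could look like. It is not taken from the patch. SketchService is a simplified stand-in for Hadoop's AbstractService, and SketchVertex, SketchSpeculator, SketchAppMaster, startDependentServices and stopDependentServices are hypothetical names used only for illustration. The sketch assumes start() is guarded by the service state, so a second start does not launch a second thread; whether the real speculator behaves this way would need to be confirmed.

```java
import java.util.ArrayList;
import java.util.List;

public class DependentServicesSketch {

  // Hypothetical, simplified stand-in for a lifecycle-managed service.
  abstract static class SketchService {
    enum State { INITED, STARTED, STOPPED }
    private State state = State.INITED;

    // Assumption: start() is a no-op unless the service is still INITED.
    final synchronized void start() {
      if (state != State.INITED) {
        return;
      }
      state = State.STARTED;
      serviceStart();
    }

    // stop() only acts on a service that is currently STARTED.
    final synchronized void stop() {
      if (state != State.STARTED) {
        return;
      }
      state = State.STOPPED;
      serviceStop();
    }

    synchronized State getState() {
      return state;
    }

    protected abstract void serviceStart();
    protected abstract void serviceStop();
  }

  // Hypothetical speculator; a real one would start its thread in serviceStart().
  static class SketchSpeculator extends SketchService {
    int launches = 0;
    @Override protected void serviceStart() { launches++; }
    @Override protected void serviceStop() { }
  }

  // The vertex hides whether speculation is enabled behind getDependentServices().
  static class SketchVertex {
    private final SketchSpeculator speculator;

    SketchVertex(boolean speculationEnabled) {
      this.speculator = speculationEnabled ? new SketchSpeculator() : null;
    }

    List<SketchService> getDependentServices() {
      List<SketchService> result = new ArrayList<>();
      if (speculator != null) {
        result.add(speculator);
      }
      return result;
    }
  }

  // The app master only sees generic services, never speculators.
  static class SketchAppMaster {
    private final List<SketchService> services = new ArrayList<>();

    void startDependentServices(List<SketchVertex> vertices) {
      for (SketchVertex v : vertices) {
        for (SketchService s : v.getDependentServices()) {
          if (!services.contains(s)) {
            services.add(s);
          }
          s.start();
        }
      }
    }

    // Returns the first exception seen; in this sketch the service is removed either way.
    RuntimeException stopDependentServices(List<SketchVertex> vertices) {
      RuntimeException first = null;
      for (SketchVertex v : vertices) {
        for (SketchService s : v.getDependentServices()) {
          try {
            s.stop();
          } catch (RuntimeException e) {
            if (first == null) {
              first = e;
            }
          }
          services.remove(s);
        }
      }
      return first;
    }

    int serviceCount() {
      return services.size();
    }
  }

  public static void main(String[] args) {
    List<SketchVertex> vertices = new ArrayList<>();
    vertices.add(new SketchVertex(true));
    vertices.add(new SketchVertex(false));
    SketchAppMaster am = new SketchAppMaster();
    am.startDependentServices(vertices);
    am.startDependentServices(vertices); // second call does not relaunch
    System.out.println("registered services: " + am.serviceCount());
    System.out.println("stop result: " + am.stopDependentServices(vertices));
  }
}
```

With this shape the caller never asks whether speculation is enabled, and the state check in start() is what would keep a manual launch followed by an automatic start from creating two threads.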

> Tez Speculation decision is calculated on each update by the dispatcher
> -----------------------------------------------------------------------
>
>                 Key: TEZ-4067
>                 URL: https://issues.apache.org/jira/browse/TEZ-4067
>             Project: Apache Tez
>          Issue Type: Improvement
>            Reporter: Ahmed Hussein
>            Assignee: Ahmed Hussein
>            Priority: Minor
>         Attachments: TEZ-4067.001.patch, TEZ-4067.002.patch, TEZ-4067.003.patch, TEZ-4067.004.patch, TEZ-4067.005.patch
>
>
> LegacySpeculator is an object field in VertexImpl. Therefore, all events are handled synchronously by the caller (dispatcher). This implies the following:
>  # the dispatcher spends a long time executing updateStatus as it needs to check the runtime estimation of the tezAttempts within the vertex.
>  # the speculator is per stage: launching a speculation may not be the optimal decision. Ideally, based on resources, speculated tasks should be the ones with the slowest progress.
>  # the time between speculations is skewed because there is a big delay for the dispatcher to complete a full cycle. Also, speculation will be more aggressive compared to MR because MR waits for "soonest.retry.after.speculate" whenever a task is speculated. On the other hand, Tez speculates more tasks as it processes stages in parallel.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
