Date: Tue, 19 Nov 2019 21:58:00 +0000 (UTC)
From: "Jonathan Turner Eagles (Jira)"
To: issues@tez.apache.org
Subject: [jira] [Commented] (TEZ-4067) Tez Speculation decision is calculated on each update by the dispatcher

    [ https://issues.apache.org/jira/browse/TEZ-4067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16977859#comment-16977859 ]

Jonathan Turner Eagles commented on
TEZ-4067:
---------------------------------------------

Closer, as the DAGAppMaster no longer has knowledge about the LegacySpeculator. There are still a few things to fix to get full encapsulation.

* All references to speculators need to be abstracted away.
{code}
// Stop speculators if any
stopSpeculators(currentDAG);
{code}
Should be something like this:
{code}
// Stop dependent services
stopDependentServices(currentDAG);
{code}
Similarly, the following code should change its references from speculators to dependent services:
{code}
+          // If we reach here, then we have recoverable DAG and we need to reinitialize the speculators.
+          // start speculators of the recovered DAG
+          startSpeculators(currentDAG);
{code}
We need to avoid calling isSpeculationEnabled(), getSpeculator(), and startSpeculator(). Instead, use List getDependentServices. The vertex can include the speculator in the dependent services if speculation is enabled.

Do we need to call startSpeculator at all? As a dependent service, startService will be called automatically. Similarly, do we need a launch function at all? I'm a little worried that launch will start a thread, and then startService will be called and launch another thread. Perhaps the state of the service will prevent this. Could you explain the reasoning for calling launch manually instead of relying on startServices to be called automatically?
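To make the suggestion above concrete, here is a minimal sketch of what a dependent-services abstraction might look like. It does not use the Tez or Hadoop classes: SketchService, SketchSpeculator, SketchVertex, and SketchAppMaster are hypothetical names, and the List<SketchService> element type is an assumption, since the comment only names List getDependentServices. The start() guard illustrates one way the state of a service could prevent a second thread from being launched.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a service lifecycle interface; not the Hadoop or Tez API.
interface SketchService {
  void start();
  void stop();
}

// Hypothetical speculator whose start() is guarded by its own state,
// so a second start() call cannot launch a second thread.
class SketchSpeculator implements SketchService {
  private final AtomicBoolean started = new AtomicBoolean(false);
  final AtomicInteger threadsLaunched = new AtomicInteger(0);
  private volatile Thread worker;

  @Override
  public void start() {
    // compareAndSet lets only the first caller through.
    if (!started.compareAndSet(false, true)) {
      return;
    }
    threadsLaunched.incrementAndGet();
    Thread t = new Thread(() -> {
      try {
        Thread.sleep(Long.MAX_VALUE); // placeholder for the speculation loop
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    }, "sketch-speculator");
    t.setDaemon(true);
    worker = t;
    t.start();
  }

  @Override
  public void stop() {
    if (started.compareAndSet(true, false)) {
      Thread t = worker;
      if (t != null) {
        t.interrupt();
      }
    }
  }
}

// Hypothetical vertex: callers never ask whether speculation is enabled,
// they only ask for the dependent services.
class SketchVertex {
  private final SketchSpeculator speculator; // null when speculation is disabled

  SketchVertex(boolean speculationEnabled) {
    this.speculator = speculationEnabled ? new SketchSpeculator() : null;
  }

  List<SketchService> getDependentServices() {
    List<SketchService> result = new ArrayList<>();
    if (speculator != null) {
      result.add(speculator);
    }
    return result;
  }
}

// Hypothetical app master that knows nothing about speculators.
class SketchAppMaster {
  final List<SketchService> services = new ArrayList<>();

  void startDependentServices(List<SketchVertex> vertices) {
    for (SketchVertex v : vertices) {
      for (SketchService s : v.getDependentServices()) {
        s.start();
        if (!services.contains(s)) {
          services.add(s);
        }
      }
    }
  }

  void stopDependentServices(List<SketchVertex> vertices) {
    for (SketchVertex v : vertices) {
      for (SketchService s : v.getDependentServices()) {
        s.stop();
        services.remove(s);
      }
    }
  }
}
```

In this sketch, calling startDependentServices twice on the same vertices launches only one thread per speculator, and a vertex with speculation disabled simply returns an empty list.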
{code}
+  private void startSpeculators(DAG dag) {
+    for (Vertex v : dag.getVertices().values()) {
+      if (!v.isSpeculationEnabled()) {
+        continue;
+      }
+      if (v.startSpeculator()) {
+        addIfService(v.getSpeculator(), false);
+      }
+    }
+  }
+
+  private Exception stopSpeculators(DAG dag) {
+    Exception firstException = null;
+    for (Vertex v : dag.getVertices().values()) {
+      if (!v.isSpeculationEnabled()) {
+        continue;
+      }
+
+      Exception ex = v.stopSpeculator();
+      if (ex != null && firstException == null) {
+        firstException = ex;
+        continue;
+      }
+      // remove the speculator service from the list of services
+      services.remove(v.getSpeculator());
+    }
+    return firstException;
+  }
{code}

> Tez Speculation decision is calculated on each update by the dispatcher
> -----------------------------------------------------------------------
>
>                 Key: TEZ-4067
>                 URL: https://issues.apache.org/jira/browse/TEZ-4067
>             Project: Apache Tez
>          Issue Type: Improvement
>            Reporter: Ahmed Hussein
>            Assignee: Ahmed Hussein
>            Priority: Minor
>         Attachments: TEZ-4067.001.patch, TEZ-4067.002.patch, TEZ-4067.003.patch, TEZ-4067.004.patch, TEZ-4067.005.patch
>
>
> LegacySpeculator is an object field in VertexImpl. Therefore, all events are handled synchronously by the caller (dispatcher). This implies the following:
> # the dispatcher spends a long time executing updateStatus, as it needs to check the runtime estimation of the tezAttempts within the vertex.
> # the speculator is per stage: launching a speculation may not be the optimum decision. Ideally, based on resources, speculated tasks should be the ones with slowest progress.
> # the time between speculations is skewed because there is a big delay for the dispatcher to complete a full cycle. Also, speculation will be more aggressive compared to MR because MR waits for "soonest.retry.after.speculate" whenever a task is speculated. On the other hand, Tez speculates more tasks as it processes stages in parallel.
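
The synchronous handling described in the quoted issue can be contrasted with a minimal sketch in which the dispatcher thread only enqueues a status update and a separate thread does the estimation work. This is an illustration under stated assumptions, not the Tez LegacySpeculator or the attached patches: QueueingSpeculatorSketch, its updateStatus(String) signature, and the processed counter are all hypothetical.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch; names and signatures are assumptions, not the Tez API.
class QueueingSpeculatorSketch {
  private final BlockingQueue<String> updates = new LinkedBlockingQueue<>();
  final AtomicInteger processed = new AtomicInteger();
  private final Thread worker = new Thread(this::drain, "speculator-sketch");

  // Called on the dispatcher thread: it only enqueues, so it returns quickly.
  void updateStatus(String attemptId) {
    updates.offer(attemptId);
  }

  void start() {
    worker.setDaemon(true);
    worker.start();
  }

  void stop() {
    worker.interrupt();
    try {
      worker.join(1000);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }

  // Runs on the worker thread, off the dispatcher.
  private void drain() {
    try {
      while (true) {
        String attemptId = updates.take();
        // Placeholder for the runtime estimation of the attempt.
        processed.incrementAndGet();
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }
}
```

With this shape, the cost of each updateStatus call on the dispatcher is a queue insertion, and the estimation cost is paid by the worker thread instead.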
>

--
This message was sent by Atlassian Jira
(v8.3.4#803005)