From: Fabian Hueske
Date: Thu, 5 Apr 2018 10:29:13 +0200
Subject: Re: Task Manager fault tolerance does not work
To: dhirajpraj
Cc: user

Hi,

Thanks for the feedback!
As Till explained, the problem is that the JM first tries to schedule the job to the failed TM (which hasn't been detected as failed yet).
The configured three restart attempts are "consumed" by these attempts and the job fails afterwards.

Best, Fabian

2018-04-05 8:17 GMT+02:00 dhirajpraj <dhirajpraj@gmail.com>:
> Just for the record,
> It did not work with RestartStrategies.fixedDelayRestart(3, 5000) but worked
> with RestartStrategies.fixedDelayRestart(20, 5000)
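To make the arithmetic behind the two settings concrete, here is a minimal sketch in plain Java (no Flink dependency). It assumes a simplified model that is not taken from the thread: restart attempt k starts k * delay after the failure, and an attempt only succeeds once the lost TM has been detected as failed. The class RestartBudget, its methods, and the 50-second detection time are hypothetical and purely illustrative; the real detection time depends on the cluster configuration.

```java
// Hypothetical model of a fixed-delay restart budget; not part of Flink's API.
final class RestartBudget {

    // Number of restart attempts that elapse before the lost TM is detected,
    // assuming attempt k starts k * delayMs after the failure.
    static long attemptsNeeded(long delayMs, long detectionMs) {
        if (delayMs <= 0) {
            throw new IllegalArgumentException("delayMs must be positive");
        }
        long ceil = (detectionMs + delayMs - 1) / delayMs; // integer ceiling
        return Math.max(1, ceil); // at least one restart is always required
    }

    // True if the budget outlasts the detection time under this model.
    static boolean survives(int maxAttempts, long delayMs, long detectionMs) {
        return maxAttempts >= attemptsNeeded(delayMs, detectionMs);
    }

    public static void main(String[] args) {
        long delayMs = 5000L;       // delay from the thread: 5000 ms
        long detectionMs = 50_000L; // ASSUMPTION: illustrative detection time
        System.out.println("attempts needed: " + attemptsNeeded(delayMs, detectionMs));
        System.out.println("3 attempts survive:  " + survives(3, delayMs, detectionMs));
        System.out.println("20 attempts survive: " + survives(20, delayMs, detectionMs));
    }
}
```

Under these assumptions, 3 attempts cover only 15 seconds while 20 attempts cover 100 seconds, which is consistent with the behaviour reported above. The model ignores the time each failed scheduling attempt itself takes, so it should be read as a rough lower bound rather than a sizing rule.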