From: Fabian Hueske
Date: Fri, 8 Sep 2017 17:38:25 +0200
Subject: Re: Fwd: HA : My job didn't restart even if task manager restarted.
To: sunny yun
Cc: user, Till Rohrmann
In-Reply-To: <1504668939165-0.post@n4.nabble.com>
Delivered-To: mailing list user@flink.apache.org

Hi,

sorry for the late response!
I'm not familiar with the details of the failure recovery, but Till (in CC) knows the code in depth.
Maybe he can figure out what's going on.

Best, Fabian

2017-09-06 5:35 GMT+02:00 sunny yun:

> I am still struggling to solve this problem.
> I have no doubt that the JOB should automatically restart after restarting
> the TASK MANAGER in YARN MODE. Is that a misunderstanding?
>
> The problem seems to be that *the JOB MANAGER still tries to connect to the
> old TASK MANAGER even after a new TASK MANAGER container has been created.*
> When I killed the TM on node#2, a new TM container was created on node#3,
> but the JM still tries to connect to the TM on node#2 according to the log
> file. (This is not the log I posted before; I found it while continuing the
> test. Normally the TM is re-created on the same node after being killed.)
> So the new TM doesn't know the JOB info, and the JM shows the JOB with a
> failed status.
>
> If anyone has succeeded in the same situation (YARN + TM FAILURE), please
> just tell me.
> That would be a big help to me.
>
> --
> Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/