Mailing-List: contact user-help@flink.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@flink.apache.org
Delivered-To: mailing list user@flink.apache.org
MIME-Version: 1.0
From: Robert Metzger
Date: Fri, 19 Feb 2016 20:49:25 +0100
Subject: Re: How to increase akka heartbeat?
To: "user@flink.apache.org"
Content-Type: multipart/alternative; boundary=001a1140207eead7ad052c24c854

--001a1140207eead7ad052c24c854
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

Hi,

can you send me the full logs of the jobmanager (privately, if you prefer)?
The messages you've posted here are logged at DEBUG level. They don't
indicate erroneous behavior of the system.

On Fri, Feb 19, 2016 at 8:44 PM, Saiph Kappa <saiph.kappa@gmail.com> wrote:

> These were the parameters that I set btw:
>
> akka.watch.heartbeat.interval: 100
> akka.transport.heartbeat.interval: 1000
>
> On Fri, Feb 19, 2016 at 7:43 PM, Saiph Kappa <saiph.kappa@gmail.com> wrote:
>
>> I am not sure.
>>
>> For that particular machine I get messages like these:
>> «
>> myip:6123/user/jobmanager#291801197])) at akka://flink/user/$a from
>> Actor[akka://flink/deadLetters].
>> [INFO] o.a.f.r.c.JobClientActor - Connected to new
>> JobManager akka.tcp://flink@myip:6123/user/jobmanager.
>>
>> [INFO] o.a.f.r.c.JobClientActor - Sending message to
>> JobManager akka.tcp://flink@myip:6123/user/jobmanager to submit job JOB1
>> (5f9cef0c2e4b69530bf1e2485e94d326) and wait for progress
>>
>> [DEBUG] o.a.f.r.c.JobClientActor - Handled message
>> LeaderSessionMessage(null,JobManagerActorRef(Actor[akka.tcp://flink@myip:6123/user/jobmanager#291801197]))
>> in 48 ms from Actor[akka://flink/deadLetters].
>>
>> [DEBUG] o.a.f.r.c.JobClientActor - Handled message
>> LeaderSessionMessage(null,JobManagerActorRef(Actor[akka.tcp://flink@myip:6123/user/jobmanager#291801197]))
>> in 48 ms from Actor[akka://flink/deadLetters].
>>
>> [DEBUG] o.a.f.r.c.JobClientActor - Received message
>> JobSubmitSuccess(2575d5ff5c10336beb7820a052a63623) at akka://flink/user/$a
>> from Actor[akka.tcp://flink@myip:6123/user/jobmanager#1144818256].
>> »
>>
>> I tried to set the heartbeat interval in the cluster but it didn't solve
>> the problem, should I try to set it in the client (how can I do it)? I see
>> no other errors or exceptions on the log files.
>>
>> On Fri, Feb 19, 2016 at 7:07 PM, Robert Metzger <rmetzger@apache.org> wrote:
>>
>>> Hi Saiph,
>>>
>>> are you sure that the jobs are cancelled because the client disconnects?
>>>
>>> For the different timeouts, check the configuration page:
>>> https://ci.apache.org/projects/flink/flink-docs-release-0.10/setup/config.html
>>> and search for "heartbeat".
>>>
>>> On Fri, Feb 19, 2016 at 8:04 PM, Saiph Kappa <saiph.kappa@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I have a Flink client application that launches jobs to remote
>>>> clusters. However I'm getting my jobs cancelled:
>>>> "18:25:29,650 WARN
>>>> akka.remote.ReliableDeliverySupervisor - Association
>>>> with remote system [akka.tcp://flink@127.0.0.1:52929] has failed,
>>>> address is now gated for [5000] ms. Reason is: [Disassociated]."
>>>>
>>>> How can I increase the akka heartbeat interval? Where should I set that
>>>> configuration parameter, in the client or in the Flink clusters, and in
>>>> which file.
>>>>
>>>> Thanks.
>>>>
>>>
>>
>

--001a1140207eead7ad052c24c854--