Return-Path: X-Original-To: apmail-flink-user-archive@minotaur.apache.org Delivered-To: apmail-flink-user-archive@minotaur.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 1BD54182C5 for ; Sat, 20 Feb 2016 21:56:12 +0000 (UTC) Received: (qmail 27419 invoked by uid 500); 20 Feb 2016 21:56:07 -0000 Delivered-To: apmail-flink-user-archive@flink.apache.org Received: (qmail 27297 invoked by uid 500); 20 Feb 2016 21:56:06 -0000 Mailing-List: contact user-help@flink.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@flink.apache.org Delivered-To: mailing list user@flink.apache.org Received: (qmail 27280 invoked by uid 99); 20 Feb 2016 21:56:06 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd4-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Sat, 20 Feb 2016 21:56:06 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd4-us-west.apache.org (ASF Mail Server at spamd4-us-west.apache.org) with ESMTP id 733A6C0B70 for ; Sat, 20 Feb 2016 21:56:06 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd4-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: 1.18 X-Spam-Level: * X-Spam-Status: No, score=1.18 tagged_above=-999 required=6.31 tests=[DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, HTML_MESSAGE=2, RCVD_IN_DNSWL_LOW=-0.7, RCVD_IN_MSPIKE_H3=-0.01, RCVD_IN_MSPIKE_WL=-0.01, SPF_PASS=-0.001, WEIRD_PORT=0.001] autolearn=disabled Authentication-Results: spamd4-us-west.apache.org (amavisd-new); dkim=pass (2048-bit key) header.d=gmail.com Received: from mx1-lw-us.apache.org ([10.40.0.8]) by localhost (spamd4-us-west.apache.org [10.40.0.11]) (amavisd-new, port 10024) with ESMTP id fHq-ekQnzw9a for ; Sat, 20 Feb 2016 21:56:04 +0000 (UTC) Received: from mail-lb0-f181.google.com (mail-lb0-f181.google.com [209.85.217.181]) by mx1-lw-us.apache.org (ASF Mail Server at mx1-lw-us.apache.org) with ESMTPS id 17C6C5F399 for ; Sat, 20 Feb 2016 21:56:04 +0000 (UTC) Received: by mail-lb0-f181.google.com with SMTP id x1so64460477lbj.3 for ; Sat, 20 Feb 2016 13:56:04 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :content-type; bh=EzeJ+5WPvTaLqWPFI0rKdv+eso2vAVKBqMDKd1j4PvM=; b=bJVovRdUr+k4sAy6xJGiMkbW8tWkLAvaTRaXh3V7XUmNGnvuwfb8AI+3Xoo4R7UweL UUlQLn7M+Cl0jaEtM3qWX/QPOQuLpWM2Xwopjwx3iRDqVVxXNdEJNbUyPnvCO5zqWmAY Jcw0QTlH8E+0CEMTzZkQbsnQVOIAyr6LEzFUP0ZRbsSzx9s7gmCWuhCcykP2apoVcGP0 ngrMoPAH52rw5da7r70QgTPKfdLz0dkKyOUTXnCruA1TieIE545Ncj7FUmGlBrN3Tigr eZM/Cym1Qf9uVUIzlFbqdi67fGT7X3qQo/HfYmiQqpqrLI6WmYVNpiof7VGQtOQ4pAm3 /7MQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:in-reply-to:references:date :message-id:subject:from:to:content-type; bh=EzeJ+5WPvTaLqWPFI0rKdv+eso2vAVKBqMDKd1j4PvM=; b=TuiUBqhrqbEOkExbc0F96ZxY+HMUuodOV4LGpSvJzovMYrvX8iNcHWL9DmMrIhS9qp DFJzhTFwpw/J/zHsos0n00uR8bRZ3uIKXDOFzt89R4MJRwj0pnT6rt2w3k31EQMnuFio 6KJSLhSKlfkMtlB6ouYF8L1C0NKqe9A4EtMqr5ChQ+qSHq0+o3+Ka6+5QkWRi3gioQFb 9EEfTx+MiBvAi8koT/F3zHpvscZetdMpTcWQENk6XpgGeCovGdsrKxVI5KI1HQRQ/p9S L/SJp5kQnPkqoou0ZkQ5yVockX4SF6StMl844sDxnc28rboEwsUE6tAPAYZkN5lQdXpW T7Uw== X-Gm-Message-State: AG10YOQjQ+ahexw+6aXBprDC/2dNWJ6zF8S7O5ns59kUZ0EXw04BrPXQC5bUsPE5pun3nun++PPHcIpUNQPPPg== MIME-Version: 1.0 X-Received: by 10.112.17.70 with SMTP id m6mr7781469lbd.130.1456005356815; Sat, 20 Feb 2016 13:55:56 -0800 (PST) Received: by 10.25.210.130 with HTTP; Sat, 20 Feb 2016 13:55:56 -0800 (PST) In-Reply-To: References: Date: Sat, 20 Feb 2016 21:55:56 +0000 Message-ID: Subject: Re: How to increase akka heartbeat? From: Saiph Kappa To: user@flink.apache.org Content-Type: multipart/alternative; boundary=001a11c3d3e2158fc9052c3aaa79 --001a11c3d3e2158fc9052c3aaa79 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable Thanks for your help. Apparently the problem was not in Akka. It seems that when using a source .socketTextStream with maxRetry =3D -1, it continually attempts to connect to the socket for the 1st time, but once it is connected, and if no data is sent, it seems that the job is terminated. On Fri, Feb 19, 2016 at 9:13 PM, Stephan Ewen wrote: > Hi Saiph! > > What is the problem that is happening? The log actually looks like the Jo= b > is successfully sent to the JobManager. > > Stephan > > > > On Fri, Feb 19, 2016 at 8:49 PM, Robert Metzger > wrote: > >> Hi, >> can you maybe (if you want also private) send me the full logs of the >> jobmanager? The messages you've posted here are logged at DEBUG level. T= hey >> don't indicate an erroneous behavior of the system. >> >> On Fri, Feb 19, 2016 at 8:44 PM, Saiph Kappa >> wrote: >> >>> These were the parameters that I set btw: >>> >>> akka.watch.heartbeat.interval: 100 >>> akka.transport.heartbeat.interval: 1000 >>> >>> On Fri, Feb 19, 2016 at 7:43 PM, Saiph Kappa >>> wrote: >>> >>>> I am not sure. >>>> >>>> For that particular machine I get messages like these: >>>> =C2=AB >>>> myip:6123/user/jobmanager#291801197])) at akka://flink/user/$a from >>>> Actor[akka://flink/deadLetters]. >>>> ^[[34m[INFO]^[[0;39m o.a.f.r.c.JobClientActor - Connected to new >>>> JobManager akka.tcp://flink@myip:6123/user/jobmanager. >>>> >>>> ^[[34m[INFO]^[[0;39m o.a.f.r.c.JobClientActor - Sending message to >>>> JobManager akka.tcp://flink@myip:6123/user/jobmanager to submit job >>>> JOB1 (5f9cef0c2e4b69530bf1e2485e94d326) and wait for progress >>>> >>>> >>>> ^[[39m[DEBUG]^[[0;39m o.a.f.r.c.JobClientActor - Handled message >>>> LeaderSessionMessage(null,JobManagerActorRef(Actor[akka.tcp://flink@my= ip:6123/user/jobmanager#291801197])) >>>> in 48 ms from Actor[akka://flink/deadLetters]. >>>> >>>> >>>> ^[[39m[DEBUG]^[[0;39m o.a.f.r.c.JobClientActor - Handled message >>>> LeaderSessionMessage(null,JobManagerActorRef(Actor[akka.tcp://flink@my= ip:6123/user/jobmanager#291801197])) >>>> in 48 ms from Actor[akka://flink/deadLetters]. >>>> >>>> ^[[39m[DEBUG]^[[0;39m o.a.f.r.c.JobClientActor - Received message >>>> JobSubmitSuccess(2575d5ff5c10336beb7820a052a63623) at akka://flink/use= r/$a >>>> from Actor[akka.tcp://flink@myip:6123/user/jobmanager#1144818256]. >>>> =C2=BB >>>> >>>> I tried to set the heartbeat interval in the cluster but it didn't >>>> solve the problem, should I try to set it in the client (how can I do = it)? >>>> I see no other errors or exceptions on the log files. >>>> >>>> >>>> >>>> >>>> On Fri, Feb 19, 2016 at 7:07 PM, Robert Metzger >>>> wrote: >>>> >>>>> Hi Saiph, >>>>> >>>>> are you sure that the jobs are cancelled because the client >>>>> disconnects? >>>>> >>>>> For the different timeouts, check the configuration page: >>>>> https://ci.apache.org/projects/flink/flink-docs-release-0.10/setup/co= nfig.html >>>>> and search for "heartbeat". >>>>> >>>>> On Fri, Feb 19, 2016 at 8:04 PM, Saiph Kappa >>>>> wrote: >>>>> >>>>>> Hi, >>>>>> >>>>>> I have a Flink client application that launches jobs to remote >>>>>> clusters. However I'm getting my jobs cancelled: >>>>>> "18:25:29,650 WARN >>>>>> akka.remote.ReliableDeliverySupervisor - Asso= ciation >>>>>> with remote system [akka.tcp://flink@127.0.0.1:52929] has failed, >>>>>> address is now gated for [5000] ms. Reason is: [Disassociated]." >>>>>> >>>>>> How can I increase the akka heartbeat interval? Where should I set >>>>>> that configuration parameter, in the client or in the Flink clusters= , and >>>>>> in which file. >>>>>> >>>>>> Thanks. >>>>>> >>>>>> >>>>> >>>> >>> >> > --001a11c3d3e2158fc9052c3aaa79 Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: quoted-printable
Thanks for your help. Apparently the problem was not in Ak= ka. It seems that when using a source .socketTextStream with maxRetry =3D -= 1, it continually attempts to connect to the socket for the 1st time, but o= nce it is connected, and if no data is sent, it seems that the job is termi= nated.

O= n Fri, Feb 19, 2016 at 9:13 PM, Stephan Ewen <sewen@apache.org> wrote:
Hi Saiph!
What is the problem that is happening? The log actually loo= ks like the Job is successfully sent to the JobManager.

Stephan

=


On Fri, F= eb 19, 2016 at 8:49 PM, Robert Metzger <rmetzger@apache.org> wrote:
Hi,
can yo= u maybe (if you want also private) send me the full logs of the jobmanager?= The messages you've posted here are logged at DEBUG level. They don= 9;t indicate an erroneous behavior of the system.

On Fri, Feb 19, 2016 = at 8:44 PM, Saiph Kappa <saiph.kappa@gmail.com> wrote:
These were the parameters= that I set btw:

akka.watch.heartbeat.interval: 100
akka.transpo= rt.heartbeat.interval: 1000
<= br>
On Fri, Feb 19, 2016 at 7:43 PM, Saiph Kappa = <saiph.kappa@gmail.com> wrote:
I am not sure.

For that p= articular machine I get messages like these:
=C2=AB
myip:6123/user/jo= bmanager#291801197])) at akka://flink/user/$a from Actor[akka://flink/deadL= etters].
^[[34m[INFO]^[[0;39m o.a.f.r.c.JobClientActor=C2=A0=C2=A0=C2=A0= - Connected to new JobManager akka.tcp://flink@myip:6123/user/jobmanager.<= br>
^[[34m[INFO]^[[0;39m o.a.f.r.c.JobClientActor=C2=A0=C2=A0=C2=A0 - Se= nding message to JobManager akka.tcp://flink@myip:6123/user/jobmanager to s= ubmit job JOB1 (5f9cef0c2e4b69530bf1e2485e94d326) and wait for progress
=

^[[39m[DEBUG]^[[0;39m o.a.f.r.c.JobClientActor=C2=A0=C2=A0=C2=A0 - = Handled message LeaderSessionMessage(null,JobManagerActorRef(Actor[akka.tcp= ://flink@myip:6123/user/jobmanager#291801197])) in 48 ms from Actor[akka://= flink/deadLetters].


^[[39m[DEBUG]^[[0;39m o.a.f.r.c.JobClientAct= or=C2=A0=C2=A0=C2=A0 - Handled message LeaderSessionMessage(null,JobManager= ActorRef(Actor[akka.tcp://flink@myip:6123/user/jobmanager#291801197])) in 4= 8 ms from Actor[akka://flink/deadLetters].

^[[39m[DEBUG]^[[0;39m o.a= .f.r.c.JobClientActor=C2=A0=C2=A0=C2=A0 - Received message JobSubmitSuccess= (2575d5ff5c10336beb7820a052a63623) at akka://flink/user/$a from Actor[akka.= tcp://flink@myip:6123/user/jobmanager#1144818256].
=C2=BB

<= div>I tried to set the heartbeat interval in the cluster but it didn't = solve the problem, should I try to set it in the client (how can I do it)? = I see no other errors or exceptions on the log files.



On Fri, Feb 19, 2016 at 7:07 PM, Robert Metzger <rm= etzger@apache.org> wrote:
<= div dir=3D"ltr">Hi Saiph,

are you sure that the jobs are= cancelled because the client disconnects?

For the= different timeouts, check the configuration page:=C2=A0https://ci.apache.org/projects/flink/flink-docs-release-0.10= /setup/config.html and search for "heartbeat".

On Fri, Fe= b 19, 2016 at 8:04 PM, Saiph Kappa <saiph.kappa@gmail.com> wrote:
Hi,<= br>
I have a Flink client application that launches jobs to remote= clusters. However I'm getting my jobs cancelled:
"18:25:29,650= WARN=C2=A0 akka.remote.ReliableDeliverySupervisor=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 - Association with remote system [a= kka.tcp://flink@= 127.0.0.1:52929] has failed, address is now gated for [5000] ms. Reason= is: [Disassociated]."

How can I increase the akka h= eartbeat interval? Where should I set that configuration parameter, in the = client or in the Flink clusters, and in which file.

Thanks.






--001a11c3d3e2158fc9052c3aaa79--