Return-Path: X-Original-To: archive-asf-public-internal@cust-asf2.ponee.io Delivered-To: archive-asf-public-internal@cust-asf2.ponee.io Received: from cust-asf.ponee.io (cust-asf.ponee.io [163.172.22.183]) by cust-asf2.ponee.io (Postfix) with ESMTP id 91C4F200BBD for ; Tue, 8 Nov 2016 23:11:44 +0100 (CET) Received: by cust-asf.ponee.io (Postfix) id 904C0160B0A; Tue, 8 Nov 2016 22:11:44 +0000 (UTC) Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by cust-asf.ponee.io (Postfix) with SMTP id 646AA160AD0 for ; Tue, 8 Nov 2016 23:11:43 +0100 (CET) Received: (qmail 74365 invoked by uid 500); 8 Nov 2016 22:11:42 -0000 Mailing-List: contact user-help@flink.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@flink.apache.org Delivered-To: mailing list user@flink.apache.org Received: (qmail 74355 invoked by uid 99); 8 Nov 2016 22:11:42 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd1-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 08 Nov 2016 22:11:42 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd1-us-west.apache.org (ASF Mail Server at spamd1-us-west.apache.org) with ESMTP id BE268C03EC for ; Tue, 8 Nov 2016 22:11:41 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd1-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: 2.379 X-Spam-Level: ** X-Spam-Status: No, score=2.379 tagged_above=-999 required=6.31 tests=[DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, HTML_MESSAGE=2, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H3=-0.01, RCVD_IN_MSPIKE_WL=-0.01, RCVD_IN_SORBS_SPAM=0.5, SPF_PASS=-0.001] autolearn=disabled Authentication-Results: spamd1-us-west.apache.org (amavisd-new); dkim=pass (2048-bit key) header.d=gmail.com Received: from mx1-lw-us.apache.org ([10.40.0.8]) by localhost (spamd1-us-west.apache.org [10.40.0.7]) (amavisd-new, port 10024) with ESMTP id 9yet5gerUWpw for ; Tue, 8 Nov 2016 22:11:39 +0000 (UTC) Received: from mail-wm0-f51.google.com (mail-wm0-f51.google.com [74.125.82.51]) by mx1-lw-us.apache.org (ASF Mail Server at mx1-lw-us.apache.org) with ESMTPS id 69A6B5F644 for ; Tue, 8 Nov 2016 22:11:39 +0000 (UTC) Received: by mail-wm0-f51.google.com with SMTP id u144so40064058wmu.1 for ; Tue, 08 Nov 2016 14:11:39 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:from:date:message-id:subject:to; bh=5Q9HWQ3eenEm/YtWC/ORvih0YRlcqqEYLgshdEvshAc=; b=roBhyEtaEGs+ZIXbQdXyBM9bYuCcEAalPKaZerwP8THcwIBKVATk+f4qq4tw9KPylg a7v+gwJRIynZusoI36PywmukGHvk4sCGWlSvl6QIhg9d8XtI7Ho0mxFuZnJz6YwDbhK8 3vLIK/OF51QyOP4xiWyEPiYbyEaiHGQT54eF5027JcS76PRK5mlSvalu6sLLCmZJY/S1 UF0vGjfupi6vQz69oNktiyd7th+ePhDo2PSZ1y0P91NoXt3i8ysQ2IHtQ2knPvt9QEEO /jAFKpUJZzh4nU6tkVBgOtYnM5VBXpZBo9rl5DEtUc56Z4x5CE9eBBYQmShgckP6r6sc jVQg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to; bh=5Q9HWQ3eenEm/YtWC/ORvih0YRlcqqEYLgshdEvshAc=; b=S1WPs5Ns+64yuhyc/K3/YSlmP0GwFas5sRocN5XcEzS84L2mCQUF8RJVP8XjeUxmE0 wb97Dw9E+DYXIwOxnOJdgICS4mtGg+p09+pZHN3MNcHL7JVQ6X9yW1irSWJr9uqR2H93 yT6osUaU/yE9A0JY0GLoUKOibqDxd92WbNKNsWAErJ9ADOAkSFLfYvnsC0PbHVwOk1qR EZvDdOvNaDht7BuuUR0NF7sK2MP1cESQyAxN0MrtKDLe+4/RBOy/WRanKYqn9jy/XB7x CQ47t45lX4t1jCbtc97ZJr6nF2FlUnvoS/dnRBBzfvfO12I2T7vvBdB43YzzFO9n0KWq gpnw== X-Gm-Message-State: ABUngvcox5FF63yoA7h+0pCdv2Oi65jL8JuR03u1wDKlyL6fvjHCPtzskgqRLxnaXP1eF0Ca6lZ3yJ2wR1xrbg== X-Received: by 10.194.137.15 with SMTP id qe15mr14626125wjb.16.1478643098518; Tue, 08 Nov 2016 14:11:38 -0800 (PST) MIME-Version: 1.0 Received: by 10.80.168.36 with HTTP; Tue, 8 Nov 2016 14:11:38 -0800 (PST) In-Reply-To: <1266388850.821413.1478637213061@mail.yahoo.com> References: <1266388850.821413.1478637213061.ref@mail.yahoo.com> <1266388850.821413.1478637213061@mail.yahoo.com> From: Till Rohrmann Date: Tue, 8 Nov 2016 23:11:38 +0100 Message-ID: Subject: Re: Why did the Flink Cluster JM crash? To: user@flink.apache.org, amir bahmanyari Content-Type: multipart/alternative; boundary=bcaec50e6159a308c00540d16c1f archived-at: Tue, 08 Nov 2016 22:11:44 -0000 --bcaec50e6159a308c00540d16c1f Content-Type: text/plain; charset=UTF-8 Hi Amir, what does the JM logs say? Cheers, Till On Tue, Nov 8, 2016 at 9:33 PM, amir bahmanyari wrote: > Hi colleagues, > I started the cluster all fine. Started the Beam app running in the Flink > Cluster fine. > Dashboard showed all tasks being consumed and open for business. > I started sending data to the Beam app, and all of the sudden the Flink JM > crashed. > Exceptions below. > Thanks+regards > Amir > > java.lang.RuntimeException: Pipeline execution failed > at org.apache.beam.runners.flink.FlinkRunner.run(FlinkRunner. > java:113) > at org.apache.beam.runners.flink.FlinkRunner.run(FlinkRunner. > java:48) > at org.apache.beam.sdk.Pipeline.run(Pipeline.java:183) > at benchmark.flinkspark.flink.BenchBeamRunners.main(BenchBeamRunners.java:622) > //p.run(); > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at sun.reflect.NativeMethodAccessorImpl.invoke( > NativeMethodAccessorImpl.java:62) > at sun.reflect.DelegatingMethodAccessorImpl.invoke( > DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at org.apache.flink.client.program.PackagedProgram.callMainMethod( > PackagedProgram.java:505) > at org.apache.flink.client.program.PackagedProgram. > invokeInteractiveModeForExecution(PackagedProgram.java:403) > at org.apache.flink.client.program.Client.runBlocking( > Client.java:248) > at org.apache.flink.client.CliFrontend.executeProgramBlocking( > CliFrontend.java:866) > at org.apache.flink.client.CliFrontend.run(CliFrontend.java:333) > at org.apache.flink.client.CliFrontend.parseParameters( > CliFrontend.java:1189) > at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1239) > Caused by: org.apache.flink.client.program.ProgramInvocationException: > The program execution failed: Communication with JobManager failed: Lost > connection to the JobManager. > at org.apache.flink.client.program.Client.runBlocking( > Client.java:381) > at org.apache.flink.client.program.Client.runBlocking( > Client.java:355) > at org.apache.flink.streaming.api.environment. > StreamContextEnvironment.execute(StreamContextEnvironment.java:65) > at org.apache.beam.runners.flink.FlinkPipelineExecutionEnvironm > ent.executePipeline(FlinkPipelineExecutionEnvironment.java:118) > at org.apache.beam.runners.flink.FlinkRunner.run(FlinkRunner. > java:110) > ... 14 more > Caused by: org.apache.flink.runtime.client.JobExecutionException: > Communication with JobManager failed: Lost connection to the JobManager. > at org.apache.flink.runtime.client.JobClient. > submitJobAndWait(JobClient.java:140) > at org.apache.flink.client.program.Client.runBlocking( > Client.java:379) > ... 18 more > Caused by: org.apache.flink.runtime.client.JobClientActorConnectionTimeoutException: > Lost connection to the JobManager. > at org.apache.flink.runtime.client.JobClientActor. > handleMessage(JobClientActor.java:244) > at org.apache.flink.runtime.akka.FlinkUntypedActor. > handleLeaderSessionID(FlinkUntypedActor.java:88) > at org.apache.flink.runtime.akka.FlinkUntypedActor.onReceive( > FlinkUntypedActor.java:68) > at akka.actor.UntypedActor$$anonfun$receive$1.applyOrElse( > UntypedActor.scala:167) > at akka.actor.Actor$class.aroundReceive(Actor.scala:465) > at akka.actor.UntypedActor.aroundReceive(UntypedActor.scala:97) > at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516) > at akka.actor.ActorCell.invoke(ActorCell.scala:487) > at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254) > at akka.dispatch.Mailbox.run(Mailbox.scala:221) > at akka.dispatch.Mailbox.exec(Mailbox.scala:231) > at scala.concurrent.forkjoin.ForkJoinTask.doExec( > ForkJoinTask.java:260) > at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue. > pollAndExecAll(ForkJoinPool.java:1253) > at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue. > runTask(ForkJoinPool.java:1346) > at scala.concurrent.forkjoin.ForkJoinPool.runWorker( > ForkJoinPool.java:1979) > at scala.concurrent.forkjoin.ForkJoinWorkerThread.run( > ForkJoinWorkerThread.java:107) > --bcaec50e6159a308c00540d16c1f Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: quoted-printable
Hi Amir,

what does the JM logs say?

Cheers,
Till

On Tue, Nov 8, 2016 at 9:33 PM, amir= bahmanyari <amirtousa@yahoo.com> wrote:
Hi colleagues,
I started the cluster all fine. Started the Be= am app running in the Flink Cluster fine.
Dashboard showed all tasks being= consumed and open for business.
I started sending data to the Beam app, a= nd all of the sudden the Flink JM crashed.
Exceptions below.
Thanks+rega= rds
Amir

java.lang.RuntimeException: Pipeline execution failed
=C2= =A0 =C2=A0 =C2=A0 =C2=A0 at org.apache.beam.runners.flink.FlinkRunner.= run(FlinkRunner.java:113)
=C2=A0 =C2=A0 =C2=A0 =C2=A0 at org.apache.b= eam.runners.flink.FlinkRunner.run(FlinkRunner.java:48)
=C2=A0 = =C2=A0 =C2=A0 =C2=A0 at org.apache.beam.sdk.Pipeline.run(Pipeline.java= :183)
=C2=A0 =C2=A0 =C2=A0 =C2=A0 at benchmark.flinkspark.flink.Bench= BeamRunners.main(BenchBeamRunners.java:622) =C2=A0//p.run();
=C2=A0 = =C2=A0 =C2=A0 =C2=A0 at sun.reflect.NativeMethodAccessorImpl.invo= ke0(Native Method)
=C2=A0 =C2=A0 =C2=A0 =C2=A0 at sun.reflect.NativeM= ethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
= =C2=A0 =C2=A0 =C2=A0 =C2=A0 at sun.reflect.DelegatingMethodAccessorImp= l.invoke(DelegatingMethodAccessorImpl.java:43)
=C2=A0 =C2= =A0 =C2=A0 =C2=A0 at java.lang.reflect.Method.invoke(Method.java:498)<= /div>
=C2=A0 =C2=A0 =C2=A0 =C2=A0 at org.apache.flink.client.program.Packag= edProgram.callMainMethod(PackagedProgram.java:505)
=C2=A0 =C2= =A0 =C2=A0 =C2=A0 at org.apache.flink.client.program.PackagedProgram.<= wbr>invokeInteractiveModeForExecution(PackagedProgram.java:403)
<= div id=3D"m_817411614676118975yui_3_16_0_ym19_1_1477610403404_831473">=C2= =A0 =C2=A0 =C2=A0 =C2=A0 at org.apache.flink.client.program.Client.run= Blocking(Client.java:248)
=C2=A0 =C2=A0 =C2=A0 =C2=A0 at org.apache.f= link.client.CliFrontend.executeProgramBlocking(CliFrontend.j= ava:866)
=C2=A0 =C2=A0 =C2=A0 =C2=A0 at org.apache.flink.client.CliFr= ontend.run(CliFrontend.java:333)
=C2=A0 =C2=A0 =C2=A0 =C2=A0 at org.a= pache.flink.client.CliFrontend.parseParameters(CliFrontend.java:1= 189)
=C2=A0 =C2=A0 =C2=A0 =C2=A0 at org.apache.flink.client.CliFronte= nd.main(CliFrontend.java:1239)
Caused by: org.apache.flink.client.program.ProgramInvocationException: The program execution failed: Co= mmunication with JobManager failed: Lost connection to the JobManager.
=C2= =A0 =C2=A0 =C2=A0 =C2=A0 at org.apache.flink.client.program.Client.run= Blocking(Client.java:381)
=C2=A0 =C2=A0 =C2=A0 =C2=A0 at org.apache.f= link.client.program.Client.runBlocking(Client.java:355)
=C2=A0 = =C2=A0 =C2=A0 =C2=A0 at org.apache.flink.streaming.api.environment.StreamContextEnvironment.execute(StreamContextEnvironment.java:= 65)
=C2=A0 =C2=A0 =C2=A0 =C2=A0 at org.apache.beam.runners.flink.FlinkPipelineExecutionEnvironment.executePipeline(FlinkPipelineE= xecutionEnvironment.java:118)
=C2=A0 =C2=A0 =C2=A0 =C2=A0 at org.apac= he.beam.runners.flink.FlinkRunner.run(FlinkRunner.java:110)
=
=C2= =A0 =C2=A0 =C2=A0 =C2=A0 ... 14 more
Caused by: org.apache.flink.runtime.<= wbr>client.JobExecutionException: Communication with JobManager failed: Los= t connection to the JobManager.
=C2=A0 =C2=A0 =C2=A0 =C2=A0 at org.apache.= flink.runtime.client.JobClient.submitJobAndWait(JobClient.ja= va:140)
=C2=A0 =C2=A0 =C2=A0 =C2=A0 at org.apache.flink.client.progra= m.Client.runBlocking(Client.java:379)
=C2=A0 =C2=A0 =C2=A0 =C2=A0 ...= 18 more
Caused by: org.apache.flink.runtime.client.JobClientAct= orConnectionTimeoutException: Lost connection to the JobManager.
=
=C2= =A0 =C2=A0 =C2=A0 =C2=A0 at org.apache.flink.runtime.client.JobClientA= ctor.handleMessage(JobClientActor.java:244)
=C2=A0 =C2=A0 =C2=A0= =C2=A0 at org.apache.flink.runtime.akka.FlinkUntypedActor.handle= LeaderSessionID(FlinkUntypedActor.java:88)
=C2=A0 =C2=A0 =C2=A0 =C2= =A0 at org.apache.flink.runtime.akka.FlinkUntypedActor.onReceive(= FlinkUntypedActor.java:68)
=C2=A0 =C2=A0 =C2=A0 =C2=A0 at akka.actor.Untyp= edActor$$anonfun$receive$1.applyOrElse(UntypedActor.scala:167)
= =C2=A0 =C2=A0 =C2=A0 =C2=A0 at akka.actor.Actor$class.aroundReceive(Ac= tor.scala:465)
=C2=A0 =C2=A0 =C2=A0 =C2=A0 at akka.actor.UntypedActor.aroundReceive(UntypedActor.scala:97)
=C2=A0 =C2=A0 =C2=A0 =C2=A0 at = akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
=C2=A0 = =C2=A0 =C2=A0 =C2=A0 at akka.actor.ActorCell.invoke(ActorCell.scala:48= 7)
=C2=A0 =C2=A0 =C2=A0 =C2=A0 at akka.dispatch.Mailbox.processMailbo= x(Mailbox.scala:254)
=C2=A0 =C2=A0 =C2=A0 =C2=A0 at akka.dispatch.Mai= lbox.run(Mailbox.scala:221)
=C2=A0 =C2=A0 =C2=A0 =C2=A0 at akka.dispa= tch.Mailbox.exec(Mailbox.scala:231)
=C2=A0 =C2=A0 =C2=A0 =C2=A0 at sc= ala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260= )
=C2=A0 =C2=A0 =C2=A0 =C2=A0 at scala.concurrent.forkjoin.ForkJoinPo= ol$WorkQueue.pollAndExecAll(ForkJoinPool.java:1253)
=C2=A0 =C2= =A0 =C2=A0 =C2=A0 at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.= runTask(ForkJoinPool.java:1346)
=C2=A0 =C2=A0 =C2=A0 =C2=A0 at s= cala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java= :1979)
=C2=A0 =C2=A0 =C2=A0 =C2=A0 at scala.concurrent.forkjoi= n.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

--bcaec50e6159a308c00540d16c1f--