Return-Path: X-Original-To: apmail-flink-dev-archive@www.apache.org Delivered-To: apmail-flink-dev-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 255ED1733A for ; Thu, 5 Feb 2015 09:36:22 +0000 (UTC) Received: (qmail 54467 invoked by uid 500); 5 Feb 2015 09:36:22 -0000 Delivered-To: apmail-flink-dev-archive@flink.apache.org Received: (qmail 54404 invoked by uid 500); 5 Feb 2015 09:36:22 -0000 Mailing-List: contact dev-help@flink.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@flink.apache.org Delivered-To: mailing list dev@flink.apache.org Received: (qmail 54391 invoked by uid 99); 5 Feb 2015 09:36:21 -0000 Received: from mail-relay.apache.org (HELO mail-relay.apache.org) (140.211.11.15) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 05 Feb 2015 09:36:21 +0000 Received: from uce.fritz.box (ip5b40324e.dynamic.kabel-deutschland.de [91.64.50.78]) by mail-relay.apache.org (ASF Mail Server at mail-relay.apache.org) with ESMTPSA id 150BF1A0292 for ; Thu, 5 Feb 2015 09:36:20 +0000 (UTC) Content-Type: text/plain; charset=windows-1252 Mime-Version: 1.0 (Mac OS X Mail 7.3 \(1878.6\)) Subject: Re: Cluster execution - Jobmanager unreachable From: Ufuk Celebi In-Reply-To: <54D22379.7030108@fu-berlin.de> Date: Thu, 5 Feb 2015 10:36:17 +0100 Content-Transfer-Encoding: quoted-printable Message-Id: References: <54D22379.7030108@fu-berlin.de> To: dev@flink.apache.org X-Mailer: Apple Mail (2.1878.6) Hey Chesnay, I will look into it. Can you share the complete LOGs? =96 Ufuk On 04 Feb 2015, at 14:49, Chesnay Schepler = wrote: > Hello, >=20 > I'm trying to run python jobs with the latest master on a cluster and = get the following exception: >=20 > Error: The program execution failed: JobManager not reachable anymore. = Terminate waiting for job answer. > org.apache.flink.client.program.ProgramInvocationException: The = program execution failed: JobManager not reachable anymore. Terminate = waiting for job answer. > at org.apache.flink.client.program.Client.run(Client.java:345) > at org.apache.flink.client.program.Client.run(Client.java:304) > at org.apache.flink.client.program.Client.run(Client.java:298) > at = org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironm= ent.java:55) > at = org.apache.flink.api.java.ExecutionEnvironment.execute(ExecutionEnvironmen= t.java:677) > at = org.apache.flink.languagebinding.api.java.python.PythonPlanBinder.runPlan(= PythonPlanBinder.java:106) > at = org.apache.flink.languagebinding.api.java.python.PythonPlanBinder.main(Pyt= honPlanBinder.java:79) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at = sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:= 57) > at = sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorIm= pl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at = org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedPro= gram.java:437) > at = org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForEx= ecution(PackagedProgram.java:353) > at org.apache.flink.client.program.Client.run(Client.java:250) > at = org.apache.flink.client.CliFrontend.executeProgram(CliFrontend.java:387) > at org.apache.flink.client.CliFrontend.run(CliFrontend.java:356) > at = org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:1066)= > at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1090) >=20 > In the jobmanager log file i find this exception: >=20 > java.lang.IllegalStateException: Buffer has already been recycled. > at = org.apache.flink.shaded.com.google.common.base.Preconditions.checkState(Pr= econditions.java:176) > at = org.apache.flink.runtime.io.network.buffer.Buffer.ensureNotRecycled(Buffer= .java:131) > at = org.apache.flink.runtime.io.network.buffer.Buffer.setSize(Buffer.java:95) > at = org.apache.flink.runtime.io.network.api.serialization.SpanningRecordSerial= izer.getCurrentBuffer(SpanningRecordSerializer.java:151) > at = org.apache.flink.runtime.io.network.api.writer.RecordWriter.clearBuffers(R= ecordWriter.java:158) > at = org.apache.flink.runtime.operators.RegularPactTask.clearWriters(RegularPac= tTask.java:1533) > at = org.apache.flink.runtime.operators.RegularPactTask.invoke(RegularPactTask.= java:367) > at = org.apache.flink.runtime.execution.RuntimeEnvironment.run(RuntimeEnvironme= nt.java:204) > at java.lang.Thread.run(Thread.java:745) >=20 > the same exception is in the task manager logs, along with the = following one: >=20 > java.util.concurrent.TimeoutException: Futures timed out after [100 = seconds] > at = scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219) > at = scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223) > at = scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107) > at = scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.sc= ala:53) > at scala.concurrent.Await$.result(package.scala:107) > at = org.apache.flink.runtime.akka.AkkaUtils$.ask(AkkaUtils.scala:265) > at org.apache.flink.runtime.akka.AkkaUtils.ask(AkkaUtils.scala) > at = org.apache.flink.runtime.io.network.partition.IntermediateResultPartition.= scheduleOrUpdateConsumers(IntermediateResultPartition.java:247) > at = org.apache.flink.runtime.io.network.partition.IntermediateResultPartition.= maybeNotifyConsumers(IntermediateResultPartition.java:240) > at = org.apache.flink.runtime.io.network.partition.IntermediateResultPartition.= add(IntermediateResultPartition.java:144) > at = org.apache.flink.runtime.io.network.api.writer.BufferWriter.writeBuffer(Bu= fferWriter.java:74) > at = org.apache.flink.runtime.io.network.api.writer.RecordWriter.emit(RecordWri= ter.java:91) > at = org.apache.flink.runtime.operators.shipping.OutputCollector.collect(Output= Collector.java:88) > at = org.apache.flink.languagebinding.api.java.common.streaming.Receiver.collec= tBuffer(Receiver.java:253) > at = org.apache.flink.languagebinding.api.java.common.streaming.Streamer.stream= BufferWithoutGroups(Streamer.java:193) > at = org.apache.flink.languagebinding.api.java.python.functions.PythonMapPartit= ion.mapPartition(PythonMapPartition.java:54) > at = org.apache.flink.runtime.operators.MapPartitionDriver.run(MapPartitionDriv= er.java:98) > at = org.apache.flink.runtime.operators.RegularPactTask.run(RegularPactTask.jav= a:496) > at = org.apache.flink.runtime.operators.RegularPactTask.invoke(RegularPactTask.= java:360) > at = org.apache.flink.runtime.execution.RuntimeEnvironment.run(RuntimeEnvironme= nt.java:204) > at java.lang.Thread.run(Thread.java:745) >=20