flink-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Hanan Meyer <ha...@scalabill.it>
Subject Re: Apache Flink:ProgramInvocationException on Yarn
Date Sun, 30 Aug 2015 08:12:15 GMT
Hello.
Let me clarify the situation.
1. We are using flink 0.9.0 for Hadoop 2.7. We connected it to HDFS 2.7.1.
2. Locally, our program is working: once we run flink as ./start-local.sh,
we are able to connect and run the createRemoteEnvironment and Execute
methods.
3.Due to our architecture and basic Flink feature we want to invoke this
functionality REMOTELY , when our Java code is calling the Flink methods
from another server.
4.We tried both ExecutionEnvironment.createRemoteEnvironment("1.2.3.1",
6123, "TestProj.jar"); and ExecutionEnvironment.createRemoteEnvironment("
flink@1.2.3.1", 6123, "TestProj.jar"); (which is definitely not right since
it should be an IP address) - it crash on the "cant reach JobManager" error.

It seems to us that it can be  one of 2 issues.
1.Somehow we need to configure flink to accept the connections from the
remote machine
2.Flink has a critical showstopper bug that jeopardizing a whole decision
to use this technology.

Please advise us how we should advance.




On Fri, Aug 28, 2015 at 10:27 AM, Robert Metzger <rmetzger@apache.org>
wrote:

> Hi,
>
> in the exception you've posted earlier, you can see the following root
> cause:
>
> Caused by: akka.actor.ActorNotFound: Actor not found for:
> ActorSelection[Anchor(akka.tcp://flink@FLINK_SERVER_URL:6123/),
> Path(/user/jobmanager)]
>
> This string "akka.tcp://flink@FLINK_SERVER_URL:6123/" usually looks like
> this: "akka.tcp://flink@1.2.3.4:6123/". So it seems that you are
> passing FLINK_SERVER_URL
> as the server hostname (or ip).
> Can you pass the correct hostname when you call ExecutionEnvironment.
> createRemoteEnvironment().
>
> On Fri, Aug 28, 2015 at 7:52 AM, Hanan Meyer <hanan@scalabill.it> wrote:
>
> > Hi
> >  I'm currently using flink 0.9.0 which by maven support Hadoop 1 .
> > By using flink-clients-0.7.0-hadoop2-incubating.jar with executePlan(Plan
> > p) method  instead, I'm getting the same exception
> >
> > Hanan
> >
> > On Fri, Aug 28, 2015 at 8:35 AM, Hanan Meyer <hanan@scalabill.it> wrote:
> >
> > >
> > > Hi
> > >
> > > 1. I have restarted Flink service via stop/start-loval.sh - it have
> been
> > > restarted successfully ,no errors in log folder
> > > 2. default flink port is -6123
> > >
> > > Getting this via Eclips IDE:
> > >
> > > Thanks
> > >
> > >
> > > org.apache.flink.client.program.ProgramInvocationException: Failed to
> > > resolve JobManager
> > > at org.apache.flink.client.program.Client.run(Client.java:379)
> > > at org.apache.flink.client.program.Client.run(Client.java:356)
> > > at org.apache.flink.client.program.Client.run(Client.java:349)
> > > at
> > >
> >
> org.apache.flink.client.RemoteExecutor.executePlanWithJars(RemoteExecutor.java:89)
> > > at
> > >
> >
> org.apache.flink.client.RemoteExecutor.executePlan(RemoteExecutor.java:82)
> > > at
> > >
> >
> org.apache.flink.api.java.RemoteEnvironment.execute(RemoteEnvironment.java:71)
> > > at
> > >
> >
> org.apache.flink.api.java.ExecutionEnvironment.execute(ExecutionEnvironment.java:789)
> > > at Test.main(Test.java:39)
> > > Caused by: java.io.IOException: JobManager at
> > > akka.tcp://flink@FLINK_SERVER_URL:6123/user/jobmanager not reachable.
> > > Please make sure that the JobManager is running and its port is
> > reachable.
> > > at
> > >
> >
> org.apache.flink.runtime.jobmanager.JobManager$.getJobManagerRemoteReference(JobManager.scala:1197)
> > > at
> > >
> >
> org.apache.flink.runtime.jobmanager.JobManager$.getJobManagerRemoteReference(JobManager.scala:1221)
> > > at
> > >
> >
> org.apache.flink.runtime.jobmanager.JobManager$.getJobManagerRemoteReference(JobManager.scala:1239)
> > > at
> > >
> >
> org.apache.flink.runtime.jobmanager.JobManager.getJobManagerRemoteReference(JobManager.scala)
> > > at org.apache.flink.client.program.Client.run(Client.java:376)
> > > ... 7 more
> > > Caused by: akka.actor.ActorNotFound: Actor not found for:
> > > ActorSelection[Anchor(akka.tcp://flink@FLINK_SERVER_URL:6123/),
> > > Path(/user/jobmanager)]
> > > at
> > >
> >
> akka.actor.ActorSelection$$anonfun$resolveOne$1.apply(ActorSelection.scala:65)
> > > at
> > >
> >
> akka.actor.ActorSelection$$anonfun$resolveOne$1.apply(ActorSelection.scala:63)
> > > at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
> > > at akka.dispatch.BatchingExecutor$
> > > Batch$$anonfun$run$1.processBatch$1(BatchingExecutor.scala:67)
> > > at
> > >
> >
> akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:82)
> > > at
> > >
> >
> akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59)
> > > at
> > >
> >
> akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59)
> > > at
> scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72)
> > > at akka.dispatch.BatchingExecutor$Batch.run(BatchingExecutor.scala:58)
> > > at
> > >
> >
> akka.dispatch.ExecutionContexts$sameThreadExecutionContext$.unbatchedExecute(Future.scala:74)
> > > at
> > akka.dispatch.BatchingExecutor$class.execute(BatchingExecutor.scala:110)
> > > at
> > >
> >
> akka.dispatch.ExecutionContexts$sameThreadExecutionContext$.execute(Future.scala:73)
> > > at
> > >
> scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:40)
> > > at
> > >
> >
> scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:248)
> > > at akka.pattern.PromiseActorRef.$bang(AskSupport.scala:267)
> > > at akka.actor.EmptyLocalActorRef.specialHandle(ActorRef.scala:508)
> > > at akka.actor.DeadLetterActorRef.specialHandle(ActorRef.scala:541)
> > > at akka.actor.DeadLetterActorRef.$bang(ActorRef.scala:531)
> > > at
> > >
> >
> akka.remote.RemoteActorRefProvider$RemoteDeadLetterActorRef.$bang(RemoteActorRefProvider.scala:87)
> > > at akka.remote.EndpointWriter.postStop(Endpoint.scala:561)
> > > at akka.actor.Actor$class.aroundPostStop(Actor.scala:475)
> > > at akka.remote.EndpointActor.aroundPostStop(Endpoint.scala:415)
> > > at
> > >
> >
> akka.actor.dungeon.FaultHandling$class.akka$actor$dungeon$FaultHandling$$finishTerminate(FaultHandling.scala:210)
> > > at
> > >
> akka.actor.dungeon.FaultHandling$class.terminate(FaultHandling.scala:172)
> > > at akka.actor.ActorCell.terminate(ActorCell.scala:369)
> > > at akka.actor.ActorCell.invokeAll$1(ActorCell.scala:462)
> > > at akka.actor.ActorCell.systemInvoke(ActorCell.scala:478)
> > > at akka.dispatch.Mailbox.processAllSystemMessages(Mailbox.scala:279)
> > > at akka.dispatch.Mailbox.run(Mailbox.scala:220)
> > > at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
> > > at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
> > > at
> > >
> >
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.pollAndExecAll(ForkJoinPool.java:1253)
> > > at
> > >
> >
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1346)
> > > at
> > scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
> > > at
> > >
> >
> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> > >
> > >
> > > On Thu, Aug 27, 2015 at 10:47 PM, Robert Metzger <rmetzger@apache.org>
> > > wrote:
> > >
> > >> I guess you are getting an entire exception after the
> "org.apache.flink
> > >> .client.program.ProgramInvocationException: Failed to
> > >> resolve JobManager".
> > >> Can you post it here to help us understanding the issue?
> > >>
> > >> On Thu, Aug 27, 2015 at 6:55 PM, Alexey Sapozhnikov <
> > alexey@scalabill.it>
> > >> wrote:
> > >>
> > >> > Hello all.
> > >> >
> > >> > Some clarification: locally everything works great.
> > >> > However once we run our Flink on remote linux machine and try to run
> > the
> > >> > client program from our machine, using create remote environment-
> > Flink
> > >> > JobManager is raising this exception
> > >> >
> > >> > On Thu, Aug 27, 2015 at 7:41 PM, Stephan Ewen <sewen@apache.org>
> > wrote:
> > >> >
> > >> > > If you start the job via the "bin/flink" script, then simply
use
> > >> > > "ExecutionEnvironment.getExecutionEnvironment()" rather then
> > creating
> > >> a
> > >> > > remote environment manually.
> > >> > >
> > >> > > That way, hosts and ports are configured automatically.
> > >> > >
> > >> > > On Thu, Aug 27, 2015 at 6:39 PM, Robert Metzger <
> > rmetzger@apache.org>
> > >> > > wrote:
> > >> > >
> > >> > >> Hi,
> > >> > >>
> > >> > >> Which values did you use for FLINK_SERVER_URL and FLINK_PORT?
> > >> > >> Every time you deploy Flink on YARN, the host and port change,
> > >> because
> > >> > the
> > >> > >> JobManager is started on a different YARN container.
> > >> > >>
> > >> > >>
> > >> > >> On Thu, Aug 27, 2015 at 6:32 PM, Hanan Meyer <hanan@scalabill.it
> >
> > >> > wrote:
> > >> > >>
> > >> > >> > Hello All
> > >> > >> >
> > >> > >> > When using Eclipse IDE to submit Flink to Yarn single
node
> > cluster
> > >> I'm
> > >> > >> > getting :
> > >> > >> > "org.apache.flink.client.program.ProgramInvocationException:
> > >> Failed to
> > >> > >> > resolve JobManager"
> > >> > >> >
> > >> > >> > Using Flink 0.9.0
> > >> > >> >
> > >> > >> > The Jar copy a file from one location in Hdfs to another
and
> > works
> > >> > fine
> > >> > >> > while executed locally on the single node Yarn cluster
-
> > >> > >> > bin/flink run -c Test ./examples/MyJar.jar
> > >> > >> > hdfs://localhost:9000/flink/in.txt
> > >> hdfs://localhost:9000/flink/out.txt
> > >> > >> >
> > >> > >> > The code skeleton:
> > >> > >> >
> > >> > >> >     ExecutionEnvironment envRemote =
> > >> > >> > ExecutionEnvironment.createRemoteEnvironment
> > >> > >> > (FLINK_SERVER_URL,FLINK PORT,JAR_PATH_ON_CLIENT);
> > >> > >> > DataSet<String> data =
> > >> > >> > envRemote.readTextFile("hdfs://localhost:9000/flink/in.txt");
> > >> > >> > data.writeAsText("hdfs://localhost:9000/flink/out.txt");
> > >> > >> > envRemote.execute();
> > >> > >> >
> > >> > >> >
> > >> > >> > Please advise,
> > >> > >> >
> > >> > >> > Hanan Meyer
> > >> > >> >
> > >> > >>
> > >> > >
> > >> > >
> > >> >
> > >>
> > >
> > >
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message