flink-dev mailing list archives

From Till Rohrmann <trohrm...@apache.org>
Subject Re: Could not build up connection to JobManager
Date Mon, 16 Mar 2015 16:34:15 GMT
This is really strange. It is true that the CliFrontend now resolves
localhost to the correct local address 10.218.100.122. Moreover, according
to the logs, the JobManager is also started and binds to
akka.tcp://flink@10.218.100.122:6123, which is also the address the
CliFrontend uses to connect to the JobManager. If the timestamps are
correct, then the JobManager was still alive when the job was submitted. I
don't really understand why this happens. Could it be that the CliFrontend,
which binds to 127.0.0.1, cannot communicate with 10.218.100.122? Do you
have any settings which might prevent this? For the failing 127.0.0.1 case,
it would be helpful to have access to the JobManager log.
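
To narrow down the question of whether an endpoint bound to 127.0.0.1 and one
bound to 10.218.100.122 are on different kinds of addresses, a minimal sketch
like the following might help. It is only an illustration, not Flink code: the
class name AddressKind and its classify helper are made up for this example,
and it relies solely on java.net.InetAddress from the JDK. It prints what
"localhost" resolves to on the machine and classifies the addresses that
appear in this thread.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical helper for illustration only; not part of Flink.
public class AddressKind {

    // Classifies an already resolved address by its scope.
    static String classify(InetAddress addr) {
        if (addr.isLoopbackAddress()) return "loopback";
        if (addr.isLinkLocalAddress()) return "link-local";
        if (addr.isSiteLocalAddress()) return "site-local";
        return "other";
    }

    // Classifies a host name or IP literal. IP literals are parsed
    // without a name-service lookup.
    static String classify(String host) {
        try {
            return classify(InetAddress.getByName(host));
        } catch (UnknownHostException e) {
            return "unresolvable";
        }
    }

    public static void main(String[] args) throws UnknownHostException {
        // What does "localhost" resolve to on this machine?
        for (InetAddress addr : InetAddress.getAllByName("localhost")) {
            System.out.println("localhost -> " + addr.getHostAddress()
                    + " (" + classify(addr) + ")");
        }
        // Addresses taken from the logs in this thread.
        System.out.println(classify("127.0.0.1"));                      // loopback
        System.out.println(classify("10.218.100.122"));                 // site-local
        System.out.println(classify("fe80:0:0:0:742b:7f78:fab5:68e2")); // link-local
    }
}
```

If "localhost" resolves to a loopback address while the JobManager binds to
the site-local one, that mismatch would be one thing to look at first.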

I've updated the branch
https://github.com/tillrohrmann/flink/tree/fixJobClient with a new fix for
the "localhost" scenario. Could you try it out again? Thanks a lot for your
help.

Best regards,

Till

On Mon, Mar 16, 2015 at 10:30 AM, Ufuk Celebi <uce@apache.org> wrote:

> There was an issue for this:
> https://issues.apache.org/jira/browse/FLINK-1634
>
> Can we close it then?
>
> On Sat, Mar 14, 2015 at 9:16 PM, Dulaj Viduranga <vidura.me@icloud.com>
> wrote:
>
> > Hey Stephan,
> > Great to know you could fix the issue. Thank you for the update.
> > Best regards.
> >
> > > On Mar 14, 2015, at 9:19 PM, Stephan Ewen <sewen@apache.org> wrote:
> > >
> > > Hey Dulaj!
> > >
> > > Forget what I said in the previous email. The issue with the wrong
> > > address binding seems to be solved now. There is another issue: the
> > > embedded taskmanager does not start properly, for whatever reason. My
> > > gut feeling is that there is something wrong
> > >
> > > There is a patch pending that changes the startup behavior to make
> > > debugging these situations much easier. I'll ping you as soon as that
> > > is in...
> > >
> > >
> > > Stephan
> > >
> > > On Sat, Mar 14, 2015 at 4:42 PM, Stephan Ewen <sewen@apache.org> wrote:
> > >
> > >> Hey Dulaj!
> > >>
> > >> One thing you can try is to add the option
> > >> "-Djava.net.preferIPv4Stack=true" to the JVM startup options (in the
> > >> scripts in the "bin" folder) and see if that helps.
> > >>
> > >> Stephan
> > >>
> > >>
> > >> On Sat, Mar 14, 2015 at 4:29 AM, Dulaj Viduranga <vidura.me@icloud.com>
> > >> wrote:
> > >>
> > >>> Hi,
> > >>> Still no luck. I'll upload the logs with the configuration
> > >>> "localhost" as well as "127.0.0.1" so you can take a look.
> > >>>
> > >>> 127.0.0.1
> > >>> flink-Vidura-flink-client-localhost.log
> > >>> <https://gist.github.com/viduranga/1d01149eee238158519e#file-flink-vidura-flink-client-localhost-log>
> > >>>
> > >>> localhost
> > >>> flink-Vidura-flink-client-localhost.log
> > >>> <https://gist.github.com/viduranga/d866c24c0ba566abab17#file-flink-vidura-flink-client-localhost-log>
> > >>> flink-Vidura-jobmanager-localhost.log
> > >>> <https://gist.github.com/viduranga/e7549ef818c6a2af73e9#file-flink-vidura-jobmanager-localhost-log>
> > >>>
> > >>>> On Mar 11, 2015, at 11:32 PM, Till Rohrmann <trohrmann@apache.org>
> > >>> wrote:
> > >>>>
> > >>>> Hi Dulaj,
> > >>>>
> > >>>> sorry for my late response. It looks as if the JobClient tries to
> > >>>> connect to the JobManager using its IPv6 address instead of its IPv4
> > >>>> address. Akka is really picky when it comes to remote addresses. If
> > >>>> Akka binds to the FQDN, then other ActorSystems which try to connect
> > >>>> to it using its IP address won't be successful. I assume that this
> > >>>> might be the problem. I tried to fix it; you can find the fix here
> > >>>> [1]. Could you please try it out by starting a local cluster with the
> > >>>> start-local.sh script? If it fails, could you please send me all log
> > >>>> files (client, jobmanager and taskmanager)? Once we have figured out
> > >>>> why the JobClient does not connect, we can try to tackle the
> > >>>> BlobServer issue.
> > >>>>
> > >>>> Cheers,
> > >>>>
> > >>>> Till
> > >>>>
> > >>>> [1] https://github.com/tillrohrmann/flink/tree/fixJobClient
> > >>>>
> > >>>> On Thu, Mar 5, 2015 at 4:40 PM, Dulaj Viduranga <vidura.me@icloud.com>
> > >>>> wrote:
> > >>>>
> > >>>>> Hi,
> > >>>>> The error message is,
> > >>>>>
> > >>>>> 21:06:01,521 WARN  org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> > >>>>> org.apache.flink.client.program.ProgramInvocationException: Could not build up connection to JobManager.
> > >>>>>       at org.apache.flink.client.program.Client.run(Client.java:327)
> > >>>>>       at org.apache.flink.client.program.Client.run(Client.java:306)
> > >>>>>       at org.apache.flink.client.program.Client.run(Client.java:300)
> > >>>>>       at org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:55)
> > >>>>>       at org.apache.flink.examples.java.wordcount.WordCount.main(WordCount.java:82)
> > >>>>>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> > >>>>>       at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> > >>>>>       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> > >>>>>       at java.lang.reflect.Method.invoke(Method.java:483)
> > >>>>>       at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:437)
> > >>>>>       at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:353)
> > >>>>>       at org.apache.flink.client.program.Client.run(Client.java:250)
> > >>>>>       at org.apache.flink.client.CliFrontend.executeProgram(CliFrontend.java:371)
> > >>>>>       at org.apache.flink.client.CliFrontend.run(CliFrontend.java:344)
> > >>>>>       at org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:1087)
> > >>>>>       at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1114)
> > >>>>> Caused by: java.io.IOException: JobManager at akka.tcp://flink@fe80:0:0:0:742b:7f78:fab5:68e2%11:6123/user/jobmanager not reachable. Please make sure that the JobManager is running and its port is reachable.
> > >>>>>       at org.apache.flink.runtime.jobmanager.JobManager$.getJobManagerRemoteReference(JobManager.scala:957)
> > >>>>>       at org.apache.flink.runtime.client.JobClient$.createJobClient(JobClient.scala:151)
> > >>>>>       at org.apache.flink.runtime.client.JobClient$.createJobClientFromConfig(JobClient.scala:142)
> > >>>>>       at org.apache.flink.runtime.client.JobClient$.startActorSystemAndActor(JobClient.scala:125)
> > >>>>>       at org.apache.flink.runtime.client.JobClient.startActorSystemAndActor(JobClient.scala)
> > >>>>>       at org.apache.flink.client.program.Client.run(Client.java:322)
> > >>>>>       ... 15 more
> > >>>>> Caused by: akka.actor.ActorNotFound: Actor not found for: ActorSelection[Anchor(akka://flink/deadLetters), Path(/)]
> > >>>>>       at akka.actor.ActorSelection$$anonfun$resolveOne$1.apply(ActorSelection.scala:65)
> > >>>>>       at akka.actor.ActorSelection$$anonfun$resolveOne$1.apply(ActorSelection.scala:63)
> > >>>>>       at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
> > >>>>>       at akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.processBatch$1(BatchingExecutor.scala:67)
> > >>>>>       at akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:82)
> > >>>>>       at akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59)
> > >>>>>       at akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59)
> > >>>>>       at scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72)
> > >>>>>       at akka.dispatch.BatchingExecutor$Batch.run(BatchingExecutor.scala:58)
> > >>>>>       at akka.dispatch.ExecutionContexts$sameThreadExecutionContext$.unbatchedExecute(Future.scala:74)
> > >>>>>       at akka.dispatch.BatchingExecutor$class.execute(BatchingExecutor.scala:110)
> > >>>>>       at akka.dispatch.ExecutionContexts$sameThreadExecutionContext$.execute(Future.scala:73)
> > >>>>>       at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:40)
> > >>>>>       at scala.concurrent.impl.Promise$DefaultPromise.scala$concurrent$impl$Promise$DefaultPromise$$dispatchOrAddCallback(Promise.scala:280)
> > >>>>>       at scala.concurrent.impl.Promise$DefaultPromise.onComplete(Promise.scala:270)
> > >>>>>       at akka.actor.ActorSelection.resolveOne(ActorSelection.scala:63)
> > >>>>>       at org.apache.flink.runtime.akka.AkkaUtils$.getReference(AkkaUtils.scala:321)
> > >>>>>       at org.apache.flink.runtime.jobmanager.JobManager$.getJobManagerRemoteReference(JobManager.scala:952)
> > >>>>>       ... 20 more
> > >>>>>
> > >>>>> The exception above occurred while trying to run your command.
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>> Client log doesn’t seem to show any info,
> > >>>>>
> > >>>>>
> > >>>>> 21:06:01,521 WARN  org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> > >>>>> 21:06:01,935 INFO  org.apache.flink.api.java.ExecutionEnvironment - The job has 0 registered types and 0 default Kryo serializers
> > >>>>> 21:06:02,857 INFO  akka.event.slf4j.Slf4jLogger - Slf4jLogger started
> > >>>>> 21:06:02,909 INFO  Remoting - Starting remoting
> > >>>>> 21:06:03,158 INFO  Remoting - Remoting started; listening on addresses :[akka.tcp://flink@127.0.0.1:49463]
> > >>>
> > >>>
> > >>
> >
> >
>
