flink-dev mailing list archives

From Stephan Ewen <se...@apache.org>
Subject Re: Could not build up connection to JobManager
Date Wed, 25 Feb 2015 19:16:59 GMT
Addition: To check whether a port is reachable, I think the easiest way is
to try to connect with a telnet client and see whether the connection is
refused.
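For a scripted version of the telnet check, here is a minimal Java sketch. It is not part of Flink. The class name `PortCheck`, the default host and port, and the 2-second timeout are illustrative assumptions. It opens a plain TCP socket and reports whether the connection was accepted. A refused connection, a timeout, and an unresolvable host name all count as unreachable, which is roughly what an interactive telnet session would show. A successful connect only shows that something is listening on that port. It does not show that the Akka handshake with the JobManager will succeed.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class PortCheck {

    /**
     * Tries to open a TCP connection to host:port within timeoutMillis.
     * Returns true if the connection is accepted, false if it is refused,
     * times out, or the host name cannot be resolved.
     */
    public static boolean isReachable(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // ConnectException (refused), SocketTimeoutException (no answer) and
            // UnknownHostException (name does not resolve) all end up here.
            return false;
        }
    }

    public static void main(String[] args) {
        // Illustrative defaults: "localhost" and the JobManager RPC port 6123 seen in this thread
        String host = args.length > 0 ? args[0] : "localhost";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 6123;
        boolean reachable = isReachable(host, port, 2000);
        System.out.println(host + ":" + port + (reachable ? " is reachable" : " is NOT reachable"));
    }
}
```

For example, `java PortCheck 10.216.177.146 6123` would probe the address from the stack trace quoted further down.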

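The quoted reply below suggests that "localhost" ends up as 127.0.0.1 on one side and as 10.216.177.146 on the other. The following minimal Java sketch makes that visible by printing what the JVM resolves `localhost` and the machine's own host name to. The class name `ResolveCheck` is hypothetical. The sketch only reflects the JVM's resolver (hosts file, DNS). This thread does not confirm that Flink's Akka setup derives its addresses the same way, so treat this as an assumption.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

public class ResolveCheck {

    /** Returns the textual IP addresses the JVM's resolver reports for a host name. */
    public static List<String> resolveAll(String host) throws UnknownHostException {
        List<String> addresses = new ArrayList<>();
        for (InetAddress address : InetAddress.getAllByName(host)) {
            addresses.add(address.getHostAddress());
        }
        return addresses;
    }

    public static void main(String[] args) throws UnknownHostException {
        // The literal name "localhost" normally maps to a loopback address such as 127.0.0.1.
        System.out.println("localhost -> " + resolveAll("localhost"));

        // The machine's own host name may map to a LAN address (for example 10.x.x.x),
        // which is the kind of mismatch discussed in this thread.
        InetAddress local = InetAddress.getLocalHost();
        System.out.println("getLocalHost() -> " + local.getHostName() + " / " + local.getHostAddress());
        System.out.println(local.getHostName() + " -> " + resolveAll(local.getHostName()));
    }
}
```

Running `java ResolveCheck` on the affected machine and comparing the printed addresses with the one in the error message would show whether the two names resolve differently.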
On Wed, Feb 25, 2015 at 8:15 PM, Stephan Ewen <sewen@apache.org> wrote:

> Okay, the problem seems to be that even though both the client and the
> jobmanager use "localhost" as the host name, they resolve it to different
> IP addresses: in one case 127.0.0.1, in the other case 10.216.177.146.
>
> Also, the 127.0.0.1 address apparently cannot communicate with
> 10.216.177.146.
>
> Can you help us debug this by checking the following:
>
>  - Can you try to set "jobmanager.rpc.address" to 127.0.0.1 and see if
> that solves it?
>  - Can you try to set "jobmanager.rpc.address" to the other address
> (10.216.177.146 or so) and see if that solves it?
>  - Can you run "start-cluster.sh" rather than "start-local.sh" and see
> whether the web frontend shows that the TaskManager connects?
>  - As a hard-core test: Can you bring up the jobmanager, check where it
> connects (10.216.192.98:6123 or so), and see whether the port is reachable?
>
> We have recently updated how the Akka URLs are built, to work around a
> limitation in Akka. It seems that did not yet fully solve the issue.
>
> Thanks for helping us debug this. It is not the easiest immigration
> experience, but the outcome is probably extremely valuable for the project
> :-)
>
> Greetings,
> Stephan
>
>
> On Wed, Feb 25, 2015 at 4:03 PM, Dulaj Viduranga <vidura.me@icloud.com>
> wrote:
>
>> Hi,
>> Sorry for the delay in replying to this issue.
>> The jobmanager.rpc.address is already set to “localhost” in conf.yaml.
>> This can’t be the issue, because the JobManager web interface, which also
>> runs on localhost, works fine.
>>
>> bin/flink run <jar> doesn’t seem to work either. Here are my command and
>> the output in the terminal:
>>
>> bin/flink run \
>>   /Users/Vidura/Documents/Development/flink/flink-dist/target/flink-0.9-SNAPSHOT-bin/flink-0.9-SNAPSHOT/examples/flink-java-examples-0.9-SNAPSHOT-WordCount.jar \
>>   /Users/Vidura/Documents/Development/flink/flink-dist/target/flink-0.9-SNAPSHOT-bin/flink-0.9-SNAPSHOT/hamlet.txt \
>>   $FLINK_DIRECTORY/count
>>
>> 20:32:16,442 WARN  org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
>> org.apache.flink.client.program.ProgramInvocationException: Could not build up connection to JobManager.
>>         at org.apache.flink.client.program.Client.run(Client.java:327)
>>         at org.apache.flink.client.program.Client.run(Client.java:306)
>>         at org.apache.flink.client.program.Client.run(Client.java:300)
>>         at org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:55)
>>         at org.apache.flink.examples.java.wordcount.WordCount.main(WordCount.java:82)
>>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>         at java.lang.reflect.Method.invoke(Method.java:483)
>>         at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:437)
>>         at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:353)
>>         at org.apache.flink.client.program.Client.run(Client.java:250)
>>         at org.apache.flink.client.CliFrontend.executeProgram(CliFrontend.java:371)
>>         at org.apache.flink.client.CliFrontend.run(CliFrontend.java:344)
>>         at org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:1087)
>>         at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1114)
>> Caused by: java.io.IOException: JobManager at akka.tcp://flink@10.216.177.146:6123/user/jobmanager not reachable. Please make sure that the JobManager is running and its port is reachable.
>>         at org.apache.flink.runtime.jobmanager.JobManager$.getJobManagerRemoteReference(JobManager.scala:897)
>>         at org.apache.flink.runtime.client.JobClient$.createJobClient(JobClient.scala:151)
>>         at org.apache.flink.runtime.client.JobClient$.createJobClientFromConfig(JobClient.scala:142)
>>         at org.apache.flink.runtime.client.JobClient$.startActorSystemAndActor(JobClient.scala:125)
>>         at org.apache.flink.runtime.client.JobClient.startActorSystemAndActor(JobClient.scala)
>>         at org.apache.flink.client.program.Client.run(Client.java:322)
>>         ... 15 more
>> Caused by: java.util.concurrent.TimeoutException: Futures timed out after [10000 milliseconds]
>>         at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
>>         at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
>>         at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
>>         at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
>>         at scala.concurrent.Await$.result(package.scala:107)
>>         at org.apache.flink.runtime.jobmanager.JobManager$.getJobManagerRemoteReference(JobManager.scala:893)
>>         ... 20 more
>>
>> The exception above occurred while trying to run your command.
>>
>>
>> > On Feb 25, 2015, at 1:29 AM, Stephan Ewen <sewen@apache.org> wrote:
>> >
>> > BTW: Does it still work if you enter "localhost" for
>> > "jobmanager.rpc.address" in your flink-conf.yaml?
>> >
>> > On Tue, Feb 24, 2015 at 7:50 PM, Stephan Ewen <sewen@apache.org> wrote:
>> >
>> >> Hi!
>> >>
>> >> I think that this is a problem in the current master (probably introduced
>> >> a few days ago). I am fixing it...
>> >>
>> >> Thanks for reporting it!
>> >>
>> >> Stephan
>> >>
>> >>
>> >> On Tue, Feb 24, 2015 at 6:52 PM, Stephan Ewen <sewen@apache.org> wrote:
>> >>
>> >>> Hi Dulaj!
>> >>>
>> >>> The log suggests that the JobManager binds itself to the IP
>> >>> address 10.216.192.98 and the WebClient runs at 127.0.0.1.
>> >>>
>> >>> The 127.0.0.1 actor system cannot connect to the one at 10.216.192.98.
>> >>>
>> >>> Let me verify whether this is a quirk of your particular setup or a bug
>> >>> recently introduced in the 0.9-SNAPSHOT.
>> >>>
>> >>> Does the command line work for you? ("bin/flink run <jar>")
>> >>>
>> >>> taskmanager.numberOfTaskSlots: -1 is also okay; it means that the
>> >>> default of '1' is used.
>> >>>
>> >>> Greetings,
>> >>> Stephan
>> >>>
>> >>>
>> >>>
>> >>> On Tue, Feb 24, 2015 at 5:18 PM, Dulaj Viduranga <vidura.me@icloud.com> wrote:
>> >>>
>> >>>> Is taskmanager.numberOfTaskSlots: -1 normal?
>> >>>>
>> >>>>> On Feb 24, 2015, at 9:44 PM, Robert Metzger <rmetzger@apache.org> wrote:
>> >>>>>
>> >>>>> Hi,
>> >>>>> I could not find the log files attached to your mails. I think the
>> >>>>> mailing lists do not accept attachments.
>> >>>>> Can you put the logs on gist.github.com?
>> >>>>>
>> >>>>> The configuration values are documented here:
>> >>>>> http://flink.apache.org/docs/0.8/config.html
>> >>>>> For the web client's port, the key is called webclient.port.
>> >>>>>
>> >>>>> On Tue, Feb 24, 2015 at 5:04 PM, Dulaj Viduranga <vidura.me@icloud.com> wrote:
>> >>>>>
>> >>>>>> I tried to kill the JobManager manually in the terminal and start it
>> >>>>>> again, but no luck. Also, could you tell me whether it’s possible to
>> >>>>>> change the web client’s port (8080)?
>> >>>>>>
>> >>>>>>> On Feb 24, 2015, at 1:41 PM, Stephan Ewen <sewen@apache.org> wrote:
>> >>>>>>>
>> >>>>>>> Hey Dulaj!
>> >>>>>>>
>> >>>>>>> As a contributor, I would work against the latest version, which is
>> >>>>>>> 0.9-SNAPSHOT.
>> >>>>>>>
>> >>>>>>> It may be that in your case the JobManager actor is down, but the
>> >>>>>>> process still lingers. (BTW: I have a patch pending that makes sure the
>> >>>>>>> process disappears when the actor goes down.)
>> >>>>>>>
>> >>>>>>> Could you have a look at the log "flink-<user>-jobmanager-<host>-.log"
>> >>>>>>> and see if there are any errors logged?
>> >>>>>>>
>> >>>>>>> Greetings,
>> >>>>>>> Stephan
>> >>>>>>> On 24.02.2015 06:29, "Dulaj Viduranga" <vidura.me@icloud.com> wrote:
>> >>>>>>>
>> >>>>>>>> The JobManager seems to run fine, I don't know. When I run
>> >>>>>>>> start-local.sh again, it shows the PID of the running JobManager, and
>> >>>>>>>> :8081 also works fine. I want to contribute to the project, and it
>> >>>>>>>> would give me a little boost if I could see what Flink is capable of. :)
>> >>>>>>>> Will it be OK to use 0.8.1 as a developer?
>> >>>>>>>>
>> >>>>>>>> On Feb 24, 2015, at 04:15 AM, Stephan Ewen <sewen@apache.org> wrote:
>> >>>>>>>>
>> >>>>>>>> Hi Dulaj,
>> >>>>>>>>
>> >>>>>>>> That error message indicates that the JobManager is not running.
>> >>>>>>>> Are you sure that the JobManager runs properly? Is there anything in
>> >>>>>>>> the JobManager logs?
>> >>>>>>>>
>> >>>>>>>> BTW: The 0.9 branch is under heavy development and changing a lot.
>> >>>>>>>> That is why it may behave a bit differently on different days right
>> >>>>>>>> now. I would recommend using the 0.8.1 release for a stable experience.
>> >>>>>>>>
>> >>>>>>>> Greetings,
>> >>>>>>>> Stephan
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>> On Mon, Feb 23, 2015 at 7:39 PM, Robert Metzger <rmetzger@apache.org> wrote:
>> >>>>>>>>
>> >>>>>>>> Thank you for the quick reply.
>> >>>>>>>>
>> >>>>>>>> The log you've sent is from the webclient. Can you also send the log
>> >>>>>>>> of the JobManager?
>> >>>>>>>>
>> >>>>>>>> On Mon, Feb 23, 2015 at 7:28 PM, Dulaj Viduranga <vidura.me@icloud.com> wrote:
>> >>>>>>>>
>> >>>>>>>>> Yes. It seems it is not a problem with the arguments. I tried for two
>> >>>>>>>>> days, but different errors occur. It seems the web client can’t
>> >>>>>>>>> connect to the JobManager although it is running.
>> >>>>>>>>>
>> >>>>>>>>> Right now, I can’t even get the webclient to run.
>> >>>>>>>>> ./bin/start-webclient.sh executes fine, but I cannot connect to
>> >>>>>>>>> localhost:8080 (even with telnet or curl).
>> >>>>>>>>>
>> >>>>>>>>> Here is the log for the JobManager:
>> >>>>>>>>
>> >>>>>>>>> 23:22:31,933 INFO org.apache.flink.client.web.WebInterfaceServer - Setting up web frontend server, using web-root directory 'jar:file:/Users/Vidura/Documents/Development/flink/flink-dist/target/flink-0.9-SNAPSHOT-bin/flink-0.9-SNAPSHOT/lib/flink-clients-0.9-SNAPSHOT.jar!/web-docs'.
>> >>>>>>>>> 23:22:31,934 INFO org.apache.flink.client.web.WebInterfaceServer - Web frontend server will store temporary files in '/var/folders/3_/7gzbv7ks7q71lpm5d9hzrw2c0000gn/T', uploaded jobs in '/var/folders/3_/7gzbv7ks7q71lpm5d9hzrw2c0000gn/T/webclient-jobs', plan-json-dumps in '/var/folders/3_/7gzbv7ks7q71lpm5d9hzrw2c0000gn/T/webclient-plans'.
>> >>>>>>>>> 23:22:31,934 INFO org.apache.flink.client.web.WebInterfaceServer - Web-frontend will submit jobs to nephele job-manager on localhost, port 6123.
>> >>>>>>>>> 23:22:32,580 INFO akka.event.slf4j.Slf4jLogger - Slf4jLogger started
>> >>>>>>>>> 23:22:32,625 INFO Remoting - Starting remoting
>> >>>>>>>>> 23:22:32,838 INFO Remoting - Remoting started; listening on addresses :[akka.tcp://JobsInfoServletActorSystem@127.0.0.1:51517]
>> >>>>>>>>> 23:23:48,119 WARN Remoting - Tried to associate with unreachable remote address [akka.tcp://flink@10.218.98.169:6123]. Address is now gated for 5000 ms, all messages to this address will be delivered to dead letters. Reason: Operation timed out: /10.218.98.169:6123
>> >>>>>>>>> 23:23:48,124 ERROR org.apache.flink.client.WebFrontend - Unexpected exception: Could not find job manager at specified address akka.tcp://flink@10.218.98.169:6123/user/jobmanager.
>> >>>>>>>>> java.lang.RuntimeException: Could not find job manager at specified address akka.tcp://flink@10.218.98.169:6123/user/jobmanager.
>> >>>>>>>>>         at org.apache.flink.client.web.JobsInfoServlet.<init>(JobsInfoServlet.java:82)
>> >>>>>>>>>         at org.apache.flink.client.web.WebInterfaceServer.<init>(WebInterfaceServer.java:158)
>> >>>>>>>>>         at org.apache.flink.client.WebFrontend.main(WebFrontend.java:74)
>> >>>>>>>>
>> >>>>>>>>>> On Feb 23, 2015, at 11:46 PM, Robert Metzger <rmetzger@apache.org> wrote:
>> >>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>>> Hi,
>> >>>>>>>>>> you said in the other email thread that the error only occurs for
>> >>>>>>>>>> WordCount, not for KMeans.
>> >>>>>>>>>> Can you copy me the commands for both examples?
>> >>>>>>>>>> I cannot really believe that there is a difference between the two
>> >>>>>>>>>> jobs.
>> >>>>>>>>>>
>> >>>>>>>>>> Can you also send us the contents of the jobmanager log file?
>> >>>>>>>>>>
>> >>>>>>>>>> Best,
>> >>>>>>>>>> Robert
>> >>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>>> On Mon, Feb 23, 2015 at 6:04 PM, Dulaj Viduranga <vidura.me@icloud.com> wrote:
>> >>>>>>>>>>
>> >>>>>>>>>>> I’m getting "Could not build up connection to JobManager." when I
>> >>>>>>>>>>> try to run the WordCount example. Can anyone help?
>> >>>>>>>>>>>
>> >>>>>>>>>>> Dulaj
