flink-dev mailing list archives

From Dulaj Viduranga <vidura...@icloud.com>
Subject Re: Could not build up connection to JobManager
Date Thu, 26 Feb 2015 17:45:15 GMT
Hi,
	Can you tell me where I can find the TaskManager logs? I can’t find them in the logs folder. I
don’t suppose I should run taskmanager.sh as well, right?
	I’m using OS X Yosemite. Here is my ifconfig output:

lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> mtu 16384
	options=3<RXCSUM,TXCSUM>
	inet6 ::1 prefixlen 128 
	inet 127.0.0.1 netmask 0xff000000 
	inet6 fe80::1%lo0 prefixlen 64 scopeid 0x1 
	nd6 options=1<PERFORMNUD>
gif0: flags=8010<POINTOPOINT,MULTICAST> mtu 1280
stf0: flags=0<> mtu 1280
en0: flags=8823<UP,BROADCAST,SMART,SIMPLEX,MULTICAST> mtu 1500
	ether 60:03:08:a1:e0:f4 
	nd6 options=1<PERFORMNUD>
	media: autoselect (<unknown type>)
	status: inactive
en1: flags=8963<UP,BROADCAST,SMART,RUNNING,PROMISC,SIMPLEX,MULTICAST> mtu 1500
	options=60<TSO4,TSO6>
	ether 72:00:02:32:14:d0 
	media: autoselect <full-duplex>
	status: inactive
en2: flags=8963<UP,BROADCAST,SMART,RUNNING,PROMISC,SIMPLEX,MULTICAST> mtu 1500
	options=60<TSO4,TSO6>
	ether 72:00:02:32:14:d1 
	media: autoselect <full-duplex>
	status: inactive
bridge0: flags=8822<BROADCAST,SMART,SIMPLEX,MULTICAST> mtu 1500
	options=63<RXCSUM,TXCSUM,TSO4,TSO6>
	ether 62:03:08:1a:fa:00 
	Configuration:
		id 0:0:0:0:0:0 priority 0 hellotime 0 fwddelay 0
		maxage 0 holdcnt 0 proto stp maxaddr 100 timeout 1200
		root id 0:0:0:0:0:0 priority 0 ifcost 0 port 0
		ipfilter disabled flags 0x2
	member: en1 flags=3<LEARNING,DISCOVER>
	        ifmaxaddr 0 port 5 priority 0 path cost 0
	member: en2 flags=3<LEARNING,DISCOVER>
	        ifmaxaddr 0 port 6 priority 0 path cost 0
	media: <unknown type>
	status: inactive
p2p0: flags=8802<BROADCAST,SIMPLEX,MULTICAST> mtu 2304
	ether 02:03:08:a1:e0:f4 
	media: autoselect
	status: inactive
awdl0: flags=8802<BROADCAST,SIMPLEX,MULTICAST> mtu 1452
	ether 06:56:3d:f6:60:08 
	nd6 options=1<PERFORMNUD>
	media: autoselect
	status: inactive
ppp0: flags=8051<UP,POINTOPOINT,RUNNING,MULTICAST> mtu 1500
	inet 10.218.98.228 --> 10.64.64.64 netmask 0xff000000 
utun0: flags=8051<UP,POINTOPOINT,RUNNING,MULTICAST> mtu 1380
	inet6 fe80::b0d4:d4be:7e62:e730%utun0 prefixlen 64 scopeid 0xb 
	inet6 fdd0:b291:7da7:9153:b0d4:d4be:7e62:e730 prefixlen 64 
	nd6 options=1<PERFORMNUD>
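Stephan's diagnosis further down the thread is that the client and the JobManager resolve "localhost" to different addresses. To compare that against the ifconfig output above, here is a minimal Java sketch that prints what the JVM's resolver returns for a host name and for `InetAddress.getLocalHost()`. The class name `LocalhostCheck` is hypothetical and not part of Flink, and the sketch only shows what a plain JVM sees; it does not reproduce how Flink itself picks the address it binds to.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostic helper; not part of Flink.
public class LocalhostCheck {

    /** Returns every address the JVM resolver yields for the host, or an empty list. */
    static List<String> resolve(String host) {
        List<String> addresses = new ArrayList<>();
        try {
            for (InetAddress address : InetAddress.getAllByName(host)) {
                addresses.add(address.getHostAddress());
            }
        } catch (UnknownHostException e) {
            // The name did not resolve at all; leave the list empty.
        }
        return addresses;
    }

    public static void main(String[] args) throws UnknownHostException {
        String host = args.length > 0 ? args[0] : "localhost";
        System.out.println(host + " -> " + resolve(host));

        // The address the JVM associates with this machine's host name. With
        // several active interfaces (e.g. ppp0 above) this can be a
        // non-loopback address rather than 127.0.0.1.
        InetAddress local = InetAddress.getLocalHost();
        System.out.println("getLocalHost() -> " + local.getHostName()
                + " / " + local.getHostAddress());
    }
}
```

Running `java LocalhostCheck localhost` on the affected machine would show whether the two lookups disagree, which is the kind of mismatch the thread suspects.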


> On Feb 26, 2015, at 10:48 PM, Stephan Ewen <sewen@apache.org> wrote:
> 
> Hi Dulaj!
> 
> Thanks for helping to debug.
> 
> My guess is that you are now seeing the same thing between JobManager and
> TaskManager as you saw before between JobManager and JobClient. I have a
> patch pending that should help with the issue (see
> https://issues.apache.org/jira/browse/FLINK-1608); let's see if that solves
> it.
> 
> What seems odd is that the JobManager initially accepted the
> TaskManager and only later did the communication fail. Can you paste the
> TaskManager log as well?
> 
> Also: There must be something fairly unique about your network
> configuration, as it works on all other setups that we use (locally, cloud,
> test servers, YARN, ...). Can you paste your ipconfig / ifconfig by any
> chance?
> 
> Greetings,
> Stephan
> 
> 
> 
> On Thu, Feb 26, 2015 at 4:33 PM, Dulaj Viduranga <vidura.me@icloud.com>
> wrote:
> 
>> Hi,
>>        It’s great to help out. :)
>> 
>>        Setting jobmanager.rpc.address to 127.0.0.1 instead of “localhost”
>> helped to establish the connection to the JobManager. Apparently
>> “localhost” resolves differently in the web client and the JobManager. I
>> think it would be good to set "jobmanager.rpc.address: 127.0.0.1" by
>> default in future builds.
>>        But then I got this error when I tried to run the examples. I don’t
>> know whether I should move this issue to another thread. If so, please tell me.
>> 
>> bin/flink run
>> /Users/Vidura/Documents/Development/flink/flink-dist/target/flink-0.9-SNAPSHOT-bin/flink-0.9-SNAPSHOT/examples/flink-java-examples-0.9-SNAPSHOT-WordCount.jar
>> /Users/Vidura/Documents/Development/flink/flink-dist/target/flink-0.9-SNAPSHOT-bin/flink-0.9-SNAPSHOT/hamlet.txt
>> $FLINK_DIRECTORY/count
>> 
>> 
>> 20:46:21,998 WARN  org.apache.hadoop.util.NativeCodeLoader
>>       - Unable to load native-hadoop library for your platform... using
>> builtin-java classes where applicable
>> 02/26/2015 20:46:23     Job execution switched to status RUNNING.
>> 02/26/2015 20:46:23     CHAIN DataSource (at
>> getTextDataSet(WordCount.java:141)
>> (org.apache.flink.api.java.io.TextInputFormat)) -> FlatMap (FlatMap at
>> main(WordCount.java:69)) -> Combine(SUM(1), at main(WordCount.java:72)(1/1)
>> switched to SCHEDULED
>> 02/26/2015 20:46:23     CHAIN DataSource (at
>> getTextDataSet(WordCount.java:141)
>> (org.apache.flink.api.java.io.TextInputFormat)) -> FlatMap (FlatMap at
>> main(WordCount.java:69)) -> Combine(SUM(1), at main(WordCount.java:72)(1/1)
>> switched to DEPLOYING
>> 02/26/2015 20:48:03     CHAIN DataSource (at
>> getTextDataSet(WordCount.java:141)
>> (org.apache.flink.api.java.io.TextInputFormat)) -> FlatMap (FlatMap at
>> main(WordCount.java:69)) -> Combine(SUM(1), at main(WordCount.java:72)(1/1)
>> switched to FAILED
>> akka.pattern.AskTimeoutException: Ask timed out on
>> [Actor[akka://flink/user/taskmanager#-1628133761]] after [100000 ms]
>>        at
>> akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:333)
>>        at akka.actor.Scheduler$$anon$7.run(Scheduler.scala:117)
>>        at
>> scala.concurrent.Future$InternalCallbackExecutor$.scala$concurrent$Future$InternalCallbackExecutor$$unbatchedExecute(Future.scala:694)
>>        at
>> scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:691)
>>        at
>> akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(Scheduler.scala:467)
>>        at
>> akka.actor.LightArrayRevolverScheduler$$anon$8.executeBucket$1(Scheduler.scala:419)
>>        at
>> akka.actor.LightArrayRevolverScheduler$$anon$8.nextTick(Scheduler.scala:423)
>>        at
>> akka.actor.LightArrayRevolverScheduler$$anon$8.run(Scheduler.scala:375)
>>        at java.lang.Thread.run(Thread.java:745)
>> 
>> 02/26/2015 20:48:03     Job execution switched to status FAILING.
>> 02/26/2015 20:48:03     Reduce (SUM(1), at main(WordCount.java:72)(1/1)
>> switched to CANCELED
>> 02/26/2015 20:48:03     DataSink(CsvOutputFormat (path:
>> /Users/Vidura/Documents/Development/flink/flink-dist/target/flink-0.9-SNAPSHOT-bin/flink-0.9-SNAPSHOT/count,
>> delimiter:  ))(1/1) switched to CANCELED
>> 02/26/2015 20:48:03     Job execution switched to status FAILED.
>> org.apache.flink.client.program.ProgramInvocationException: The program
>> execution failed.
>>        at org.apache.flink.client.program.Client.run(Client.java:344)
>>        at org.apache.flink.client.program.Client.run(Client.java:306)
>>        at org.apache.flink.client.program.Client.run(Client.java:300)
>>        at
>> org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:55)
>>        at
>> org.apache.flink.examples.java.wordcount.WordCount.main(WordCount.java:82)
>>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>        at
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>        at
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>        at java.lang.reflect.Method.invoke(Method.java:483)
>>        at
>> org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:437)
>>        at
>> org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:353)
>>        at org.apache.flink.client.program.Client.run(Client.java:250)
>>        at
>> org.apache.flink.client.CliFrontend.executeProgram(CliFrontend.java:371)
>>        at org.apache.flink.client.CliFrontend.run(CliFrontend.java:344)
>>        at
>> org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:1087)
>>        at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1114)
>> Caused by: org.apache.flink.runtime.client.JobExecutionException: Job
>> execution failed.
>>        at
>> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$receiveWithLogMessages$1.applyOrElse(JobManager.scala:284)
>>        at
>> scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
>>        at
>> scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
>>        at
>> scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
>>        at
>> org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:37)
>>        at
>> org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:30)
>>        at
>> scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118)
>>        at
>> org.apache.flink.runtime.ActorLogMessages$$anon$1.applyOrElse(ActorLogMessages.scala:30)
>>        at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
>>        at
>> org.apache.flink.runtime.jobmanager.JobManager.aroundReceive(JobManager.scala:88)
>>        at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
>>        at akka.actor.ActorCell.invoke(ActorCell.scala:487)
>>        at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
>>        at akka.dispatch.Mailbox.run(Mailbox.scala:221)
>>        at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
>>        at
>> scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>>        at
>> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>>        at
>> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>>        at
>> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>> Caused by: akka.pattern.AskTimeoutException: Ask timed out on
>> [Actor[akka://flink/user/taskmanager#-1628133761]] after [100000 ms]
>>        at
>> akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:333)
>>        at akka.actor.Scheduler$$anon$7.run(Scheduler.scala:117)
>>        at
>> scala.concurrent.Future$InternalCallbackExecutor$.scala$concurrent$Future$InternalCallbackExecutor$$unbatchedExecute(Future.scala:694)
>>        at
>> scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:691)
>>        at
>> akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(Scheduler.scala:467)
>>        at
>> akka.actor.LightArrayRevolverScheduler$$anon$8.executeBucket$1(Scheduler.scala:419)
>>        at
>> akka.actor.LightArrayRevolverScheduler$$anon$8.nextTick(Scheduler.scala:423)
>>        at
>> akka.actor.LightArrayRevolverScheduler$$anon$8.run(Scheduler.scala:375)
>>        at java.lang.Thread.run(Thread.java:745)
>> 
>> The exception above occurred while trying to run your command.
>> 
>> 
>>> On Feb 26, 2015, at 12:46 AM, Stephan Ewen <sewen@apache.org> wrote:
>>> 
>>> Addition: To check whether a port is reachable, I think the easiest thing
>>> is to try and connect with a telnet client and see if the connection is
>>> refused.
>>> 
>>> On Wed, Feb 25, 2015 at 8:15 PM, Stephan Ewen <sewen@apache.org> wrote:
>>> 
>>>> Okay, the problem seems to be that even though both the client and the
>>>> jobmanager use "localhost" as the host name, they resolve it to different
>>>> IP addresses: in one case 127.0.0.1, in the other case 10.216.177.146.
>>>> 
>>>> Also, 127.0.0.1 apparently cannot communicate with 10.216.177.146.
>>>> 
>>>> Can you help us debug this by checking the following:
>>>> 
>>>> - Can you try setting "jobmanager.rpc.address" to 127.0.0.1 and see if
>>>> that solves it?
>>>> - Can you try setting "jobmanager.rpc.address" to the other address
>>>> (10.216.177.146 or so) and see if that solves it?
>>>> - Can you run "start-cluster.sh" rather than "start-local.sh" and see
>>>> whether the web frontend shows that the TaskManager connects?
>>>> - As a hardcore test: Can you bring up the JobManager, check which address
>>>> it binds to (10.216.192.98:6123 or so) and see whether the port is reachable?
>>>> 
>>>> We have recently updated how the Akka URLs are built, to work around a
>>>> limitation in Akka. It seems that did not yet fully solve the issue.
>>>> 
>>>> Thanks for helping us debug this. It is not the easiest onboarding
>>>> experience, but the outcome is probably extremely valuable for the
>>>> project :-)
>>>> 
>>>> Greetings,
>>>> Stephan
>>>> 
>>>> 
>>>> On Wed, Feb 25, 2015 at 4:03 PM, Dulaj Viduranga <vidura.me@icloud.com>
>>>> wrote:
>>>> 
>>>>> Hi,
>>>>> Sorry for the delay in replying to this issue.
>>>>> jobmanager.rpc.address is already set to “localhost” in flink-conf.yaml.
>>>>> This can’t be the issue, because the JobManager web interface, which also
>>>>> runs on localhost, works fine.
>>>>> 
>>>>> bin/flink run <jar> doesn’t seem to work either. Let me send you my
>>>>> command and the result in the terminal.
>>>>> 
>>>>> bin/flink run
>>>>> /Users/Vidura/Documents/Development/flink/flink-dist/target/flink-0.9-SNAPSHOT-bin/flink-0.9-SNAPSHOT/examples/flink-java-examples-0.9-SNAPSHOT-WordCount.jar
>>>>> /Users/Vidura/Documents/Development/flink/flink-dist/target/flink-0.9-SNAPSHOT-bin/flink-0.9-SNAPSHOT/hamlet.txt
>>>>> $FLINK_DIRECTORY/count
>>>>> 
>>>>> 20:32:16,442 WARN  org.apache.hadoop.util.NativeCodeLoader
>>>>>      - Unable to load native-hadoop library for your platform... using
>>>>> builtin-java classes where applicable
>>>>> org.apache.flink.client.program.ProgramInvocationException: Could not
>>>>> build up connection to JobManager.
>>>>>       at org.apache.flink.client.program.Client.run(Client.java:327)
>>>>>       at org.apache.flink.client.program.Client.run(Client.java:306)
>>>>>       at org.apache.flink.client.program.Client.run(Client.java:300)
>>>>>       at
>>>>> org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:55)
>>>>>       at
>>>>> org.apache.flink.examples.java.wordcount.WordCount.main(WordCount.java:82)
>>>>>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>>       at
>>>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>>>>       at
>>>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>>>       at java.lang.reflect.Method.invoke(Method.java:483)
>>>>>       at
>>>>> org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:437)
>>>>>       at
>>>>> org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:353)
>>>>>       at org.apache.flink.client.program.Client.run(Client.java:250)
>>>>>       at
>>>>> org.apache.flink.client.CliFrontend.executeProgram(CliFrontend.java:371)
>>>>>       at org.apache.flink.client.CliFrontend.run(CliFrontend.java:344)
>>>>>       at
>>>>> org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:1087)
>>>>>       at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1114)
>>>>> Caused by: java.io.IOException: JobManager at
>>>>> akka.tcp://flink@10.216.177.146:6123/user/jobmanager not reachable. Please make
>>>>> sure that the JobManager is running and its port is reachable.
>>>>>       at
>>>>> org.apache.flink.runtime.jobmanager.JobManager$.getJobManagerRemoteReference(JobManager.scala:897)
>>>>>       at
>>>>> org.apache.flink.runtime.client.JobClient$.createJobClient(JobClient.scala:151)
>>>>>       at
>>>>> org.apache.flink.runtime.client.JobClient$.createJobClientFromConfig(JobClient.scala:142)
>>>>>       at
>>>>> org.apache.flink.runtime.client.JobClient$.startActorSystemAndActor(JobClient.scala:125)
>>>>>       at
>>>>> org.apache.flink.runtime.client.JobClient.startActorSystemAndActor(JobClient.scala)
>>>>>       at org.apache.flink.client.program.Client.run(Client.java:322)
>>>>>       ... 15 more
>>>>> Caused by: java.util.concurrent.TimeoutException: Futures timed out after
>>>>> [10000 milliseconds]
>>>>>       at
>>>>> scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
>>>>>       at
>>>>> scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
>>>>>       at
>>>>> scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
>>>>>       at
>>>>> scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
>>>>>       at scala.concurrent.Await$.result(package.scala:107)
>>>>>       at
>>>>> org.apache.flink.runtime.jobmanager.JobManager$.getJobManagerRemoteReference(JobManager.scala:893)
>>>>>       ... 20 more
>>>>> 
>>>>> The exception above occurred while trying to run your command.
>>>>> 
>>>>> 
>>>>>> On Feb 25, 2015, at 1:29 AM, Stephan Ewen <sewen@apache.org> wrote:
>>>>>> 
>>>>>> BTW: Does it still work if you enter "localhost" for
>>>>>> "jobmanager.rpc.address" in your flink-conf.yaml?
>>>>>> 
>>>>>> On Tue, Feb 24, 2015 at 7:50 PM, Stephan Ewen <sewen@apache.org> wrote:
>>>>>> 
>>>>>>> Hi!
>>>>>>> 
>>>>>>> I think that this is a problem in the current master (probably in there
>>>>>>> since a few days ago). I am fixing it...
>>>>>>> 
>>>>>>> Thanks for reporting it!
>>>>>>> 
>>>>>>> Stephan
>>>>>>> 
>>>>>>> 
>>>>>>> On Tue, Feb 24, 2015 at 6:52 PM, Stephan Ewen <sewen@apache.org> wrote:
>>>>>>> 
>>>>>>>> Hi Dulaj!
>>>>>>>> 
>>>>>>>> The log suggests that the JobManager binds itself to the IP
>>>>>>>> address 10.216.192.98 and the WebClient runs at 127.0.0.1.
>>>>>>>> 
>>>>>>>> The 127.0.0.1 actor system cannot connect to the 10.216.192.98.
>>>>>>>> 
>>>>>>>> Let me verify whether this is a quirk of your particular setup, or a bug
>>>>>>>> recently introduced in the 0.9-SNAPSHOT.
>>>>>>>> 
>>>>>>>> Does the command line work for you? ("bin/flink run <jar>")
>>>>>>>> 
>>>>>>>> taskmanager.numberOfTaskSlots: -1 is also okay; this means that the
>>>>>>>> default of '1' is used.
>>>>>>>> 
>>>>>>>> Greetings,
>>>>>>>> Stephan
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Tue, Feb 24, 2015 at 5:18 PM, Dulaj Viduranga <vidura.me@icloud.com> wrote:
>>>>>>>> 
>>>>>>>>> Is taskmanager.numberOfTaskSlots: -1 normal?
>>>>>>>>> 
>>>>>>>>>> On Feb 24, 2015, at 9:44 PM, Robert Metzger <rmetzger@apache.org> wrote:
>>>>>>>>>> 
>>>>>>>>>> Hi,
>>>>>>>>>> I could not find the log files attached to your mails. I think the
>>>>>>>>>> mailing lists are not accepting attachments.
>>>>>>>>>> Can you put the logs on gist.github.com?
>>>>>>>>>> 
>>>>>>>>>> The configuration values are documented here:
>>>>>>>>>> http://flink.apache.org/docs/0.8/config.html
>>>>>>>>>> For the webclient's port, it's called webclient.port.
>>>>>>>>>> 
>>>>>>>>>> On Tue, Feb 24, 2015 at 5:04 PM, Dulaj Viduranga <vidura.me@icloud.com> wrote:
>>>>>>>>>> 
>>>>>>>>>>> I tried to kill the JobManager manually in the terminal and start it
>>>>>>>>>>> again, but no luck. Also, could you tell me if it’s possible to change
>>>>>>>>>>> the webclient’s port (8080)?
>>>>>>>>>>> 
>>>>>>>>>>>> On Feb 24, 2015, at 1:41 PM, Stephan Ewen <sewen@apache.org> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>> Hey Dulaj!
>>>>>>>>>>>> 
>>>>>>>>>>>> As a contributor, I would work against the latest version, which is
>>>>>>>>>>>> 0.9-SNAPSHOT.
>>>>>>>>>>>> 
>>>>>>>>>>>> It may be that in your case the JobManager actor is down, but the
>>>>>>>>>>>> process still lingers. (BTW: I have a patch pending that makes sure the
>>>>>>>>>>>> process disappears when the actor goes down.)
>>>>>>>>>>>> 
>>>>>>>>>>>> Could you have a look at the log "flink-<user>-jobmanager-<host>-.log"
>>>>>>>>>>>> and see if there are any errors logged?
>>>>>>>>>>>> 
>>>>>>>>>>>> Greetings,
>>>>>>>>>>>> Stephan
>>>>>>>>>>>> On 24.02.2015 06:29, "Dulaj Viduranga" <vidura.me@icloud.com> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>>> The JobManager seems to run fine, I don't know. When I try to run
>>>>>>>>>>>>> start-local.sh again, it shows the PID of the running JobManager, and
>>>>>>>>>>>>> :8081 also runs fine. I want to contribute to the project, and it would
>>>>>>>>>>>>> give me a little boost if I could see the capabilities of Flink. :)
>>>>>>>>>>>>> Will it be OK to use 0.8.1 as a developer?
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On Feb 24, 2015, at 04:15 AM, Stephan Ewen <sewen@apache.org> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Hi Dulaj,
>>>>>>>>>>>>> 
>>>>>>>>>>>>> That error message indicates that the JobManager is not running. Are
>>>>>>>>>>>>> you sure that the JobManager runs properly? Anything in the JobManager
>>>>>>>>>>>>> logs?
>>>>>>>>>>>>> 
>>>>>>>>>>>>> BTW: The 0.9 branch is under heavy development / changes. That is why
>>>>>>>>>>>>> it may behave a bit differently on different days right now. I would
>>>>>>>>>>>>> recommend using the 0.8.1 release for a stable experience.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Greetings,
>>>>>>>>>>>>> Stephan
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On Mon, Feb 23, 2015 at 7:39 PM, Robert Metzger <rmetzger@apache.org> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Thank you for the quick reply.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> The log you've sent is from the webclient. Can you also send the log of
>>>>>>>>>>>>> the JobManager?
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On Mon, Feb 23, 2015 at 7:28 PM, Dulaj Viduranga <vidura.me@icloud.com> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Yes. It seems it is not a problem with the arguments. I tried for two
>>>>>>>>>>>>>> days, but different errors occur. It seems the web client can’t connect
>>>>>>>>>>>>>> to the job manager although it is running.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Right now, I can’t even get the webclient to run.
>>>>>>>>>>>>>> ./bin/start-webclient.sh executes fine, but I cannot connect to
>>>>>>>>>>>>>> localhost:8080 (even with telnet or curl).
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Here is the log for the JobManager:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 23:22:31,933 INFO  org.apache.flink.client.web.WebInterfaceServer
>>>>>>>>>>>>>>       - Setting up web frontend server, using web-root directory
>>>>>>>>>>>>>> 'jar:file:/Users/Vidura/Documents/Development/flink/flink-dist/target/flink-0.9-SNAPSHOT-bin/flink-0.9-SNAPSHOT/lib/flink-clients-0.9-SNAPSHOT.jar!/web-docs'.
>>>>>>>>>>>>>> 23:22:31,934 INFO  org.apache.flink.client.web.WebInterfaceServer
>>>>>>>>>>>>>>       - Web frontend server will store temporary files in
>>>>>>>>>>>>>> '/var/folders/3_/7gzbv7ks7q71lpm5d9hzrw2c0000gn/T', uploaded jobs in
>>>>>>>>>>>>>> '/var/folders/3_/7gzbv7ks7q71lpm5d9hzrw2c0000gn/T/webclient-jobs',
>>>>>>>>>>>>>> plan-json-dumps in
>>>>>>>>>>>>>> '/var/folders/3_/7gzbv7ks7q71lpm5d9hzrw2c0000gn/T/webclient-plans'.
>>>>>>>>>>>>>> 23:22:31,934 INFO  org.apache.flink.client.web.WebInterfaceServer
>>>>>>>>>>>>>>       - Web-frontend will submit jobs to nephele job-manager on localhost,
>>>>>>>>>>>>>> port 6123.
>>>>>>>>>>>>>> 23:22:32,580 INFO  akka.event.slf4j.Slf4jLogger
>>>>>>>>>>>>>>       - Slf4jLogger started
>>>>>>>>>>>>>> 23:22:32,625 INFO  Remoting
>>>>>>>>>>>>>>       - Starting remoting
>>>>>>>>>>>>>> 23:22:32,838 INFO  Remoting
>>>>>>>>>>>>>>       - Remoting started; listening on addresses
>>>>>>>>>>>>>> :[akka.tcp://JobsInfoServletActorSystem@127.0.0.1:51517]
>>>>>>>>>>>>>> 23:23:48,119 WARN  Remoting
>>>>>>>>>>>>>>       - Tried to associate with unreachable remote address
>>>>>>>>>>>>>> [akka.tcp://flink@10.218.98.169:6123]. Address is now gated for 5000 ms,
>>>>>>>>>>>>>> all messages to this address will be delivered to dead letters. Reason:
>>>>>>>>>>>>>> Operation timed out: /10.218.98.169:6123
>>>>>>>>>>>>>> 23:23:48,124 ERROR org.apache.flink.client.WebFrontend
>>>>>>>>>>>>>>       - Unexpected exception: Could not find job manager at specified
>>>>>>>>>>>>>> address akka.tcp://flink@10.218.98.169:6123/user/jobmanager.
>>>>>>>>>>>>>> java.lang.RuntimeException: Could not find job manager at specified
>>>>>>>>>>>>>> address akka.tcp://flink@10.218.98.169:6123/user/jobmanager.
>>>>>>>>>>>>>>       at
>>>>>>>>>>>>>> org.apache.flink.client.web.JobsInfoServlet.<init>(JobsInfoServlet.java:82)
>>>>>>>>>>>>>>       at
>>>>>>>>>>>>>> org.apache.flink.client.web.WebInterfaceServer.<init>(WebInterfaceServer.java:158)
>>>>>>>>>>>>>>       at org.apache.flink.client.WebFrontend.main(WebFrontend.java:74)
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On Feb 23, 2015, at 11:46 PM, Robert Metzger <rmetzger@apache.org> wrote:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>> You said in the other email thread that the error only occurs for
>>>>>>>>>>>>>>> WordCount, not for KMeans.
>>>>>>>>>>>>>>> Can you copy me the commands for both examples?
>>>>>>>>>>>>>>> I cannot really believe that there is a difference between the two
>>>>>>>>>>>>>>> jobs.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Can you also send us the contents of the jobmanager log file?
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>> Robert
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On Mon, Feb 23, 2015 at 6:04 PM, Dulaj Viduranga <vidura.me@icloud.com> wrote:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> I’m getting "Could not build up connection to JobManager." when I
>>>>>>>>>>>>>>>> tried to run the WordCount example. Can anyone help?
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Dulaj
>> 
>> 
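Stephan's telnet suggestion in the thread amounts to opening a plain TCP connection to the JobManager's RPC port and seeing whether it is refused. A minimal Java equivalent follows as a sketch; the class name `PortCheck` and the 5-second timeout are assumptions, while 6123 is the port that appears in the logs above.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical diagnostic helper; not part of Flink.
public class PortCheck {

    /** Opens and immediately closes a TCP connection, like a telnet probe. */
    static boolean isReachable(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Connection refused, timed out, or host could not be resolved.
            System.err.println(host + ":" + port + " -> " + e);
            return false;
        }
    }

    public static void main(String[] args) {
        // Defaults: loopback and the JobManager RPC port seen in the logs.
        String host = args.length > 0 ? args[0] : "127.0.0.1";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 6123;
        boolean ok = isReachable(host, port, 5000);
        System.out.println(host + ":" + port + (ok ? " is reachable" : " is NOT reachable"));
    }
}
```

Running it once against 127.0.0.1 and once against the address the JobManager reports (for example `java PortCheck 10.216.177.146 6123`) would show whether only one of the two addresses accepts connections.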

