flink-dev mailing list archives

From Till Rohrmann <trohrm...@apache.org>
Subject Re: Could not build up connection to JobManager
Date Fri, 27 Feb 2015 10:42:06 GMT
It depends on how you started Flink. If you started a local cluster, then
the TaskManager log is contained in the JobManager log; we just don't see
the respective log output in the snippet you posted. If you started a
TaskManager independently, either with taskmanager.sh or with start-cluster.sh,
then a file named flink-<user>-taskmanager-<hostname>.log
should be created in flink/log/. If the Flink directory is not shared by
your cluster nodes, then you have to look on the machine on which you
started the TaskManager.
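
As a rough illustration (plain Python, not part of Flink), the lookup could be
sketched like this. It assumes the name format given above and a log/
directory directly under the Flink directory; the function names and the
flink_home argument are made up for this example.

```python
import glob
import os


def expected_taskmanager_log(flink_home, user, hostname):
    """Build the path flink-<user>-taskmanager-<hostname>.log under <flink_home>/log."""
    name = "flink-{}-taskmanager-{}.log".format(user, hostname)
    return os.path.join(flink_home, "log", name)


def find_taskmanager_logs(flink_home):
    """List every file in <flink_home>/log that matches the TaskManager name format."""
    # Escape the directory part so that special characters in the path are taken literally.
    log_dir = glob.escape(os.path.join(flink_home, "log"))
    return sorted(glob.glob(os.path.join(log_dir, "flink-*-taskmanager-*.log")))
```

An empty result from find_taskmanager_logs would be consistent with the local
cluster case, where the TaskManager output ends up in the JobManager log.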

But since the JobManager binds to 127.0.0.1, I guess that you started a
local cluster. Check whether you can find any logging statements from the
logger org.apache.flink.runtime.taskmanager.TaskManager in your log. Maybe
you can upload the corresponding log file to [1] and post a link here.
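
If grep is not at hand, a small Python sketch like the following could pull
out those statements. It assumes that each log line contains the full logger
name, as in the log snippets posted in this thread; the function name is
made up for this example.

```python
TASKMANAGER_LOGGER = "org.apache.flink.runtime.taskmanager.TaskManager"


def lines_from_logger(lines, logger_name=TASKMANAGER_LOGGER):
    """Return the log lines that mention the given logger name, without trailing newlines."""
    return [line.rstrip("\n") for line in lines if logger_name in line]
```

Passing an open log file as the lines argument works, since a file object
yields its lines one by one. An empty result would suggest that no TaskManager
was started in that process.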

Greets,

Till

[1] https://gist.github.com/

On Thu, Feb 26, 2015 at 6:45 PM, Dulaj Viduranga <vidura.me@icloud.com>
wrote:

> Hi,
>         Can you tell me where I can find the TaskManager logs? I can’t find
> them in the logs folder. I don’t suppose I should run taskmanager.sh as well.
> Right?
>         I’m using OS X Yosemite. I’ll send you my ifconfig.
>
> lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> mtu 16384
>         options=3<RXCSUM,TXCSUM>
>         inet6 ::1 prefixlen 128
>         inet 127.0.0.1 netmask 0xff000000
>         inet6 fe80::1%lo0 prefixlen 64 scopeid 0x1
>         nd6 options=1<PERFORMNUD>
> gif0: flags=8010<POINTOPOINT,MULTICAST> mtu 1280
> stf0: flags=0<> mtu 1280
> en0: flags=8823<UP,BROADCAST,SMART,SIMPLEX,MULTICAST> mtu 1500
>         ether 60:03:08:a1:e0:f4
>         nd6 options=1<PERFORMNUD>
>         media: autoselect (<unknown type>)
>         status: inactive
> en1: flags=8963<UP,BROADCAST,SMART,RUNNING,PROMISC,SIMPLEX,MULTICAST> mtu
> 1500
>         options=60<TSO4,TSO6>
>         ether 72:00:02:32:14:d0
>         media: autoselect <full-duplex>
>         status: inactive
> en2: flags=8963<UP,BROADCAST,SMART,RUNNING,PROMISC,SIMPLEX,MULTICAST> mtu
> 1500
>         options=60<TSO4,TSO6>
>         ether 72:00:02:32:14:d1
>         media: autoselect <full-duplex>
>         status: inactive
> bridge0: flags=8822<BROADCAST,SMART,SIMPLEX,MULTICAST> mtu 1500
>         options=63<RXCSUM,TXCSUM,TSO4,TSO6>
>         ether 62:03:08:1a:fa:00
>         Configuration:
>                 id 0:0:0:0:0:0 priority 0 hellotime 0 fwddelay 0
>                 maxage 0 holdcnt 0 proto stp maxaddr 100 timeout 1200
>                 root id 0:0:0:0:0:0 priority 0 ifcost 0 port 0
>                 ipfilter disabled flags 0x2
>         member: en1 flags=3<LEARNING,DISCOVER>
>                 ifmaxaddr 0 port 5 priority 0 path cost 0
>         member: en2 flags=3<LEARNING,DISCOVER>
>                 ifmaxaddr 0 port 6 priority 0 path cost 0
>         media: <unknown type>
>         status: inactive
> p2p0: flags=8802<BROADCAST,SIMPLEX,MULTICAST> mtu 2304
>         ether 02:03:08:a1:e0:f4
>         media: autoselect
>         status: inactive
> awdl0: flags=8802<BROADCAST,SIMPLEX,MULTICAST> mtu 1452
>         ether 06:56:3d:f6:60:08
>         nd6 options=1<PERFORMNUD>
>         media: autoselect
>         status: inactive
> ppp0: flags=8051<UP,POINTOPOINT,RUNNING,MULTICAST> mtu 1500
>         inet 10.218.98.228 --> 10.64.64.64 netmask 0xff000000
> utun0: flags=8051<UP,POINTOPOINT,RUNNING,MULTICAST> mtu 1380
>         inet6 fe80::b0d4:d4be:7e62:e730%utun0 prefixlen 64 scopeid 0xb
>         inet6 fdd0:b291:7da7:9153:b0d4:d4be:7e62:e730 prefixlen 64
>         nd6 options=1<PERFORMNUD>
>
>
> > On Feb 26, 2015, at 10:48 PM, Stephan Ewen <sewen@apache.org> wrote:
> >
> > Hi Dulaj!
> >
> > Thanks for helping to debug.
> >
> > My guess is that you are seeing now the same thing between JobManager and
> > TaskManager as you saw before between JobManager and JobClient. I have a
> > patch pending that should help the issue (see
> > https://issues.apache.org/jira/browse/FLINK-1608), let's see if that solves
> > it.
> >
> > What seems not right is that the JobManager initially accepted the
> > TaskManager and later the communication. Can you paste the TaskManager log
> > as well?
> >
> > Also: There must be something fairly unique about your network
> > configuration, as it works on all other setups that we use (locally, cloud,
> > test servers, YARN, ...). Can you paste your ipconfig / ifconfig by any
> > chance?
> >
> > Greetings,
> > Stephan
> >
> >
> >
> > On Thu, Feb 26, 2015 at 4:33 PM, Dulaj Viduranga <vidura.me@icloud.com>
> > wrote:
> >
> >> Hi,
> >>        It’s great to help out. :)
> >>
> >>        Setting 127.0.0.1 instead of “localhost” in
> >> jobmanager.rpc.address helped to build the connection to the jobmanager.
> >> Apparently localhost resolves differently in the webclient and the
> >> jobmanager. I think it would be good to set "jobmanager.rpc.address: 127.0.0.1" in
> >> future builds.
> >>        But then I get this error when I try to run the examples. I don’t
> >> know if I should move this issue to another thread. If so, please tell me.
> >>
> >> bin/flink run
> >>
> /Users/Vidura/Documents/Development/flink/flink-dist/target/flink-0.9-SNAPSHOT-bin/flink-0.9-SNAPSHOT/examples/flink-java-examples-0.9-SNAPSHOT-WordCount.jar
> >>
> /Users/Vidura/Documents/Development/flink/flink-dist/target/flink-0.9-SNAPSHOT-bin/flink-0.9-SNAPSHOT/hamlet.txt
> >> $FLINK_DIRECTORY/count
> >>
> >>
> >> 20:46:21,998 WARN  org.apache.hadoop.util.NativeCodeLoader
> >>       - Unable to load native-hadoop library for your platform... using
> >> builtin-java classes where applicable
> >> 02/26/2015 20:46:23     Job execution switched to status RUNNING.
> >> 02/26/2015 20:46:23     CHAIN DataSource (at
> >> getTextDataSet(WordCount.java:141)
> >> (org.apache.flink.api.java.io.TextInputFormat)) -> FlatMap (FlatMap at
> >> main(WordCount.java:69)) -> Combine(SUM(1), at
> main(WordCount.java:72)(1/1)
> >> switched to SCHEDULED
> >> 02/26/2015 20:46:23     CHAIN DataSource (at
> >> getTextDataSet(WordCount.java:141)
> >> (org.apache.flink.api.java.io.TextInputFormat)) -> FlatMap (FlatMap at
> >> main(WordCount.java:69)) -> Combine(SUM(1), at
> main(WordCount.java:72)(1/1)
> >> switched to DEPLOYING
> >> 02/26/2015 20:48:03     CHAIN DataSource (at
> >> getTextDataSet(WordCount.java:141)
> >> (org.apache.flink.api.java.io.TextInputFormat)) -> FlatMap (FlatMap at
> >> main(WordCount.java:69)) -> Combine(SUM(1), at
> main(WordCount.java:72)(1/1)
> >> switched to FAILED
> >> akka.pattern.AskTimeoutException: Ask timed out on
> >> [Actor[akka://flink/user/taskmanager#-1628133761]] after [100000 ms]
> >>        at
> >>
> akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:333)
> >>        at akka.actor.Scheduler$$anon$7.run(Scheduler.scala:117)
> >>        at
> >>
> scala.concurrent.Future$InternalCallbackExecutor$.scala$concurrent$Future$InternalCallbackExecutor$$unbatchedExecute(Future.scala:694)
> >>        at
> >>
> scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:691)
> >>        at
> >>
> akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(Scheduler.scala:467)
> >>        at
> >>
> akka.actor.LightArrayRevolverScheduler$$anon$8.executeBucket$1(Scheduler.scala:419)
> >>        at
> >>
> akka.actor.LightArrayRevolverScheduler$$anon$8.nextTick(Scheduler.scala:423)
> >>        at
> >> akka.actor.LightArrayRevolverScheduler$$anon$8.run(Scheduler.scala:375)
> >>        at java.lang.Thread.run(Thread.java:745)
> >>
> >> 02/26/2015 20:48:03     Job execution switched to status FAILING.
> >> 02/26/2015 20:48:03     Reduce (SUM(1), at main(WordCount.java:72)(1/1)
> >> switched to CANCELED
> >> 02/26/2015 20:48:03     DataSink(CsvOutputFormat (path:
> >>
> /Users/Vidura/Documents/Development/flink/flink-dist/target/flink-0.9-SNAPSHOT-bin/flink-0.9-SNAPSHOT/count,
> >> delimiter:  ))(1/1) switched to CANCELED
> >> 02/26/2015 20:48:03     Job execution switched to status FAILED.
> >> org.apache.flink.client.program.ProgramInvocationException: The program
> >> execution failed.
> >>        at org.apache.flink.client.program.Client.run(Client.java:344)
> >>        at org.apache.flink.client.program.Client.run(Client.java:306)
> >>        at org.apache.flink.client.program.Client.run(Client.java:300)
> >>        at
> >>
> org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:55)
> >>        at
> >>
> org.apache.flink.examples.java.wordcount.WordCount.main(WordCount.java:82)
> >>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >>        at
> >>
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> >>        at
> >>
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> >>        at java.lang.reflect.Method.invoke(Method.java:483)
> >>        at
> >>
> org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:437)
> >>        at
> >>
> org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:353)
> >>        at org.apache.flink.client.program.Client.run(Client.java:250)
> >>        at
> >> org.apache.flink.client.CliFrontend.executeProgram(CliFrontend.java:371)
> >>        at org.apache.flink.client.CliFrontend.run(CliFrontend.java:344)
> >>        at
> >>
> org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:1087)
> >>        at
> org.apache.flink.client.CliFrontend.main(CliFrontend.java:1114)
> >> Caused by: org.apache.flink.runtime.client.JobExecutionException: Job
> >> execution failed.
> >>        at
> >>
> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$receiveWithLogMessages$1.applyOrElse(JobManager.scala:284)
> >>        at
> >>
> scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
> >>        at
> >>
> scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
> >>        at
> >>
> scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
> >>        at
> >>
> org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:37)
> >>        at
> >>
> org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:30)
> >>        at
> >> scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118)
> >>        at
> >>
> org.apache.flink.runtime.ActorLogMessages$$anon$1.applyOrElse(ActorLogMessages.scala:30)
> >>        at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
> >>        at
> >>
> org.apache.flink.runtime.jobmanager.JobManager.aroundReceive(JobManager.scala:88)
> >>        at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
> >>        at akka.actor.ActorCell.invoke(ActorCell.scala:487)
> >>        at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
> >>        at akka.dispatch.Mailbox.run(Mailbox.scala:221)
> >>        at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
> >>        at
> >> scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
> >>        at
> >>
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
> >>        at
> >> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
> >>        at
> >>
> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> >> Caused by: akka.pattern.AskTimeoutException: Ask timed out on
> >> [Actor[akka://flink/user/taskmanager#-1628133761]] after [100000 ms]
> >>        at
> >>
> akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:333)
> >>        at akka.actor.Scheduler$$anon$7.run(Scheduler.scala:117)
> >>        at
> >>
> scala.concurrent.Future$InternalCallbackExecutor$.scala$concurrent$Future$InternalCallbackExecutor$$unbatchedExecute(Future.scala:694)
> >>        at
> >>
> scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:691)
> >>        at
> >>
> akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(Scheduler.scala:467)
> >>        at
> >>
> akka.actor.LightArrayRevolverScheduler$$anon$8.executeBucket$1(Scheduler.scala:419)
> >>        at
> >>
> akka.actor.LightArrayRevolverScheduler$$anon$8.nextTick(Scheduler.scala:423)
> >>        at
> >> akka.actor.LightArrayRevolverScheduler$$anon$8.run(Scheduler.scala:375)
> >>        at java.lang.Thread.run(Thread.java:745)
> >>
> >> The exception above occurred while trying to run your command.
> >>
> >>
> >>> On Feb 26, 2015, at 12:46 AM, Stephan Ewen <sewen@apache.org> wrote:
> >>>
> >>> Addition: To check whether a port is reachable, I think the easiest thing
> >>> is to try and connect with a telnet client and see if the connection is
> >>> refused.
> >>>
> >>> On Wed, Feb 25, 2015 at 8:15 PM, Stephan Ewen <sewen@apache.org>
> wrote:
> >>>
> >>>> Okay, the problem seems to be that even though both the client and the
> >>>> jobmanager use "localhost" as the host name, they resolve this to different
> >>>> IP addresses: in one case 127.0.0.1, in the other case 10.216.177.146.
> >>>>
> >>>> Also, the 127.0.0.1 address apparently cannot communicate with
> >>>> 10.216.177.146.
> >>>>
> >>>> Can you help us debug this by checking the following:
> >>>>
> >>>> - Can you try and set "jobmanager.rpc.address" to 127.0.0.1 and see if
> >>>> that solves it?
> >>>> - Can you try and set "jobmanager.rpc.address" to the other address
> >>>> (10.216.177.146 or so) and see if that solves it?
> >>>> - Can you do "start-cluster.sh", rather than "start-local.sh" and see
> >>>> whether the webfrontend displays that the TaskManager connects?
> >>>> - As a hard core test: Can you bring up the jobmanager, check where it
> >>>> connects (10.216.192.98:6123 or so) and see whether the port is reachable?
> >>>>
> >>>> We have recently updated how the Akka URLs are built, to work around a
> >>>> limitation in Akka. Seems that did not yet fully solve the issue.
> >>>>
> >>>> Thanks for helping us debug this, it is not the easiest immigration
> >>>> experience, but the outcome is probably extremely valuable for the project
> >>>> :-)
> >>>>
> >>>> Greetings,
> >>>> Stephan
> >>>>
> >>>>
> >>>> On Wed, Feb 25, 2015 at 4:03 PM, Dulaj Viduranga <
> vidura.me@icloud.com>
> >>>> wrote:
> >>>>
> >>>>> Hi,
> >>>>> Sorry for the delay in replying on this issue.
> >>>>> The jobmanager.rpc.address is already set to “localhost” in conf.yaml.
> >>>>> This can’t be the issue because the job manager web interface, which
> >>>>> also runs on localhost, works fine.
> >>>>>
> >>>>> bin/flink run <jar> doesn’t seem to work either. Let me send you my
> >>>>> command and the result in terminal.
> >>>>>
> >>>>> bin/flink run
> >>>>>
> >>
> /Users/Vidura/Documents/Development/flink/flink-dist/target/flink-0.9-SNAPSHOT-bin/flink-0.9-SNAPSHOT/examples/flink-java-examples-0.9-SNAPSHOT-WordCount.jar
> >>>>>
> >>
> /Users/Vidura/Documents/Development/flink/flink-dist/target/flink-0.9-SNAPSHOT-bin/flink-0.9-SNAPSHOT/hamlet.txt
> >>>>> $FLINK_DIRECTORY/count
> >>>>>
> >>>>> 20:32:16,442 WARN  org.apache.hadoop.util.NativeCodeLoader
> >>>>>      - Unable to load native-hadoop library for your platform...
> using
> >>>>> builtin-java classes where applicable
> >>>>> org.apache.flink.client.program.ProgramInvocationException: Could not
> >>>>> build up connection to JobManager.
> >>>>>       at org.apache.flink.client.program.Client.run(Client.java:327)
> >>>>>       at org.apache.flink.client.program.Client.run(Client.java:306)
> >>>>>       at org.apache.flink.client.program.Client.run(Client.java:300)
> >>>>>       at
> >>>>>
> >>
> org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:55)
> >>>>>       at
> >>>>>
> >>
> org.apache.flink.examples.java.wordcount.WordCount.main(WordCount.java:82)
> >>>>>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >>>>>       at
> >>>>>
> >>
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> >>>>>       at
> >>>>>
> >>
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> >>>>>       at java.lang.reflect.Method.invoke(Method.java:483)
> >>>>>       at
> >>>>>
> >>
> org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:437)
> >>>>>       at
> >>>>>
> >>
> org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:353)
> >>>>>       at org.apache.flink.client.program.Client.run(Client.java:250)
> >>>>>       at
> >>>>>
> >> org.apache.flink.client.CliFrontend.executeProgram(CliFrontend.java:371)
> >>>>>       at
> org.apache.flink.client.CliFrontend.run(CliFrontend.java:344)
> >>>>>       at
> >>>>>
> >>
> org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:1087)
> >>>>>       at
> >> org.apache.flink.client.CliFrontend.main(CliFrontend.java:1114)
> >>>>> Caused by: java.io.IOException: JobManager at akka.tcp://
> >>>>> flink@10.216.177.146:6123/user/jobmanager not reachable. Please make
> >>>>> sure that the JobManager is running and its port is reachable.
> >>>>>       at
> >>>>>
> >>
> org.apache.flink.runtime.jobmanager.JobManager$.getJobManagerRemoteReference(JobManager.scala:897)
> >>>>>       at
> >>>>>
> >>
> org.apache.flink.runtime.client.JobClient$.createJobClient(JobClient.scala:151)
> >>>>>       at
> >>>>>
> >>
> org.apache.flink.runtime.client.JobClient$.createJobClientFromConfig(JobClient.scala:142)
> >>>>>       at
> >>>>>
> >>
> org.apache.flink.runtime.client.JobClient$.startActorSystemAndActor(JobClient.scala:125)
> >>>>>       at
> >>>>>
> >>
> org.apache.flink.runtime.client.JobClient.startActorSystemAndActor(JobClient.scala)
> >>>>>       at org.apache.flink.client.program.Client.run(Client.java:322)
> >>>>>       ... 15 more
> >>>>> Caused by: java.util.concurrent.TimeoutException: Futures timed out after
> >>>>> [10000 milliseconds]
> >>>>>       at
> >>>>> scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
> >>>>>       at
> >>>>>
> scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
> >>>>>       at
> >>>>> scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
> >>>>>       at
> >>>>>
> >>
> scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
> >>>>>       at scala.concurrent.Await$.result(package.scala:107)
> >>>>>       at
> >>>>>
> >>
> org.apache.flink.runtime.jobmanager.JobManager$.getJobManagerRemoteReference(JobManager.scala:893)
> >>>>>       ... 20 more
> >>>>>
> >>>>> The exception above occurred while trying to run your command.
> >>>>>
> >>>>>
> >>>>>> On Feb 25, 2015, at 1:29 AM, Stephan Ewen <sewen@apache.org>
wrote:
> >>>>>>
> >>>>>> BTW: Does it still work if you enter "localhost" for
> >>>>>> "jobmanager.rpc.address" in your flink-conf.yaml?
> >>>>>>
> >>>>>> On Tue, Feb 24, 2015 at 7:50 PM, Stephan Ewen <sewen@apache.org>
> >> wrote:
> >>>>>>
> >>>>>>> Hi!
> >>>>>>>
> >>>>>>> I think that this is a problem in the current master (probably in there
> >>>>>>> since a few days ago). I am fixing it...
> >>>>>>>
> >>>>>>> Thanks for reporting it!
> >>>>>>>
> >>>>>>> Stephan
> >>>>>>>
> >>>>>>>
> >>>>>>> On Tue, Feb 24, 2015 at 6:52 PM, Stephan Ewen <sewen@apache.org>
> >>>>> wrote:
> >>>>>>>
> >>>>>>>> Hi Dulaj!
> >>>>>>>>
> >>>>>>>> The log suggests that the JobManager binds itself to the IP
> >>>>>>>> address 10.216.192.98 and the WebClient runs at 127.0.0.1
> >>>>>>>>
> >>>>>>>> The 127.0.0.1 actor system cannot connect to the 10.216.192.98.
> >>>>>>>>
> >>>>>>>> Let me verify whether this is a quirk of your particular setup, or a bug
> >>>>>>>> recently introduced in the 0.9-SNAPSHOT.
> >>>>>>>>
> >>>>>>>> Does the command line work for you? ("bin/flink run <jar>")
> >>>>>>>>
> >>>>>>>> taskmanager.numberOfTaskSlots: -1 is also okay, this will mean that the
> >>>>>>>> default of '1' is used.
> >>>>>>>>
> >>>>>>>> Greetings,
> >>>>>>>> Stephan
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On Tue, Feb 24, 2015 at 5:18 PM, Dulaj Viduranga <
> >>>>> vidura.me@icloud.com>
> >>>>>>>> wrote:
> >>>>>>>>
> >>>>>>>>> Is taskmanager.numberOfTaskSlots: -1 normal?
> >>>>>>>>>
> >>>>>>>>>> On Feb 24, 2015, at 9:44 PM, Robert Metzger
<
> rmetzger@apache.org>
> >>>>>>>>> wrote:
> >>>>>>>>>>
> >>>>>>>>>> Hi,
> >>>>>>>>>> I could not find the logfiles attached to your mails. I think the
> >>>>>>>>>> mailing lists are not accepting attachments.
> >>>>>>>>>> Can you put the logs on gist.github.com?
> >>>>>>>>>>
> >>>>>>>>>> The configuration values are documented here:
> >>>>>>>>>> http://flink.apache.org/docs/0.8/config.html
> >>>>>>>>>> For the webclient's port, it's called webclient.port
> >>>>>>>>>>
> >>>>>>>>>> On Tue, Feb 24, 2015 at 5:04 PM, Dulaj Viduranga
<
> >>>>> vidura.me@icloud.com
> >>>>>>>>>>
> >>>>>>>>>> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> I tried to kill the job manager manually in the terminal and start it
> >>>>>>>>>>> again but no luck. Also could you tell me if it’s possible to change
> >>>>>>>>>>> webclient’s port (8080) ?
> >>>>>>>>>>>
> >>>>>>>>>>>> On Feb 24, 2015, at 1:41 PM, Stephan
Ewen <sewen@apache.org>
> >>>>> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>> Hey Dulaj!
> >>>>>>>>>>>>
> >>>>>>>>>>>> As a contributor, I would go against the latest version, which is
> >>>>>>>>>>>> 0.9-SNAPSHOT.
> >>>>>>>>>>>>
> >>>>>>>>>>>> It may be in your case that the JobManager actor is down, but the process
> >>>>>>>>>>>> still lingers. (BTW: I have a patch pending that makes sure the process
> >>>>>>>>>>>> disappears when the actor goes down).
> >>>>>>>>>>>>
> >>>>>>>>>>>> Could you have a look at the log "flink-<user>-jobmanager-<host>-.log" and
> >>>>>>>>>>>> see if there are any errors logged?
> >>>>>>>>>>>>
> >>>>>>>>>>>> Greetings,
> >>>>>>>>>>>> Stephan
> >>>>>>>>>>>> Am 24.02.2015 06:29 schrieb "Dulaj Viduranga"
<
> >>>>> vidura.me@icloud.com
> >>>>>>>>>> :
> >>>>>>>>>>>>
> >>>>>>>>>>>>> The JobManager seems to run fine. I don't know. When I tried to run
> >>>>>>>>>>>>> start-local.sh again, it shows the PID of the running JobManager and also
> >>>>>>>>>>>>> :8081 runs fine. I want to contribute to the project and I could get a
> >>>>>>>>>>>>> little boost if I could see the capabilities of FLINK. :)
> >>>>>>>>>>>>> Will it be OK to use 0.8.1 as a developer?
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On Feb 24, 2015, at 04:15 AM, Stephan
Ewen <sewen@apache.org
> >
> >>>>>>>>> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Hi Dulaj,
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> That error message indicates that the JobManager is not running. Are you
> >>>>>>>>>>>>> sure that the JobManager runs properly? Anything in the JobManager logs?
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> BTW: The 0.9 branch is under heavy development / changes. That is why it
> >>>>>>>>>>>>> may behave a bit different on different days right now. I would recommend
> >>>>>>>>>>>>> to use the 0.8.1 release for a stable experience.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Greetings,
> >>>>>>>>>>>>> Stephan
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On Mon, Feb 23, 2015 at 7:39 PM,
Robert Metzger <
> >>>>>>>>> rmetzger@apache.org>
> >>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Thank you for the quick reply.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> The log you've sent is from the webclient. Can you also send the log of the
> >>>>>>>>>>>>> JobManager?
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On Mon, Feb 23, 2015 at 7:28 PM,
Dulaj Viduranga <
> >>>>>>>>> vidura.me@icloud.com>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> Yes. It seems it is not a problem with the arguments. I tried two days
> >>>>>>>>>>>>>> but a different error occurs. It seems the web client can’t connect to the job
> >>>>>>>>>>>>>> manager although it is running
> >>>>>>>>>>>>>> Right now, I can’t even get the webclient to run.
> >>>>>>>>>>>>>> ./bin/start-webclient.sh
> >>>>>>>>>>>>>> executes fine but I cannot connect to localhost:8080 (even with telnet or
> >>>>>>>>>>>>>> curl)
> >>>>>>>>>>>>>> Here is the log for jobManager
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> 23:22:31,933 INFO org.apache.flink.client.web.WebInterfaceServer
> >>>>>>>>>>>>>> - Setting up web frontend server, using web-root directory
> >>>>>>>>>>>>>> 'jar:file:/Users/Vidura/Documents/Development/flink/flink-dist/target/flink-0.9-SNAPSHOT-bin/flink-0.9-SNAPSHOT/lib/flink-clients-0.9-SNAPSHOT.jar!/web-docs'.
> >>>>>>>>>>>>>> 23:22:31,934 INFO org.apache.flink.client.web.WebInterfaceServer
> >>>>>>>>>>>>>> - Web frontend server will store temporary files in
> >>>>>>>>>>>>>> '/var/folders/3_/7gzbv7ks7q71lpm5d9hzrw2c0000gn/T', uploaded jobs in
> >>>>>>>>>>>>>> '/var/folders/3_/7gzbv7ks7q71lpm5d9hzrw2c0000gn/T/webclient-jobs',
> >>>>>>>>>>>>>> plan-json-dumps in
> >>>>>>>>>>>>>> '/var/folders/3_/7gzbv7ks7q71lpm5d9hzrw2c0000gn/T/webclient-plans'.
> >>>>>>>>>>>>>> 23:22:31,934 INFO org.apache.flink.client.web.WebInterfaceServer
> >>>>>>>>>>>>>> - Web-frontend will submit jobs to nephele job-manager on localhost,
> >>>>>>>>>>>>>> port 6123.
> >>>>>>>>>>>>>> 23:22:32,580 INFO akka.event.slf4j.Slf4jLogger
> >>>>>>>>>>>>>> - Slf4jLogger started
> >>>>>>>>>>>>>> 23:22:32,625 INFO Remoting
> >>>>>>>>>>>>>> - Starting remoting
> >>>>>>>>>>>>>> 23:22:32,838 INFO Remoting
> >>>>>>>>>>>>>> - Remoting started; listening on addresses :[akka.tcp://JobsInfoServletActorSystem@127.0.0.1:51517]
> >>>>>>>>>>>>>> 23:23:48,119 WARN Remoting
> >>>>>>>>>>>>>> - Tried to associate with unreachable remote address
> >>>>>>>>>>>>>> [akka.tcp://flink@10.218.98.169:6123]. Address is now gated for 5000 ms, all
> >>>>>>>>>>>>>> messages to this address will be delivered to dead letters. Reason: Operation
> >>>>>>>>>>>>>> timed out: /10.218.98.169:6123
> >>>>>>>>>>>>>> 23:23:48,124 ERROR org.apache.flink.client.WebFrontend
> >>>>>>>>>>>>>> - Unexpected exception: Could not find job manager at specified
> >>>>>>>>>>>>>> address akka.tcp://flink@10.218.98.169:6123/user/jobmanager.
> >>>>>>>>>>>>>> java.lang.RuntimeException: Could not find job manager at specified
> >>>>>>>>>>>>>> address akka.tcp://flink@10.218.98.169:6123/user/jobmanager.
> >>>>>>>>>>>>>> at org.apache.flink.client.web.JobsInfoServlet.<init>(JobsInfoServlet.java:82)
> >>>>>>>>>>>>>> at org.apache.flink.client.web.WebInterfaceServer.<init>(WebInterfaceServer.java:158)
> >>>>>>>>>>>>>> at org.apache.flink.client.WebFrontend.main(WebFrontend.java:74)
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>>> On Feb 23, 2015, at 11:46
PM, Robert Metzger <
> >>>>> rmetzger@apache.org
> >>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Hi,
> >>>>>>>>>>>>>>> you said in the other email thread that the error only occurs for
> >>>>>>>>>>>>>>> Wordcount, not for Kmeans.
> >>>>>>>>>>>>>>> Can you copy me the commands for both examples?
> >>>>>>>>>>>>>>> I can not really believe that there is a difference between the two jobs.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Can you also send us the contents of the jobmanager log file?
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Robert
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>>> On Mon, Feb 23, 2015 at
6:04 PM, Dulaj Viduranga <
> >>>>>>>>>>> vidura.me@icloud.com
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> I’m getting "Could not build up connection to JobManager." when I tried to
> >>>>>>>>>>>>>>>> run the wordCount example. Can anyone help?
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Dulaj
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>
> >>>>>
> >>>>
> >>
> >>
>
>
