flink-user mailing list archives

From Ufuk Celebi <...@apache.org>
Subject Re: Task managers cant start on YARN cluster
Date Mon, 14 Nov 2016 10:14:13 GMT
Ah, sorry. I thought it was something related to Flink. ;)

On 14 November 2016 at 10:59:44, Gyula Fóra (gyula.fora@gmail.com) wrote:
> What I mean is the logs coming from org.apache.hadoop.ipc.Client if you
> look at my original email (at JM logs)
>  
> Gyula
>  
> Ufuk Celebi wrote (on Mon, 14 Nov 2016, 10:52):
>  
> > What was the log message shown on DEBUG level?
> >
> > Maybe it makes sense to promote it to INFO. ;)
> >
> > I guess there is no easy way to verify the version, right Max or Robert?
> >
> > On 14 November 2016 at 10:45:52, Gyula Fóra (gyula.fora@gmail.com) wrote:
> > > Hi,
> > >
> > > The main problem was that whatever was going wrong was not apparent in
> > > the Flink Application master runner; it was only shown in the YarnClient
> > > debug log.
> > >
> > > If you run with the default INFO log level, all you see is that the Yarn
> > > client is trying to fail over again and again, as if something was wrong
> > > with the resource manager. Setting it to DEBUG actually shows the error.
> > >
> > > Also it would be great if there was a way to verify YARN versions and
> > > incompatibility, not sure if this is possible easily.
> > >
> > > Gyula
> > >
> > > Ufuk Celebi wrote (on Mon, 14 Nov 2016, 9:42):
> > >
> > > > Good to know that you solved this. :) Do you think there is something
> > > > we can do to help users notice this situation faster?
> > > >
> > > > – Ufuk
> > > >
> > > > On 13 November 2016 at 00:23:21, Gyula Fóra (gyula.fora@gmail.com) wrote:
> > > > > Hi,
> > > > >
> > > > > What happened is that I compiled Flink with the wrong Hadoop version...
> > > > >
> > > > > Sorry :)
> > > > > Gyula
> > > > >
> > > > > Gyula Fóra wrote (on Sat, 12 Nov 2016, 13:11):
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > I am running into some strange issues on YARN with Flink 1.1.3 and
> > > > > > 1.1.4. For some reason I started getting this error (see below).
> > > > > > The job manager starts and the application is in the ACCEPTED state,
> > > > > > but it cannot seem to communicate with the scheduler (0.0.0.0:8030
> > > > > > seems strange).
> > > > > >
> > > > > > I didn't change anything on the YARN cluster and this seemed to work
> > > > > > previously (but I just can't get it to work now). The yarn-site.xml
> > > > > > contains the proper RM addresses.
> > > > > >
> > > > > > Does anybody have any ideas where to go from here?
> > > > > >
> > > > > > Cheers,
> > > > > > Gyula
> > > > > >
> > > > > > JM log:
> > > > > >
> > > > > > 2016-11-12 11:56:06,894 DEBUG org.apache.hadoop.ipc.Client - The ping interval is 60000 ms.
> > > > > > 2016-11-12 11:56:06,894 DEBUG org.apache.hadoop.ipc.Client - Connecting to /0.0.0.0:8030
> > > > > > 2016-11-12 11:56:06,899 DEBUG org.apache.hadoop.ipc.Client - closing ipc connection to 0.0.0.0/0.0.0.0:8030: Connection refused
> > > > > >
> > > > > > java.net.ConnectException: Call From splat24.sto.midasplayer.com/172.25.86.166 to 0.0.0.0:8030 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
> > > > > > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> > > > > > at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> > > > > > at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> > > > > > at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
> > > > > > at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:783)
> > > > > > at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:730)
> > > > > > at org.apache.hadoop.ipc.Client.call(Client.java:1410)
> > > > > > at org.apache.hadoop.ipc.Client.call(Client.java:1359)
> > > > > > at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
> > > > > > at com.sun.proxy.$Proxy8.registerApplicationMaster(Unknown Source)
> > > > > > at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.registerApplicationMaster(ApplicationMasterProtocolPBClientImpl.java:106)
> > > > > > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> > > > > > at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> > > > > > at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> > > > > > at java.lang.reflect.Method.invoke(Method.java:497)
> > > > > > at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
> > > > > > at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
> > > > > > at com.sun.proxy.$Proxy9.registerApplicationMaster(Unknown Source)
> > > > > > at org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.registerApplicationMaster(AMRMClientImpl.java:196)
> > > > > > at org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl.registerApplicationMaster(AMRMClientAsyncImpl.java:138)
> > > > > > at org.apache.flink.yarn.YarnFlinkResourceManager.initialize(YarnFlinkResourceManager.java:259)
> > > > > > at org.apache.flink.runtime.clusterframework.FlinkResourceManager.preStart(FlinkResourceManager.java:185)
> > > > > > at akka.actor.Actor$class.aroundPreStart(Actor.scala:470)
> > > > > > at akka.actor.UntypedActor.aroundPreStart(UntypedActor.scala:97)
> > > > > > at akka.actor.ActorCell.create(ActorCell.scala:580)
> > > > > > at akka.actor.ActorCell.invokeAll$1(ActorCell.scala:456)
> > > > > > at akka.actor.ActorCell.systemInvoke(ActorCell.scala:478)
> > > > > >
> > > > > >
> > > > > > Client:
> > > > > >
> > > > > > 2016-11-12 12:31:31,080 INFO org.apache.flink.yarn.cli.FlinkYarnSessionCli - No path for the flink jar passed. Using the location of class org.apache.flink.yarn.YarnClusterDescriptor to locate the jar
> > > > > > 2016-11-12 12:31:31,080 INFO org.apache.flink.yarn.cli.FlinkYarnSessionCli - No path for the flink jar passed. Using the location of class org.apache.flink.yarn.YarnClusterDescriptor to locate the jar
> > > > > > 2016-11-12 12:31:31,101 INFO org.apache.flink.yarn.YarnClusterDescriptor - Using values:
> > > > > > 2016-11-12 12:31:31,101 INFO org.apache.flink.yarn.YarnClusterDescriptor - TaskManager count = 1
> > > > > > 2016-11-12 12:31:31,101 INFO org.apache.flink.yarn.YarnClusterDescriptor - JobManager memory = 1024
> > > > > > 2016-11-12 12:31:31,102 INFO org.apache.flink.yarn.YarnClusterDescriptor - TaskManager memory = 11000
> > > > > > 2016-11-12 12:31:31,119 INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at /0.0.0.0:8032
> > > > > > 2016-11-12 12:31:31,394 WARN org.apache.flink.yarn.YarnClusterDescriptor - The file system scheme is 'file'. This indicates that the specified Hadoop configuration path is wrong and the system is using the default Hadoop configuration values.The Flink YARN client needs to store its files in a distributed file system
> > > > > > 2016-11-12 12:31:31,457 INFO org.apache.flink.yarn.Utils - Copying from file:/fjord/sites/flink-1.1.3/conf/log4j.properties to file:/fjord/splat/flink/yarn/.flink/application_1478896050022_0013/log4j.properties
> > > > > > 2016-11-12 12:31:42,321 INFO org.apache.flink.yarn.Utils - Copying from file:/fjord/sites/flink-1.1.3/lib to file:/fjord/splat/flink/yarn/.flink/application_1478896050022_0013/lib
> > > > > > 2016-11-12 12:32:18,457 INFO org.apache.flink.yarn.Utils - Copying from file:/fjord/sites/rbea/rbea-on-flink-1.0-SNAPSHOT.jar to file:/fjord/splat/flink/yarn/.flink/application_1478896050022_0013/rbea-on-flink-1.0-SNAPSHOT.jar
> > > > > > 2016-11-12 12:32:39,725 INFO org.apache.flink.yarn.Utils - Copying from file:/fjord/sites/flink-1.1.3/lib/flink-dist_2.10-1.1.4.jar to file:/fjord/splat/flink/yarn/.flink/application_1478896050022_0013/flink-dist_2.10-1.1.4.jar
> > > > > > 2016-11-12 12:32:58,154 INFO org.apache.flink.yarn.Utils - Copying from /fjord/sites/flink-1.1.3/conf/flink-conf.yaml to file:/fjord/splat/flink/yarn/.flink/application_1478896050022_0013/flink-conf.yaml
> > > > > > 2016-11-12 12:33:02,218 INFO org.apache.flink.yarn.YarnClusterDescriptor - Submitting application master application_1478896050022_0013
> > > > > > 2016-11-12 12:33:02,256 INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Submitted application application_1478896050022_0013
> > > > > > 2016-11-12 12:33:02,257 INFO org.apache.flink.yarn.YarnClusterDescriptor - Waiting for the cluster to be allocated
> > > > > > 2016-11-12 12:33:02,259 INFO org.apache.flink.yarn.YarnClusterDescriptor - Deploying cluster, current state ACCEPTED
> > > > > > 2016-11-12 12:34:02,485 INFO org.apache.flink.yarn.YarnClusterDescriptor - Deployment took more than 60 seconds. Please check if the requested resources are available in the YARN cluster
> > > > > >
> > > > > >
> > > > >
> > > >
> > > >
> > >
> >
> >
>  
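
The symptom in the quoted logs is that both the JobManager and the client try to reach the ResourceManager at 0.0.0.0 on ports 8030 and 8032, which are the YARN defaults used when no configured address is picked up. As a rough aid for spotting this faster, here is a minimal sketch that scans log lines for those default addresses. The function name find_default_rm_addresses is made up for illustration and is not part of Flink or Hadoop. A hit only indicates that default addresses were used; it does not by itself say why (in this thread the cause turned out to be a Flink build against the wrong Hadoop version).

```python
import re

# Assumption: 8030 (scheduler) and 8032 (ResourceManager) are the stock YARN
# default ports; 0.0.0.0 is the default host when no RM address is configured.
DEFAULT_ADDR = re.compile(r"\b0\.0\.0\.0:(?:8030|8032)\b")


def find_default_rm_addresses(log_lines):
    """Return (line_number, address) pairs for lines mentioning a default RM address.

    Hypothetical helper for illustration only; line numbers are 1-based.
    """
    hits = []
    for number, line in enumerate(log_lines, start=1):
        for match in DEFAULT_ADDR.finditer(line):
            hits.append((number, match.group(0)))
    return hits


if __name__ == "__main__":
    sample = [
        "2016-11-12 11:56:06,894 DEBUG org.apache.hadoop.ipc.Client - Connecting to /0.0.0.0:8030",
        "2016-11-12 12:31:31,119 INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at /0.0.0.0:8032",
    ]
    for number, address in find_default_rm_addresses(sample):
        print(f"line {number}: default address {address}")
```

Note that the JobManager lines in this thread were only logged at DEBUG level, so such a scan would see them only if DEBUG logging is enabled for org.apache.hadoop.ipc.Client.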

