flink-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "ani.desh1512" <ani.desh1...@gmail.com>
Subject Netty issues while deploying Flink with Yarn on MapR
Date Thu, 02 Feb 2017 16:39:43 GMT
I am trying to run Flink using Yarn on MapR. My  previous issue
<http://http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/Flink-with-Yarn-on-MapR-tt15448.html>
 
got resolved and I have updated the original post accordingly so. 

Accordingly, I modified pom.xml to change the zookeeper version to *mapr
zookeeper jar* version which in my case was: *3.4.5-mapr-1604*
I then built flink (*flink-1.3-SNAPSHOT*) as follows: 

*mvn clean install -DskipTests -Pvendor-repos
-Dhadoop.version=2.7.0-mapr-1607*

The build is successfull. Then I try to run *./bin/yarn-session.sh -n 3* and
get the following error:



/2017-02-02 16:11:10,717 INFO  org.apache.flink.yarn.YarnClusterDescriptor               
  
- Using values:

2017-02-02 16:11:10,718 INFO  org.apache.flink.yarn.YarnClusterDescriptor                
 
-      TaskManager count = 3
2017-02-02 16:11:10,718 INFO  org.apache.flink.yarn.YarnClusterDescriptor                
 
-      JobManager memory = 1024
2017-02-02 16:11:10,718 INFO  org.apache.flink.yarn.YarnClusterDescriptor                
 
-      TaskManager memory = 1024
2017-02-02 16:11:10,928 INFO  com.mapr.util.zookeeper.ZKDataRetrieval                    
 
- Process path: null. Event state: SyncConnected. Event type: None
2017-02-02 16:11:10,928 INFO  com.mapr.util.zookeeper.ZKDataRetrieval                    
 
- Connected to ZK:
ip-10-101-2-111.ec2.internal:5181,ip-10-101-2-112.ec2.internal:5181,ip-10-101-2-113.ec2.internal:5181
2017-02-02 16:11:10,929 INFO  com.mapr.util.zookeeper.ZKDataRetrieval                    
 
- Getting serviceData for master node of resourcemanager
2017-02-02 16:11:10,935 INFO  com.mapr.util.zookeeper.ZKDataRetrieval                    
 
- Process path: null. Event state: SaslAuthenticated. Event type: None
2017-02-02 16:11:10,948 INFO 
org.apache.hadoop.yarn.client.MapRZKBasedRMFailoverProxyProvider  - Updated
RM address to ip-10-101-2-111.ec2.internal/10.101.2.111:8032
2017-02-02 16:11:11,216 WARN  org.apache.flink.yarn.YarnClusterDescriptor                
 
- The configuration directory
('/home/ubuntu/flink/flink-dist/target/flink-1.3-SNAPSHOT-bin/flink-1.3-SNAPSHOT/conf')
contains both LOG4J and Logback configuration files. Please delete or rename
one of them.
2017-02-02 16:11:11,225 INFO  org.apache.flink.yarn.Utils                                
 
- Copying from
file:///home/ubuntu/flink/flink-dist/target/flink-1.3-SNAPSHOT-bin/flink-1.3-SNAPSHOT/conf/log4j.properties
to
maprfs:/user/ubuntu/.flink/application_1485984594262_0007/log4j.properties
2017-02-02 16:11:11,249 INFO  org.apache.flink.yarn.Utils                                
 
- Copying from
file:///home/ubuntu/flink/flink-dist/target/flink-1.3-SNAPSHOT-bin/flink-1.3-SNAPSHOT/lib
to maprfs:/user/ubuntu/.flink/application_1485984594262_0007/lib
2017-02-02 16:11:11,680 INFO  org.apache.flink.yarn.Utils                                
 
- Copying from
file:///home/ubuntu/flink/flink-dist/target/flink-1.3-SNAPSHOT-bin/flink-1.3-SNAPSHOT/conf/logback.xml
to maprfs:/user/ubuntu/.flink/application_1485984594262_0007/logback.xml
2017-02-02 16:11:11,685 INFO  org.apache.flink.yarn.Utils                                
 
- Copying from
file:///home/ubuntu/flink/flink-dist/target/flink-1.3-SNAPSHOT-bin/flink-1.3-SNAPSHOT/lib/flink-dist_2.10-1.3-SNAPSHOT.jar
to
maprfs:/user/ubuntu/.flink/application_1485984594262_0007/flink-dist_2.10-1.3-SNAPSHOT.jar
2017-02-02 16:11:12,932 INFO  org.apache.flink.yarn.Utils                                
 
- Copying from
/home/ubuntu/flink/flink-dist/target/flink-1.3-SNAPSHOT-bin/flink-1.3-SNAPSHOT/conf/flink-conf.yaml
to maprfs:/user/ubuntu/.flink/application_1485984594262_0007/flink-conf.yaml
2017-02-02 16:11:12,949 INFO  org.apache.flink.yarn.YarnClusterDescriptor                
 
- Submitting application master application_1485984594262_0007
2017-02-02 16:11:12,977 INFO 
org.apache.hadoop.yarn.security.ExternalTokenManagerFactory   - Initialized
external token manager class -
com.mapr.hadoop.yarn.security.MapRTicketManager
2017-02-02 16:11:13,195 INFO 
org.apache.hadoop.yarn.client.api.impl.YarnClientImpl         - Submitted
application application_1485984594262_0007
2017-02-02 16:11:13,195 INFO  org.apache.flink.yarn.YarnClusterDescriptor                
 
- Waiting for the cluster to be allocated
2017-02-02 16:11:13,196 INFO  org.apache.flink.yarn.YarnClusterDescriptor                
 
- Deploying cluster, current state ACCEPTED
Error while deploying YARN cluster: Couldn't deploy Yarn cluster
java.lang.RuntimeException: Couldn't deploy Yarn cluster
     at
org.apache.flink.yarn.AbstractYarnClusterDescriptor.deploy(AbstractYarnClusterDescriptor.java:425)
     at
org.apache.flink.yarn.cli.FlinkYarnSessionCli.run(FlinkYarnSessionCli.java:620)
     at
org.apache.flink.yarn.cli.FlinkYarnSessionCli$1.call(FlinkYarnSessionCli.java:476)
     at
org.apache.flink.yarn.cli.FlinkYarnSessionCli$1.call(FlinkYarnSessionCli.java:473)
     at
org.apache.flink.runtime.security.HadoopSecurityContext$1.run(HadoopSecurityContext.java:43)
     at java.security.AccessController.doPrivileged(Native Method)
     at javax.security.auth.Subject.doAs(Subject.java:415)
     at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1595)
     at
org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:40)
     at
org.apache.flink.yarn.cli.FlinkYarnSessionCli.main(FlinkYarnSessionCli.java:473)
Caused by:
org.apache.flink.yarn.AbstractYarnClusterDescriptor$YarnDeploymentException:
The YARN application unexpectedly switched to state FAILED during
deployment.
Diagnostics from YARN: Application application_1485984594262_0007 failed 1
times due to AM Container for appattempt_1485984594262_0007_000001 exited
with  exitCode: 255
For more detailed output, check application tracking
page:http://ip-10-101-2-111.ec2.internal:8088/cluster/app/application_1485984594262_0007Then,
click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_e05_1485984594262_0007_01_000001
Exit code: 255
Stack trace: ExitCodeException exitCode=255:
     at org.apache.hadoop.util.Shell.runCommand(Shell.java:545)
     at org.apache.hadoop.util.Shell.run(Shell.java:456)
     at
org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722)
     at
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:304)
     at
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:354)
     at
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:87)
     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
     at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
     at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
     at java.lang.Thread.run(Thread.java:745)

Shell output: main : command provided 1
main : user is ubuntu
main : requested yarn user is ubuntu


Container exited with a non-zero exit code 255
Failing this attempt. Failing the application.
If log aggregation is enabled on your cluster, use this command to further
investigate the issue:
yarn logs -applicationId application_1485984594262_0007
     at
org.apache.flink.yarn.AbstractYarnClusterDescriptor.startAppMaster(AbstractYarnClusterDescriptor.java:888)
     at
org.apache.flink.yarn.AbstractYarnClusterDescriptor.deployInternal(AbstractYarnClusterDescriptor.java:557)
     at
org.apache.flink.yarn.AbstractYarnClusterDescriptor.deploy(AbstractYarnClusterDescriptor.java:423)
     ... 9 more
2017-02-02 16:11:18,231 INFO  org.apache.flink.yarn.YarnClusterDescriptor                
 
- Cancelling deployment from Deployment Failure Hook
2017-02-02 16:11:18,231 INFO  org.apache.flink.yarn.YarnClusterDescriptor                
 
- Killing YARN application
2017-02-02 16:11:18,235 INFO 
org.apache.hadoop.yarn.client.api.impl.YarnClientImpl         - Killed
application application_1485984594262_0007
2017-02-02 16:11:18,336 INFO  org.apache.flink.yarn.YarnClusterDescriptor                
 
- Deleting files in
maprfs:/user/ubuntu/.flink/application_1485984594262_0007/

So i went ahead and checked the yarn container logs and they have the
following error:


/Uncaught error from thread [flink-akka.remote.default-remote-dispatcher-5]
shutting down JVM since 'akka.jvm-exit-on-fatal-error' is enabled for
ActorSystem[flink]

java.lang.VerifyError: (class:
org/jboss/netty/channel/socket/nio/NioWorkerPool, method: createWorker
signature:
(Ljava/util/concurrent/Executor;)Lorg/jboss/netty/channel/socket/nio/AbstractNioWorker;)
Wrong return type in function
        at
akka.remote.transport.netty.NettyTransport.<init>(NettyTransport.scala:295)
        at
akka.remote.transport.netty.NettyTransport.<init>(NettyTransport.scala:251)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
Method)
        at
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at
akka.actor.ReflectiveDynamicAccess$$anonfun$createInstanceFor$2.apply(DynamicAccess.scala:78)
        at scala.util.Try$.apply(Try.scala:161)
        at
akka.actor.ReflectiveDynamicAccess.createInstanceFor(DynamicAccess.scala:73)
        at
akka.actor.ReflectiveDynamicAccess$$anonfun$createInstanceFor$3.apply(DynamicAccess.scala:84)
        at
akka.actor.ReflectiveDynamicAccess$$anonfun$createInstanceFor$3.apply(DynamicAccess.scala:84)
        at scala.util.Success.flatMap(Try.scala:200)
        at
akka.actor.ReflectiveDynamicAccess.createInstanceFor(DynamicAccess.scala:84)
        at akka.remote.EndpointManager$$anonfun$9.apply(Remoting.scala:749)
        at akka.remote.EndpointManager$$anonfun$9.apply(Remoting.scala:741)
        at
scala.collection.TraversableLike$WithFilter$$anonfun$map$2.apply(TraversableLike.scala:722)
        at scala.collection.Iterator$class.foreach(Iterator.scala:727)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
        at
scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
        at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
        at
scala.collection.TraversableLike$WithFilter.map(TraversableLike.scala:721)
        at
akka.remote.EndpointManager.akka$remote$EndpointManager$$listens(Remoting.scala:741)
        at
akka.remote.EndpointManager$$anonfun$receive$2.applyOrElse(Remoting.scala:500)
        at akka.actor.Actor$class.aroundReceive(Actor.scala:467)
        at akka.remote.EndpointManager.aroundReceive(Remoting.scala:404)
        at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
        at akka.actor.ActorCell.invoke(ActorCell.scala:487)
        at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
        at akka.dispatch.Mailbox.run(Mailbox.scala:220)
        at
akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)
        at
scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
        at
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
        at
scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
        at
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)/



So, I figured this might be because of clashing netty versions between flink
and MapR’s zookeeper jar. 

And indeed, MapR’s zookeeper jar has following version of netty

/<groupId>org.jboss.netty</groupId>
<artifactId>netty</artifactId>
<version>3.2.2.Final</version>/

And Flink has following:
/<groupId>io.netty</groupId>                                

<artifactId>netty-all</artifactId>

 <version>4.0.27.Final</version>/



So, I changed Flink’s pom.xml to exclude netty from zookeeper dependency.

/<exclusion>

<groupId>org.jboss.netty</groupId>

<artifactId>netty</artifactId>
 </exclusion>/



Then I again ran ./bin/yarn-session.sh -n 3 and got the following error:



/2017-02-02 15:44:03,540 INFO  org.apache.flink.yarn.YarnClusterDescriptor               
  
- Using values:

2017-02-02 15:44:03,541 INFO  org.apache.flink.yarn.YarnClusterDescriptor                
 
-      TaskManager count = 3
2017-02-02 15:44:03,541 INFO  org.apache.flink.yarn.YarnClusterDescriptor                
 
-      JobManager memory = 1024
2017-02-02 15:44:03,541 INFO  org.apache.flink.yarn.YarnClusterDescriptor                
 
-      TaskManager memory = 1024
2017-02-02 15:44:03,728 INFO  com.mapr.util.zookeeper.ZKDataRetrieval                    
 
- Process path: null. Event state: SyncConnected. Event type: None
2017-02-02 15:44:03,728 INFO  com.mapr.util.zookeeper.ZKDataRetrieval                    
 
- Connected to ZK:
ip-10-101-2-111.ec2.internal:5181,ip-10-101-2-112.ec2.internal:5181,ip-10-101-2-113.ec2.internal:5181
2017-02-02 15:44:03,729 INFO  com.mapr.util.zookeeper.ZKDataRetrieval                    
 
- Getting serviceData for master node of resourcemanager
2017-02-02 15:44:03,733 INFO  com.mapr.util.zookeeper.ZKDataRetrieval                    
 
- Process path: null. Event state: SaslAuthenticated. Event type: None
2017-02-02 15:44:03,745 INFO 
org.apache.hadoop.yarn.client.MapRZKBasedRMFailoverProxyProvider  - Updated
RM address to ip-10-101-2-111.ec2.internal/10.101.2.111:8032
2017-02-02 15:44:04,016 WARN  org.apache.flink.yarn.YarnClusterDescriptor                
 
- The configuration directory
('/home/ubuntu/yarn_flink/flink/flink-dist/target/flink-1.3-SNAPSHOT-bin/flink-1.3-SNAPSHOT/conf')
contains both LOG4J and Logback configuration files. Please delete or rename
one of them.
2017-02-02 15:44:04,025 INFO  org.apache.flink.yarn.Utils                                
 
- Copying from
file:///home/ubuntu/yarn_flink/flink/flink-dist/target/flink-1.3-SNAPSHOT-bin/flink-1.3-SNAPSHOT/lib
to maprfs:/user/ubuntu/.flink/application_1485984594262_0006/lib
2017-02-02 15:44:04,446 INFO  org.apache.flink.yarn.Utils                                
 
- Copying from
file:///home/ubuntu/yarn_flink/flink/flink-dist/target/flink-1.3-SNAPSHOT-bin/flink-1.3-SNAPSHOT/conf/logback.xml
to maprfs:/user/ubuntu/.flink/application_1485984594262_0006/logback.xml
2017-02-02 15:44:04,452 INFO  org.apache.flink.yarn.Utils                                
 
- Copying from
file:///home/ubuntu/yarn_flink/flink/flink-dist/target/flink-1.3-SNAPSHOT-bin/flink-1.3-SNAPSHOT/conf/log4j.properties
to
maprfs:/user/ubuntu/.flink/application_1485984594262_0006/log4j.properties
2017-02-02 15:44:04,457 INFO  org.apache.flink.yarn.Utils                                
 
- Copying from
file:///home/ubuntu/yarn_flink/flink/flink-dist/target/flink-1.3-SNAPSHOT-bin/flink-1.3-SNAPSHOT/lib/flink-dist_2.10-1.3-SNAPSHOT.jar
to
maprfs:/user/ubuntu/.flink/application_1485984594262_0006/flink-dist_2.10-1.3-SNAPSHOT.jar
2017-02-02 15:44:05,826 INFO  org.apache.flink.yarn.Utils                                
 
- Copying from
/home/ubuntu/yarn_flink/flink/flink-dist/target/flink-1.3-SNAPSHOT-bin/flink-1.3-SNAPSHOT/conf/flink-conf.yaml
to maprfs:/user/ubuntu/.flink/application_1485984594262_0006/flink-conf.yaml
2017-02-02 15:44:05,842 INFO  org.apache.flink.yarn.YarnClusterDescriptor                
 
- Submitting application master application_1485984594262_0006
2017-02-02 15:44:05,870 INFO 
org.apache.hadoop.yarn.security.ExternalTokenManagerFactory   - Initialized
external token manager class -
com.mapr.hadoop.yarn.security.MapRTicketManager
2017-02-02 15:44:06,088 INFO 
org.apache.hadoop.yarn.client.api.impl.YarnClientImpl         - Submitted
application application_1485984594262_0006
2017-02-02 15:44:06,089 INFO  org.apache.flink.yarn.YarnClusterDescriptor                
 
- Waiting for the cluster to be allocated
2017-02-02 15:44:06,090 INFO  org.apache.flink.yarn.YarnClusterDescriptor                
 
- Deploying cluster, current state ACCEPTED
Error while deploying YARN cluster: Couldn't deploy Yarn cluster
java.lang.RuntimeException: Couldn't deploy Yarn cluster
     at
org.apache.flink.yarn.AbstractYarnClusterDescriptor.deploy(AbstractYarnClusterDescriptor.java:428)
     at
org.apache.flink.yarn.cli.FlinkYarnSessionCli.run(FlinkYarnSessionCli.java:620)
     at
org.apache.flink.yarn.cli.FlinkYarnSessionCli$1.call(FlinkYarnSessionCli.java:476)
     at
org.apache.flink.yarn.cli.FlinkYarnSessionCli$1.call(FlinkYarnSessionCli.java:473)
     at
org.apache.flink.runtime.security.HadoopSecurityContext$1.run(HadoopSecurityContext.java:43)
     at java.security.AccessController.doPrivileged(Native Method)
     at javax.security.auth.Subject.doAs(Subject.java:415)
     at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1595)
     at
org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:40)
     at
org.apache.flink.yarn.cli.FlinkYarnSessionCli.main(FlinkYarnSessionCli.java:473)
Caused by:
org.apache.flink.yarn.AbstractYarnClusterDescriptor$YarnDeploymentException:
The YARN application unexpectedly switched to state FAILED during
deployment.
Diagnostics from YARN: Application application_1485984594262_0006 failed 1
times due to AM Container for appattempt_1485984594262_0006_000001 exited
with  exitCode: 31
For more detailed output, check application tracking
page:http://ip-10-101-2-111.ec2.internal:8088/cluster/app/application_1485984594262_0006Then,
click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_e05_1485984594262_0006_01_000001
Exit code: 31
Stack trace: ExitCodeException exitCode=31:
     at org.apache.hadoop.util.Shell.runCommand(Shell.java:545)
     at org.apache.hadoop.util.Shell.run(Shell.java:456)
     at
org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722)
     at
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:304)
     at
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:354)
     at
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:87)
     at java.util.concurrent.FutureTask.run(FutureTask.java:262)
     at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
     at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
     at java.lang.Thread.run(Thread.java:745)

Shell output: main : command provided 1
main : user is ubuntu
main : requested yarn user is ubuntu


Container exited with a non-zero exit code 31
Failing this attempt. Failing the application.
If log aggregation is enabled on your cluster, use this command to further
investigate the issue:
yarn logs -applicationId application_1485984594262_0006
     at
org.apache.flink.yarn.AbstractYarnClusterDescriptor.startAppMaster(AbstractYarnClusterDescriptor.java:891)
     at
org.apache.flink.yarn.AbstractYarnClusterDescriptor.deployInternal(AbstractYarnClusterDescriptor.java:560)
     at
org.apache.flink.yarn.AbstractYarnClusterDescriptor.deploy(AbstractYarnClusterDescriptor.java:426)
     ... 9 more
2017-02-02 15:44:12,635 INFO  org.apache.flink.yarn.YarnClusterDescriptor                
 
- Cancelling deployment from Deployment Failure Hook
2017-02-02 15:44:12,635 INFO  org.apache.flink.yarn.YarnClusterDescriptor                
 
- Killing YARN application
2017-02-02 15:44:12,641 INFO 
org.apache.hadoop.yarn.client.api.impl.YarnClientImpl         - Killed
application application_1485984594262_0006
2017-02-02 15:44:12,742 INFO  org.apache.flink.yarn.YarnClusterDescriptor                
 
- Deleting files in
maprfs:/user/ubuntu/.flink/application_1485984594262_0006/



So i went ahead and checked the yarn container logs again and they have the
following error:

/2017-02-02 15:44:11,3521 ERROR JniCommon
fs/client/fileclient/cc/jni_MapRClient.cc:580 Thread: 19306 Client
initialization failed due to mismatch of libraries. Please make sure that
the java library version matches the native build version 5.2.0.39122.GA and
native patch version $Id: mapr-version: 5.2.0.39122.GA 40967:64c8e3c8ee67 $/



So, is there any way I can resolve this netty conflict to make Flink work on
Yarn with MapR? Meanwhile, I am also raising this issue with MapR community.

Thanks in advance



--
View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Netty-issues-while-deploying-Flink-with-Yarn-on-MapR-tp11411.html
Sent from the Apache Flink User Mailing List archive. mailing list archive at Nabble.com.

Mime
View raw message