spark-user mailing list archives

From Rachana Srivastava <Rachana.Srivast...@markmonitor.com>
Subject yarn-cluster mode throwing NullPointerException
Date Mon, 12 Oct 2015 03:49:52 GMT
I am trying to submit a job in yarn-cluster mode with the spark-submit command. The same code
works fine when I use yarn-client mode.

Cloudera Version:
CDH-5.4.7-1.cdh5.4.7.p0.3

Command Submitted:
spark-submit --class "com.markmonitor.antifraud.ce.KafkaURLStreaming" \
--driver-java-options "-Dlog4j.configuration=file:///etc/spark/myconf/log4j.sample.properties" \
--conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=file:///etc/spark/myconf/log4j.sample.properties" \
--conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=file:///etc/spark/myconf/log4j.sample.properties" \
--num-executors 2 \
--executor-cores 2 \
../target/mm-XXX-ce-0.0.1-SNAPSHOT-jar-with-dependencies.jar \
yarn-cluster 10 "XXX:2181" "XXX:9092" groups kafkaurl 5 \
"hdfs://ip-10-0-0-XXX.us-west-2.compute.internal:8020/user/ec2-user/urlFeature.properties" \
"hdfs://ip-10-0-0-XXX.us-west-2.compute.internal:8020/user/ec2-user/urlFeatureContent.properties" \
"hdfs://ip-10-0-0-XXX.us-west-2.compute.internal:8020/user/ec2-user/hdfsOutputNEWScript/OUTPUTYarn2" false
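
One thing that may be worth checking, offered as a sketch rather than a confirmed diagnosis: the command above has no --master flag, and yarn-cluster appears only as the first program argument after the jar. If KafkaURLStreaming passes that argument to SparkConf.setMaster (an assumption; the application code is not shown in this message), spark-submit would start the driver on the submitting host instead of inside a YARN ApplicationMaster. Under that assumption, a variant that hands the deploy mode to spark-submit itself could look like the following. The only change is the added --master yarn-cluster line; every other option and argument is copied from the original command.

```shell
# Sketch only: the original command with the master given to spark-submit.
# "--master yarn-cluster" is the assumption being illustrated here;
# all other flags and program arguments are unchanged from the message.
spark-submit --class "com.markmonitor.antifraud.ce.KafkaURLStreaming" \
  --master yarn-cluster \
  --driver-java-options "-Dlog4j.configuration=file:///etc/spark/myconf/log4j.sample.properties" \
  --conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=file:///etc/spark/myconf/log4j.sample.properties" \
  --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=file:///etc/spark/myconf/log4j.sample.properties" \
  --num-executors 2 \
  --executor-cores 2 \
  ../target/mm-XXX-ce-0.0.1-SNAPSHOT-jar-with-dependencies.jar \
  yarn-cluster 10 "XXX:2181" "XXX:9092" groups kafkaurl 5 \
  "hdfs://ip-10-0-0-XXX.us-west-2.compute.internal:8020/user/ec2-user/urlFeature.properties" \
  "hdfs://ip-10-0-0-XXX.us-west-2.compute.internal:8020/user/ec2-user/urlFeatureContent.properties" \
  "hdfs://ip-10-0-0-XXX.us-west-2.compute.internal:8020/user/ec2-user/hdfsOutputNEWScript/OUTPUTYarn2" false
```

With the master supplied this way, the driver output would be expected in the YARN container logs rather than on the submitting console.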


Log Details:
INFO : org.apache.spark.SparkContext - Running Spark version 1.3.0
INFO : org.apache.spark.SecurityManager - Changing view acls to: ec2-user
INFO : org.apache.spark.SecurityManager - Changing modify acls to: ec2-user
INFO : org.apache.spark.SecurityManager - SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(ec2-user); users with modify permissions: Set(ec2-user)
INFO : akka.event.slf4j.Slf4jLogger - Slf4jLogger started
INFO : Remoting - Starting remoting
INFO : Remoting - Remoting started; listening on addresses :[akka.tcp://sparkDriver@ip-10-0-0-XXX.us-west-2.compute.internal:49579]
INFO : Remoting - Remoting now listens on addresses: [akka.tcp://sparkDriver@ip-10-0-0-XXX.us-west-2.compute.internal:49579]
INFO : org.apache.spark.util.Utils - Successfully started service 'sparkDriver' on port 49579.
INFO : org.apache.spark.SparkEnv - Registering MapOutputTracker
INFO : org.apache.spark.SparkEnv - Registering BlockManagerMaster
INFO : org.apache.spark.storage.DiskBlockManager - Created local directory at /tmp/spark-1c805495-c7c4-471d-973f-b1ae0e2c8ff9/blockmgr-fff1946f-a716-40fc-a62d-bacba5b17638
INFO : org.apache.spark.storage.MemoryStore - MemoryStore started with capacity 265.4 MB
INFO : org.apache.spark.HttpFileServer - HTTP File server directory is /tmp/spark-8ed6f513-854f-4ee4-95ea-87185364eeaf/httpd-75cee1e7-af7a-4c82-a9ff-a124ce7ca7ae
INFO : org.apache.spark.HttpServer - Starting HTTP Server
INFO : org.spark-project.jetty.server.Server - jetty-8.y.z-SNAPSHOT
INFO : org.spark-project.jetty.server.AbstractConnector - Started SocketConnector@0.0.0.0:46671
INFO : org.apache.spark.util.Utils - Successfully started service 'HTTP file server' on port 46671.
INFO : org.apache.spark.SparkEnv - Registering OutputCommitCoordinator
INFO : org.spark-project.jetty.server.Server - jetty-8.y.z-SNAPSHOT
INFO : org.spark-project.jetty.server.AbstractConnector - Started SelectChannelConnector@0.0.0.0:4040
INFO : org.apache.spark.util.Utils - Successfully started service 'SparkUI' on port 4040.
INFO : org.apache.spark.ui.SparkUI - Started SparkUI at http://ip-10-0-0-XXX.us-west-2.compute.internal:4040
INFO : org.apache.spark.SparkContext - Added JAR file:/home/ec2-user/CE/correlationengine/scripts/../target/mm-anti-fraud-ce-0.0.1-SNAPSHOT-jar-with-dependencies.jar at http://10.0.0.XXX:46671/jars/mm-anti-fraud-ce-0.0.1-SNAPSHOT-jar-with-dependencies.jar with timestamp 1444620509463
INFO : org.apache.spark.scheduler.cluster.YarnClusterScheduler - Created YarnClusterScheduler
ERROR: org.apache.spark.scheduler.cluster.YarnClusterSchedulerBackend - Application ID is not set.
INFO : org.apache.spark.network.netty.NettyBlockTransferService - Server created on 33880
INFO : org.apache.spark.storage.BlockManagerMaster - Trying to register BlockManager
INFO : org.apache.spark.storage.BlockManagerMasterActor - Registering block manager ip-10-0-0-XXX.us-west-2.compute.internal:33880 with 265.4 MB RAM, BlockManagerId(<driver>, ip-10-0-0-XXX.us-west-2.compute.internal, 33880)
INFO : org.apache.spark.storage.BlockManagerMaster - Registered BlockManager
INFO : org.apache.spark.scheduler.EventLoggingListener - Logging events to hdfs://ip-10-0-0-XXX.us-west-2.compute.internal:8020/user/spark/applicationHistory/spark-application-1444620509497
Exception in thread "main" java.lang.NullPointerException
    at org.apache.spark.deploy.yarn.ApplicationMaster$.sparkContextInitialized(ApplicationMaster.scala:580)
    at org.apache.spark.scheduler.cluster.YarnClusterScheduler.postStartHook(YarnClusterScheduler.scala:32)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:541)
    at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:61)
    at com.markmonitor.antifraud.ce.KafkaURLStreaming.main(KafkaURLStreaming.java:91)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:569)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:166)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:189)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:110)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
WARN : org.apache.hadoop.hdfs.DFSClient - Unable to persist blocks in hflush for /user/spark/applicationHistory/spark-application-1444620509497.inprogress
java.io.IOException: Failed on local exception: java.io.InterruptedIOException: Interruped while waiting for IO on channel java.nio.channels.SocketChannel[connected local=/10.0.0.XXX:43929 remote=ip-10-0-0-XXX.us-west-2.compute.internal/10.0.0.XXX:8020]. 59998 millis timeout left.; Host Details : local host is: "ip-10-0-0-XXX.us-west-2.compute.internal/10.0.0.XXX"; destination host is: "ip-10-0-0-XXX.us-west-2.compute.internal":8020;
    at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:772)
    at org.apache.hadoop.ipc.Client.call(Client.java:1472)
    at org.apache.hadoop.ipc.Client.call(Client.java:1399)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
    at com.sun.proxy.$Proxy18.fsync(Unknown Source)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.fsync(ClientNamenodeProtocolTranslatorPB.java:814)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
    at com.sun.proxy.$Proxy19.fsync(Unknown Source)
    at org.apache.hadoop.hdfs.DFSOutputStream.flushOrSync(DFSOutputStream.java:2067)
    at org.apache.hadoop.hdfs.DFSOutputStream.hflush(DFSOutputStream.java:1959)
    at org.apache.hadoop.fs.FSDataOutputStream.hflush(FSDataOutputStream.java:130)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.scheduler.EventLoggingListener$$anonfun$logEvent$3.apply(EventLoggingListener.scala:144)
    at org.apache.spark.scheduler.EventLoggingListener$$anonfun$logEvent$3.apply(EventLoggingListener.scala:144)
    at scala.Option.foreach(Option.scala:236)
    at org.apache.spark.scheduler.EventLoggingListener.logEvent(EventLoggingListener.scala:144)
    at org.apache.spark.scheduler.EventLoggingListener.onBlockManagerAdded(EventLoggingListener.scala:171)
    at org.apache.spark.scheduler.SparkListenerBus$class.onPostEvent(SparkListenerBus.scala:46)
    at org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:31)
    at org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:31)
    at org.apache.spark.util.ListenerBus$class.postToAll(ListenerBus.scala:53)
    at org.apache.spark.util.AsynchronousListenerBus.postToAll(AsynchronousListenerBus.scala:36)
    at org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1.apply$mcV$sp(AsynchronousListenerBus.scala:76)
    at org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1.apply(AsynchronousListenerBus.scala:61)
    at org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1.apply(AsynchronousListenerBus.scala:61)
    at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1617)
    at org.apache.spark.util.AsynchronousListenerBus$$anon$1.run(AsynchronousListenerBus.scala:60)
Caused by: java.io.InterruptedIOException: Interruped while waiting for IO on channel java.nio.channels.SocketChannel[connected local=/10.0.0.XXX:43929 remote=ip-10-0-0-XXX.us-west-2.compute.internal/10.0.0.XXX:8020]. 59998 millis timeout left.
    at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:352)
    at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:157)
    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
    at java.io.FilterInputStream.read(FilterInputStream.java:133)
    at java.io.FilterInputStream.read(FilterInputStream.java:133)
    at org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:513)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:254)
    at java.io.DataInputStream.readInt(DataInputStream.java:387)
    at org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:1071)
    at org.apache.hadoop.ipc.Client$Connection.run(Client.java:966)
WARN : org.apache.hadoop.hdfs.DFSClient - Error while syncing
java.nio.channels.ClosedChannelException
    at org.apache.hadoop.hdfs.DFSOutputStream.checkClosed(DFSOutputStream.java:1635)
    at org.apache.hadoop.hdfs.DFSOutputStream.flushOrSync(DFSOutputStream.java:2074)
    at org.apache.hadoop.hdfs.DFSOutputStream.hflush(DFSOutputStream.java:1959)
    at org.apache.hadoop.fs.FSDataOutputStream.hflush(FSDataOutputStream.java:130)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.scheduler.EventLoggingListener$$anonfun$logEvent$3.apply(EventLoggingListener.scala:144)
    at org.apache.spark.scheduler.EventLoggingListener$$anonfun$logEvent$3.apply(EventLoggingListener.scala:144)
    at scala.Option.foreach(Option.scala:236)
    at org.apache.spark.scheduler.EventLoggingListener.logEvent(EventLoggingListener.scala:144)
    at org.apache.spark.scheduler.EventLoggingListener.onBlockManagerAdded(EventLoggingListener.scala:171)
    at org.apache.spark.scheduler.SparkListenerBus$class.onPostEvent(SparkListenerBus.scala:46)
    at org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:31)
    at org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:31)
    at org.apache.spark.util.ListenerBus$class.postToAll(ListenerBus.scala:53)
    at org.apache.spark.util.AsynchronousListenerBus.postToAll(AsynchronousListenerBus.scala:36)
    at org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1.apply$mcV$sp(AsynchronousListenerBus.scala:76)
    at org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1.apply(AsynchronousListenerBus.scala:61)
    at org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1.apply(AsynchronousListenerBus.scala:61)
    at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1617)
    at org.apache.spark.util.AsynchronousListenerBus$$anon$1.run(AsynchronousListenerBus.scala:60)
ERROR: org.apache.spark.scheduler.LiveListenerBus - Listener EventLoggingListener threw an exception
java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.scheduler.EventLoggingListener$$anonfun$logEvent$3.apply(EventLoggingListener.scala:144)
    at org.apache.spark.scheduler.EventLoggingListener$$anonfun$logEvent$3.apply(EventLoggingListener.scala:144)
    at scala.Option.foreach(Option.scala:236)
    at org.apache.spark.scheduler.EventLoggingListener.logEvent(EventLoggingListener.scala:144)
    at org.apache.spark.scheduler.EventLoggingListener.onBlockManagerAdded(EventLoggingListener.scala:171)
    at org.apache.spark.scheduler.SparkListenerBus$class.onPostEvent(SparkListenerBus.scala:46)
    at org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:31)
    at org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:31)
    at org.apache.spark.util.ListenerBus$class.postToAll(ListenerBus.scala:53)
    at org.apache.spark.util.AsynchronousListenerBus.postToAll(AsynchronousListenerBus.scala:36)
    at org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1.apply$mcV$sp(AsynchronousListenerBus.scala:76)
    at org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1.apply(AsynchronousListenerBus.scala:61)
    at org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1.apply(AsynchronousListenerBus.scala:61)
    at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1617)
    at org.apache.spark.util.AsynchronousListenerBus$$anon$1.run(AsynchronousListenerBus.scala:60)
Caused by: java.nio.channels.ClosedChannelException
    at org.apache.hadoop.hdfs.DFSOutputStream.checkClosed(DFSOutputStream.java:1635)
    at org.apache.hadoop.hdfs.DFSOutputStream.flushOrSync(DFSOutputStream.java:2074)
    at org.apache.hadoop.hdfs.DFSOutputStream.hflush(DFSOutputStream.java:1959)
    at org.apache.hadoop.fs.FSDataOutputStream.hflush(FSDataOutputStream.java:130)
    ... 19 more
ERROR: org.apache.spark.scheduler.LiveListenerBus - Listener EventLoggingListener threw an exception
java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.scheduler.EventLoggingListener$$anonfun$logEvent$3.apply(EventLoggingListener.scala:144)
    at org.apache.spark.scheduler.EventLoggingListener$$anonfun$logEvent$3.apply(EventLoggingListener.scala:144)
    at scala.Option.foreach(Option.scala:236)
    at org.apache.spark.scheduler.EventLoggingListener.logEvent(EventLoggingListener.scala:144)
    at org.apache.spark.scheduler.EventLoggingListener.onApplicationStart(EventLoggingListener.scala:177)
    at org.apache.spark.scheduler.SparkListenerBus$class.onPostEvent(SparkListenerBus.scala:52)
    at org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:31)
    at org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:31)
    at org.apache.spark.util.ListenerBus$class.postToAll(ListenerBus.scala:53)
    at org.apache.spark.util.AsynchronousListenerBus.postToAll(AsynchronousListenerBus.scala:36)
    at org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1.apply$mcV$sp(AsynchronousListenerBus.scala:76)
    at org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1.apply(AsynchronousListenerBus.scala:61)
    at org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1.apply(AsynchronousListenerBus.scala:61)
    at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1617)
    at org.apache.spark.util.AsynchronousListenerBus$$anon$1.run(AsynchronousListenerBus.scala:60)
Caused by: java.io.IOException: Filesystem closed
    at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:794)
    at org.apache.hadoop.hdfs.DFSOutputStream.flushOrSync(DFSOutputStream.java:1998)
    at org.apache.hadoop.hdfs.DFSOutputStream.hflush(DFSOutputStream.java:1959)
    at org.apache.hadoop.fs.FSDataOutputStream.hflush(FSDataOutputStream.java:130)
    ... 19 more
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-5.4.7-1.cdh5.4.7.p0.3/jars/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-5.4.7-1.cdh5.4.7.p0.3/jars/avro-tools-1.7.6-cdh5.4.7.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]

Thanks,

Rachana
