spark-user mailing list archives

From "Williams, Ken" <Ken.Willi...@windlogics.com>
Subject Problem connecting to HDFS in Spark shell
Date Mon, 21 Apr 2014 19:03:53 GMT
I'm trying to get my feet wet with Spark.  I've done some simple stuff in the shell in standalone
mode, and now I'm trying to connect to HDFS resources, but I'm running into a problem.

I synced to the Git master branch (c399baa, "SPARK-1456 Remove view bounds on Ordered in favor
of a context bound on Ordering.", committed 3 days ago by Michael Armbrust) and built like so:

    SPARK_HADOOP_VERSION=2.2.0 SPARK_YARN=true sbt/sbt assembly

This created various jars in various places, including these (I think):

   ./examples/target/scala-2.10/spark-examples-assembly-1.0.0-SNAPSHOT.jar
   ./tools/target/scala-2.10/spark-tools-assembly-1.0.0-SNAPSHOT.jar
   ./assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop2.2.0.jar

In `conf/spark-env.sh`, I added this (actually before I did the assembly):

    export HADOOP_CONF_DIR=/etc/hadoop/conf

Now I fire up the shell (bin/spark-shell) and try to grab data from HDFS, and get the following
exception:

scala> var hdf = sc.hadoopFile("hdfs:///user/kwilliams/dat/part-m-00000")
hdf: org.apache.spark.rdd.RDD[(Nothing, Nothing)] = HadoopRDD[0] at hadoopFile at <console>:12

scala> hdf.count()
java.lang.RuntimeException: java.lang.InstantiationException
        at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:131)
        at org.apache.spark.rdd.HadoopRDD.getInputFormat(HadoopRDD.scala:155)
        at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:168)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:209)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:207)
        at scala.Option.getOrElse(Option.scala:120)
        at org.apache.spark.rdd.RDD.partitions(RDD.scala:207)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1064)
        at org.apache.spark.rdd.RDD.count(RDD.scala:806)
        at $iwC$$iwC$$iwC$$iwC.<init>(<console>:15)
        at $iwC$$iwC$$iwC.<init>(<console>:20)
        at $iwC$$iwC.<init>(<console>:22)
        at $iwC.<init>(<console>:24)
        at <init>(<console>:26)
        at .<init>(<console>:30)
        at .<clinit>(<console>)
        at .<init>(<console>:7)
        at .<clinit>(<console>)
        at $print(<console>)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:777)
        at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1045)
        at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:614)
        at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:645)
        at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:609)
        at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:796)
        at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:841)
        at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:753)
        at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:601)
        at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:608)
        at org.apache.spark.repl.SparkILoop.loop(SparkILoop.scala:611)
        at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply$mcZ$sp(SparkILoop.scala:936)
        at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:884)
        at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:884)
        at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
        at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:884)
        at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:982)
        at org.apache.spark.repl.Main$.main(Main.scala:31)
        at org.apache.spark.repl.Main.main(Main.scala)
Caused by: java.lang.InstantiationException
        at sun.reflect.InstantiationExceptionConstructorAccessorImpl.newInstance(InstantiationExceptionConstructorAccessorImpl.java:48)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
        at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:129)
        ... 41 more
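
For what it's worth, the `RDD[(Nothing, Nothing)]` type in the output above suggests the key, value, and InputFormat type parameters of `hadoopFile` were never inferred, so Hadoop's ReflectionUtils may be trying to instantiate a class it can't. A sketch of what I could try instead, assuming the file is plain text (the explicit-type call and `textFile` are my guesses at a workaround, not something I've verified yet):

```scala
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapred.TextInputFormat

// Spell out the key/value/InputFormat types instead of letting them
// default to Nothing:
val hdf = sc.hadoopFile[LongWritable, Text, TextInputFormat](
  "hdfs:///user/kwilliams/dat/part-m-00000")

// Or, if the data is just lines of text, skip hadoopFile entirely:
val lines = sc.textFile("hdfs:///user/kwilliams/dat/part-m-00000")
```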


Is this recognizable to anyone as a build problem, or a config problem, or anything?  Failing
that, any way to get more information about where in the process it's failing?

Thanks.

--
Ken Williams, Senior Research Scientist
WindLogics
http://windlogics.com



