spark-user mailing list archives

From Aaron Davidson <ilike...@gmail.com>
Subject Re: hadoopRDD stalls reading entire directory
Date Sun, 01 Jun 2014 05:57:49 GMT
The first issue was caused by your cluster being configured incorrectly. You
could probably read one file because that read ran on the driver node, but as
soon as Spark tried to run a job on the cluster, it failed.

As for the second issue, it looks like the jar containing Avro is not being
propagated to the executors. Which version of Spark are you running, and in
which deployment mode (YARN, standalone, or Mesos)?
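
If it helps to narrow down where the class is missing, a check along these
lines can be pasted into the shell. This is only a sketch under assumptions:
`ClasspathCheck.classAvailable` is a made-up helper, not part of Spark, and it
only asks the calling thread's class loader whether a class can be found by
name.

```scala
object ClasspathCheck {
  // Hypothetical helper (not a Spark API): returns true if `className` can be
  // located by the class loader of the thread that calls it.
  def classAvailable(className: String): Boolean = {
    // Prefer the thread's context class loader; fall back to the loader that
    // loaded this object if no context loader is set.
    val loader = Option(Thread.currentThread().getContextClassLoader)
      .getOrElse(getClass.getClassLoader)
    try {
      // initialize = false: only locate the class, do not run static initializers.
      Class.forName(className, false, loader)
      true
    } catch {
      // Missing class, or a class whose own dependencies are missing.
      case _: ClassNotFoundException | _: NoClassDefFoundError => false
    }
  }
}
```

Calling `ClasspathCheck.classAvailable("org.apache.avro.mapred.AvroInputFormat")`
at the `scala>` prompt only describes the driver's classpath. The
ClassNotFoundException in the quoted log was raised while an executor was
deserializing the task, so the jar has to reach the executors as well. How to
ship it (typically `--jars` with the Spark 1.0 shell, or the `ADD_JARS`
environment variable with the 0.9 shell) depends on the version and deployment
mode, which is why those two questions matter.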


On Sat, May 31, 2014 at 9:37 PM, Russell Jurney <russell.jurney@gmail.com>
wrote:

> Now I get this:
>
> scala> rdd.first
>
> 14/05/31 21:36:28 INFO spark.SparkContext: Starting job: first at
> <console>:41
>
> 14/05/31 21:36:28 INFO scheduler.DAGScheduler: Got job 4 (first at
> <console>:41) with 1 output partitions (allowLocal=true)
>
> 14/05/31 21:36:28 INFO scheduler.DAGScheduler: Final stage: Stage 4 (first
> at <console>:41)
>
> 14/05/31 21:36:28 INFO scheduler.DAGScheduler: Parents of final stage:
> List()
>
> 14/05/31 21:36:28 INFO scheduler.DAGScheduler: Missing parents: List()
>
> 14/05/31 21:36:28 INFO scheduler.DAGScheduler: Computing the requested
> partition locally
>
> 14/05/31 21:36:28 INFO rdd.HadoopRDD: Input split:
> hdfs://hivecluster2/securityx/web_proxy_mef/2014/05/29/22/part-m-00000.avro:0+3864
>
> 14/05/31 21:36:28 INFO spark.SparkContext: Job finished: first at
> <console>:41, took 0.037371256 s
>
> 14/05/31 21:36:28 INFO spark.SparkContext: Starting job: first at
> <console>:41
>
> 14/05/31 21:36:28 INFO scheduler.DAGScheduler: Got job 5 (first at
> <console>:41) with 16 output partitions (allowLocal=true)
>
> 14/05/31 21:36:28 INFO scheduler.DAGScheduler: Final stage: Stage 5 (first
> at <console>:41)
>
> 14/05/31 21:36:28 INFO scheduler.DAGScheduler: Parents of final stage:
> List()
>
> 14/05/31 21:36:28 INFO scheduler.DAGScheduler: Missing parents: List()
>
> 14/05/31 21:36:28 INFO scheduler.DAGScheduler: Submitting Stage 5
> (HadoopRDD[0] at hadoopRDD at <console>:37), which has no missing parents
>
> 14/05/31 21:36:28 INFO scheduler.DAGScheduler: Submitting 16 missing tasks
> from Stage 5 (HadoopRDD[0] at hadoopRDD at <console>:37)
>
> 14/05/31 21:36:28 INFO scheduler.TaskSchedulerImpl: Adding task set 5.0
> with 16 tasks
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Starting task 5.0:0 as
> TID 92 on executor 2: hivecluster3 (NODE_LOCAL)
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Serialized task 5.0:0 as
> 1294 bytes in 1 ms
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Starting task 5.0:3 as
> TID 93 on executor 1: hivecluster5.labs.lan (NODE_LOCAL)
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Serialized task 5.0:3 as
> 1294 bytes in 0 ms
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Starting task 5.0:1 as
> TID 94 on executor 4: hivecluster4 (NODE_LOCAL)
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Serialized task 5.0:1 as
> 1294 bytes in 1 ms
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Starting task 5.0:2 as
> TID 95 on executor 0: hivecluster6.labs.lan (NODE_LOCAL)
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Serialized task 5.0:2 as
> 1294 bytes in 0 ms
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Starting task 5.0:4 as
> TID 96 on executor 3: hivecluster1.labs.lan (NODE_LOCAL)
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Serialized task 5.0:4 as
> 1294 bytes in 0 ms
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Starting task 5.0:6 as
> TID 97 on executor 2: hivecluster3 (NODE_LOCAL)
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Serialized task 5.0:6 as
> 1294 bytes in 0 ms
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Starting task 5.0:5 as
> TID 98 on executor 1: hivecluster5.labs.lan (NODE_LOCAL)
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Serialized task 5.0:5 as
> 1294 bytes in 0 ms
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Starting task 5.0:8 as
> TID 99 on executor 4: hivecluster4 (NODE_LOCAL)
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Serialized task 5.0:8 as
> 1294 bytes in 0 ms
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Starting task 5.0:7 as
> TID 100 on executor 0: hivecluster6.labs.lan (NODE_LOCAL)
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Serialized task 5.0:7 as
> 1294 bytes in 0 ms
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Starting task 5.0:10 as
> TID 101 on executor 3: hivecluster1.labs.lan (NODE_LOCAL)
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Serialized task 5.0:10 as
> 1294 bytes in 0 ms
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Starting task 5.0:14 as
> TID 102 on executor 2: hivecluster3 (NODE_LOCAL)
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Serialized task 5.0:14 as
> 1294 bytes in 0 ms
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Starting task 5.0:9 as
> TID 103 on executor 1: hivecluster5.labs.lan (NODE_LOCAL)
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Serialized task 5.0:9 as
> 1294 bytes in 0 ms
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Starting task 5.0:11 as
> TID 104 on executor 4: hivecluster4 (NODE_LOCAL)
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Serialized task 5.0:11 as
> 1294 bytes in 0 ms
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Starting task 5.0:12 as
> TID 105 on executor 0: hivecluster6.labs.lan (NODE_LOCAL)
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Serialized task 5.0:12 as
> 1294 bytes in 0 ms
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Starting task 5.0:13 as
> TID 106 on executor 3: hivecluster1.labs.lan (NODE_LOCAL)
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Serialized task 5.0:13 as
> 1294 bytes in 0 ms
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Starting task 5.0:15 as
> TID 107 on executor 1: hivecluster5.labs.lan (NODE_LOCAL)
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Serialized task 5.0:15 as
> 1294 bytes in 1 ms
>
> 14/05/31 21:36:28 WARN scheduler.TaskSetManager: Lost TID 95 (task 5.0:2)
>
> 14/05/31 21:36:28 WARN scheduler.TaskSetManager: Loss was due to
> java.lang.ClassNotFoundException
>
> java.lang.ClassNotFoundException: org.apache.avro.mapred.AvroInputFormat
>
> at
> org.apache.spark.repl.ExecutorClassLoader.findClass(ExecutorClassLoader.scala:49)
>
> at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
>
> at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
>
> at java.lang.Class.forName0(Native Method)
>
> at java.lang.Class.forName(Class.java:270)
>
> at
> org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:37)
>
> at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1612)
>
> at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
>
> at java.io.ObjectInputStream.readClass(ObjectInputStream.java:1483)
>
> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1333)
>
> at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>
> at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>
> at
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>
> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>
> at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
>
> at
> org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:40)
>
> at
> org.apache.spark.scheduler.ResultTask$.deserializeInfo(ResultTask.scala:63)
>
> at org.apache.spark.scheduler.ResultTask.readExternal(ResultTask.scala:139)
>
> at java.io.ObjectInputStream.readExternalData(ObjectInputStream.java:1837)
>
> at
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
>
> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>
> at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
>
> at
> org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:40)
>
> at
> org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:62)
>
> at
> org.apache.spark.executor.Executor$TaskRunner$$anonfun$run$1.apply$mcV$sp(Executor.scala:195)
>
> at
> org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:46)
>
> at
> org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:45)
>
> at java.security.AccessController.doPrivileged(Native Method)
>
> at javax.security.auth.Subject.doAs(Subject.java:415)
>
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
>
> at
> org.apache.spark.deploy.SparkHadoopUtil.runAsUser(SparkHadoopUtil.scala:45)
>
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:178)
>
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>
> at java.lang.Thread.run(Thread.java:744)
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Starting task 5.0:2 as
> TID 108 on executor 2: hivecluster3 (NODE_LOCAL)
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Serialized task 5.0:2 as
> 1294 bytes in 0 ms
>
> 14/05/31 21:36:28 WARN scheduler.TaskSetManager: Lost TID 105 (task 5.0:12)
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Loss was due to
> java.lang.ClassNotFoundException: org.apache.avro.mapred.AvroInputFormat
> [duplicate 1]
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Starting task 5.0:12 as
> TID 109 on executor 1: hivecluster5.labs.lan (NODE_LOCAL)
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Serialized task 5.0:12 as
> 1294 bytes in 0 ms
>
> 14/05/31 21:36:28 WARN scheduler.TaskSetManager: Lost TID 100 (task 5.0:7)
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Loss was due to
> java.lang.ClassNotFoundException: org.apache.avro.mapred.AvroInputFormat
> [duplicate 2]
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Starting task 5.0:7 as
> TID 110 on executor 1: hivecluster5.labs.lan (NODE_LOCAL)
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Serialized task 5.0:7 as
> 1294 bytes in 1 ms
>
> 14/05/31 21:36:28 WARN scheduler.TaskSetManager: Lost TID 93 (task 5.0:3)
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Loss was due to
> java.lang.ClassNotFoundException: org.apache.avro.mapred.AvroInputFormat
> [duplicate 3]
>
> 14/05/31 21:36:28 WARN scheduler.TaskSetManager: Lost TID 92 (task 5.0:0)
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Loss was due to
> java.lang.ClassNotFoundException: org.apache.avro.mapred.AvroInputFormat
> [duplicate 4]
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Starting task 5.0:0 as
> TID 111 on executor 2: hivecluster3 (NODE_LOCAL)
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Serialized task 5.0:0 as
> 1294 bytes in 0 ms
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Starting task 5.0:3 as
> TID 112 on executor 1: hivecluster5.labs.lan (NODE_LOCAL)
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Serialized task 5.0:3 as
> 1294 bytes in 1 ms
>
> 14/05/31 21:36:28 WARN scheduler.TaskSetManager: Lost TID 96 (task 5.0:4)
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Loss was due to
> java.lang.ClassNotFoundException: org.apache.avro.mapred.AvroInputFormat
> [duplicate 5]
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Starting task 5.0:4 as
> TID 113 on executor 1: hivecluster5.labs.lan (NODE_LOCAL)
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Serialized task 5.0:4 as
> 1294 bytes in 0 ms
>
> 14/05/31 21:36:28 WARN scheduler.TaskSetManager: Lost TID 102 (task 5.0:14)
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Loss was due to
> java.lang.ClassNotFoundException: org.apache.avro.mapred.AvroInputFormat
> [duplicate 6]
>
> 14/05/31 21:36:28 WARN scheduler.TaskSetManager: Lost TID 103 (task 5.0:9)
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Loss was due to
> java.lang.ClassNotFoundException: org.apache.avro.mapred.AvroInputFormat
> [duplicate 7]
>
> 14/05/31 21:36:28 WARN scheduler.TaskSetManager: Lost TID 98 (task 5.0:5)
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Loss was due to
> java.lang.ClassNotFoundException: org.apache.avro.mapred.AvroInputFormat
> [duplicate 8]
>
> 14/05/31 21:36:28 WARN scheduler.TaskSetManager: Lost TID 106 (task 5.0:13)
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Loss was due to
> java.lang.ClassNotFoundException: org.apache.avro.mapred.AvroInputFormat
> [duplicate 9]
>
> 14/05/31 21:36:28 WARN scheduler.TaskSetManager: Lost TID 101 (task 5.0:10)
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Loss was due to
> java.lang.ClassNotFoundException: org.apache.avro.mapred.AvroInputFormat
> [duplicate 10]
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Starting task 5.0:10 as
> TID 114 on executor 3: hivecluster1.labs.lan (NODE_LOCAL)
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Serialized task 5.0:10 as
> 1294 bytes in 1 ms
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Starting task 5.0:13 as
> TID 115 on executor 3: hivecluster1.labs.lan (NODE_LOCAL)
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Serialized task 5.0:13 as
> 1294 bytes in 0 ms
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Starting task 5.0:14 as
> TID 116 on executor 3: hivecluster1.labs.lan (NODE_LOCAL)
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Serialized task 5.0:14 as
> 1294 bytes in 0 ms
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Starting task 5.0:9 as
> TID 117 on executor 4: hivecluster4 (NODE_LOCAL)
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Serialized task 5.0:9 as
> 1294 bytes in 0 ms
>
> 14/05/31 21:36:28 WARN scheduler.TaskSetManager: Lost TID 94 (task 5.0:1)
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Loss was due to
> java.lang.ClassNotFoundException: org.apache.avro.mapred.AvroInputFormat
> [duplicate 11]
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Starting task 5.0:1 as
> TID 118 on executor 2: hivecluster3 (NODE_LOCAL)
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Serialized task 5.0:1 as
> 1294 bytes in 0 ms
>
> 14/05/31 21:36:28 WARN scheduler.TaskSetManager: Lost TID 97 (task 5.0:6)
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Loss was due to
> java.lang.ClassNotFoundException: org.apache.avro.mapred.AvroInputFormat
> [duplicate 12]
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Starting task 5.0:6 as
> TID 119 on executor 2: hivecluster3 (NODE_LOCAL)
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Serialized task 5.0:6 as
> 1294 bytes in 1 ms
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Starting task 5.0:5 as
> TID 120 on executor 1: hivecluster5.labs.lan (NODE_LOCAL)
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Serialized task 5.0:5 as
> 1294 bytes in 0 ms
>
> 14/05/31 21:36:28 WARN scheduler.TaskSetManager: Lost TID 104 (task 5.0:11)
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Loss was due to
> java.lang.ClassNotFoundException: org.apache.avro.mapred.AvroInputFormat
> [duplicate 13]
>
> 14/05/31 21:36:28 WARN scheduler.TaskSetManager: Lost TID 99 (task 5.0:8)
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Loss was due to
> java.lang.ClassNotFoundException: org.apache.avro.mapred.AvroInputFormat
> [duplicate 14]
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Starting task 5.0:8 as
> TID 121 on executor 2: hivecluster3 (NODE_LOCAL)
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Serialized task 5.0:8 as
> 1294 bytes in 1 ms
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Starting task 5.0:11 as
> TID 122 on executor 1: hivecluster5.labs.lan (NODE_LOCAL)
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Serialized task 5.0:11 as
> 1294 bytes in 0 ms
>
> 14/05/31 21:36:28 WARN scheduler.TaskSetManager: Lost TID 107 (task 5.0:15)
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Loss was due to
> java.lang.ClassNotFoundException: org.apache.avro.mapred.AvroInputFormat
> [duplicate 15]
>
> 14/05/31 21:36:28 WARN scheduler.TaskSetManager: Lost TID 109 (task 5.0:12)
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Loss was due to
> java.lang.ClassNotFoundException: org.apache.avro.mapred.AvroInputFormat
> [duplicate 16]
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Starting task 5.0:12 as
> TID 123 on executor 1: hivecluster5.labs.lan (NODE_LOCAL)
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Serialized task 5.0:12 as
> 1294 bytes in 0 ms
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Starting task 5.0:15 as
> TID 124 on executor 1: hivecluster5.labs.lan (NODE_LOCAL)
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Serialized task 5.0:15 as
> 1294 bytes in 0 ms
>
> 14/05/31 21:36:28 WARN scheduler.TaskSetManager: Lost TID 108 (task 5.0:2)
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Loss was due to
> java.lang.ClassNotFoundException: org.apache.avro.mapred.AvroInputFormat
> [duplicate 17]
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Starting task 5.0:2 as
> TID 125 on executor 2: hivecluster3 (NODE_LOCAL)
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Serialized task 5.0:2 as
> 1294 bytes in 1 ms
>
> 14/05/31 21:36:28 WARN scheduler.TaskSetManager: Lost TID 111 (task 5.0:0)
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Loss was due to
> java.lang.ClassNotFoundException: org.apache.avro.mapred.AvroInputFormat
> [duplicate 18]
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Starting task 5.0:0 as
> TID 126 on executor 2: hivecluster3 (NODE_LOCAL)
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Serialized task 5.0:0 as
> 1294 bytes in 0 ms
>
> 14/05/31 21:36:28 WARN scheduler.TaskSetManager: Lost TID 117 (task 5.0:9)
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Loss was due to
> java.lang.ClassNotFoundException: org.apache.avro.mapred.AvroInputFormat
> [duplicate 19]
>
> 14/05/31 21:36:28 WARN scheduler.TaskSetManager: Lost TID 113 (task 5.0:4)
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Loss was due to
> java.lang.ClassNotFoundException: org.apache.avro.mapred.AvroInputFormat
> [duplicate 20]
>
> 14/05/31 21:36:28 WARN scheduler.TaskSetManager: Lost TID 118 (task 5.0:1)
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Loss was due to
> java.lang.ClassNotFoundException: org.apache.avro.mapred.AvroInputFormat
> [duplicate 21]
>
> 14/05/31 21:36:28 WARN scheduler.TaskSetManager: Lost TID 119 (task 5.0:6)
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Loss was due to
> java.lang.ClassNotFoundException: org.apache.avro.mapred.AvroInputFormat
> [duplicate 22]
>
> 14/05/31 21:36:28 WARN scheduler.TaskSetManager: Lost TID 114 (task 5.0:10)
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Loss was due to
> java.lang.ClassNotFoundException: org.apache.avro.mapred.AvroInputFormat
> [duplicate 23]
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Starting task 5.0:6 as
> TID 127 on executor 1: hivecluster5.labs.lan (NODE_LOCAL)
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Serialized task 5.0:6 as
> 1294 bytes in 0 ms
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Starting task 5.0:4 as
> TID 128 on executor 1: hivecluster5.labs.lan (NODE_LOCAL)
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Serialized task 5.0:4 as
> 1294 bytes in 0 ms
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Starting task 5.0:9 as
> TID 129 on executor 1: hivecluster5.labs.lan (NODE_LOCAL)
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Serialized task 5.0:9 as
> 1294 bytes in 0 ms
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Starting task 5.0:1 as
> TID 130 on executor 2: hivecluster3 (NODE_LOCAL)
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Serialized task 5.0:1 as
> 1294 bytes in 1 ms
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Starting task 5.0:10 as
> TID 131 on executor 4: hivecluster4 (NODE_LOCAL)
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Serialized task 5.0:10 as
> 1294 bytes in 0 ms
>
> 14/05/31 21:36:28 WARN scheduler.TaskSetManager: Lost TID 112 (task 5.0:3)
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Loss was due to
> java.lang.ClassNotFoundException: org.apache.avro.mapred.AvroInputFormat
> [duplicate 24]
>
> 14/05/31 21:36:28 WARN scheduler.TaskSetManager: Lost TID 115 (task 5.0:13)
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Loss was due to
> java.lang.ClassNotFoundException: org.apache.avro.mapred.AvroInputFormat
> [duplicate 25]
>
> 14/05/31 21:36:28 WARN scheduler.TaskSetManager: Lost TID 120 (task 5.0:5)
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Loss was due to
> java.lang.ClassNotFoundException: org.apache.avro.mapred.AvroInputFormat
> [duplicate 26]
>
> 14/05/31 21:36:28 WARN scheduler.TaskSetManager: Lost TID 110 (task 5.0:7)
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Loss was due to
> java.lang.ClassNotFoundException: org.apache.avro.mapred.AvroInputFormat
> [duplicate 27]
>
> 14/05/31 21:36:28 WARN scheduler.TaskSetManager: Lost TID 116 (task 5.0:14)
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Loss was due to
> java.lang.ClassNotFoundException: org.apache.avro.mapred.AvroInputFormat
> [duplicate 28]
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Starting task 5.0:14 as
> TID 132 on executor 2: hivecluster3 (NODE_LOCAL)
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Serialized task 5.0:14 as
> 1294 bytes in 0 ms
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Starting task 5.0:3 as
> TID 133 on executor 2: hivecluster3 (NODE_LOCAL)
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Serialized task 5.0:3 as
> 1294 bytes in 0 ms
>
> 14/05/31 21:36:28 WARN scheduler.TaskSetManager: Lost TID 121 (task 5.0:8)
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Loss was due to
> java.lang.ClassNotFoundException: org.apache.avro.mapred.AvroInputFormat
> [duplicate 29]
>
> 14/05/31 21:36:28 WARN scheduler.TaskSetManager: Lost TID 124 (task 5.0:15)
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Loss was due to
> java.lang.ClassNotFoundException: org.apache.avro.mapred.AvroInputFormat
> [duplicate 30]
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Starting task 5.0:15 as
> TID 134 on executor 1: hivecluster5.labs.lan (NODE_LOCAL)
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Serialized task 5.0:15 as
> 1294 bytes in 0 ms
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Starting task 5.0:7 as
> TID 135 on executor 1: hivecluster5.labs.lan (NODE_LOCAL)
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Serialized task 5.0:7 as
> 1294 bytes in 1 ms
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Starting task 5.0:5 as
> TID 136 on executor 1: hivecluster5.labs.lan (NODE_LOCAL)
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Serialized task 5.0:5 as
> 1294 bytes in 0 ms
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Starting task 5.0:8 as
> TID 137 on executor 2: hivecluster3 (NODE_LOCAL)
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Serialized task 5.0:8 as
> 1294 bytes in 1 ms
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Starting task 5.0:13 as
> TID 138 on executor 0: hivecluster6.labs.lan (NODE_LOCAL)
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Serialized task 5.0:13 as
> 1294 bytes in 1 ms
>
> 14/05/31 21:36:28 WARN scheduler.TaskSetManager: Lost TID 125 (task 5.0:2)
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Loss was due to
> java.lang.ClassNotFoundException: org.apache.avro.mapred.AvroInputFormat
> [duplicate 31]
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Starting task 5.0:2 as
> TID 139 on executor 2: hivecluster3 (NODE_LOCAL)
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Serialized task 5.0:2 as
> 1294 bytes in 1 ms
>
> 14/05/31 21:36:28 WARN scheduler.TaskSetManager: Lost TID 123 (task 5.0:12)
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Loss was due to
> java.lang.ClassNotFoundException: org.apache.avro.mapred.AvroInputFormat
> [duplicate 32]
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Starting task 5.0:12 as
> TID 140 on executor 1: hivecluster5.labs.lan (NODE_LOCAL)
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Serialized task 5.0:12 as
> 1294 bytes in 0 ms
>
> 14/05/31 21:36:28 WARN scheduler.TaskSetManager: Lost TID 122 (task 5.0:11)
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Loss was due to
> java.lang.ClassNotFoundException: org.apache.avro.mapred.AvroInputFormat
> [duplicate 33]
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Starting task 5.0:11 as
> TID 141 on executor 1: hivecluster5.labs.lan (NODE_LOCAL)
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Serialized task 5.0:11 as
> 1294 bytes in 0 ms
>
> 14/05/31 21:36:28 WARN scheduler.TaskSetManager: Lost TID 126 (task 5.0:0)
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Loss was due to
> java.lang.ClassNotFoundException: org.apache.avro.mapred.AvroInputFormat
> [duplicate 34]
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Starting task 5.0:0 as
> TID 142 on executor 2: hivecluster3 (NODE_LOCAL)
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Serialized task 5.0:0 as
> 1294 bytes in 0 ms
>
> 14/05/31 21:36:28 WARN scheduler.TaskSetManager: Lost TID 127 (task 5.0:6)
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Loss was due to
> java.lang.ClassNotFoundException: org.apache.avro.mapred.AvroInputFormat
> [duplicate 35]
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Starting task 5.0:6 as
> TID 143 on executor 2: hivecluster3 (NODE_LOCAL)
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Serialized task 5.0:6 as
> 1294 bytes in 0 ms
>
> 14/05/31 21:36:28 WARN scheduler.TaskSetManager: Lost TID 138 (task 5.0:13)
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Loss was due to
> java.lang.ClassNotFoundException: org.apache.avro.mapred.AvroInputFormat
> [duplicate 36]
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Starting task 5.0:13 as
> TID 144 on executor 1: hivecluster5.labs.lan (NODE_LOCAL)
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Serialized task 5.0:13 as
> 1294 bytes in 0 ms
>
> 14/05/31 21:36:28 WARN scheduler.TaskSetManager: Lost TID 137 (task 5.0:8)
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Loss was due to
> java.lang.ClassNotFoundException: org.apache.avro.mapred.AvroInputFormat
> [duplicate 37]
>
> 14/05/31 21:36:28 WARN scheduler.TaskSetManager: Lost TID 131 (task 5.0:10)
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Loss was due to
> java.lang.ClassNotFoundException: org.apache.avro.mapred.AvroInputFormat
> [duplicate 38]
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Starting task 5.0:8 as
> TID 145 on executor 2: hivecluster3 (NODE_LOCAL)
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Serialized task 5.0:8 as
> 1294 bytes in 0 ms
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Starting task 5.0:10 as
> TID 146 on executor 4: hivecluster4 (NODE_LOCAL)
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Serialized task 5.0:10 as
> 1294 bytes in 1 ms
>
> 14/05/31 21:36:28 WARN scheduler.TaskSetManager: Lost TID 136 (task 5.0:5)
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Loss was due to
> java.lang.ClassNotFoundException: org.apache.avro.mapred.AvroInputFormat
> [duplicate 39]
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Starting task 5.0:5 as
> TID 147 on executor 1: hivecluster5.labs.lan (NODE_LOCAL)
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Serialized task 5.0:5 as
> 1294 bytes in 0 ms
>
> 14/05/31 21:36:28 WARN scheduler.TaskSetManager: Lost TID 133 (task 5.0:3)
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Loss was due to
> java.lang.ClassNotFoundException: org.apache.avro.mapred.AvroInputFormat
> [duplicate 40]
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Starting task 5.0:3 as
> TID 148 on executor 2: hivecluster3 (NODE_LOCAL)
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Serialized task 5.0:3 as
> 1294 bytes in 0 ms
>
> 14/05/31 21:36:28 WARN scheduler.TaskSetManager: Lost TID 135 (task 5.0:7)
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Loss was due to
> java.lang.ClassNotFoundException: org.apache.avro.mapred.AvroInputFormat
> [duplicate 41]
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Starting task 5.0:7 as
> TID 149 on executor 1: hivecluster5.labs.lan (NODE_LOCAL)
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Serialized task 5.0:7 as
> 1294 bytes in 0 ms
>
> 14/05/31 21:36:28 WARN scheduler.TaskSetManager: Lost TID 132 (task 5.0:14)
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Loss was due to
> java.lang.ClassNotFoundException: org.apache.avro.mapred.AvroInputFormat
> [duplicate 42]
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Starting task 5.0:14 as
> TID 150 on executor 2: hivecluster3 (NODE_LOCAL)
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Serialized task 5.0:14 as
> 1294 bytes in 0 ms
>
> 14/05/31 21:36:28 WARN scheduler.TaskSetManager: Lost TID 134 (task 5.0:15)
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Loss was due to
> java.lang.ClassNotFoundException: org.apache.avro.mapred.AvroInputFormat
> [duplicate 43]
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Starting task 5.0:15 as
> TID 151 on executor 1: hivecluster5.labs.lan (NODE_LOCAL)
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Serialized task 5.0:15 as
> 1294 bytes in 0 ms
>
> 14/05/31 21:36:28 WARN scheduler.TaskSetManager: Lost TID 142 (task 5.0:0)
>
> 14/05/31 21:36:28 INFO scheduler.TaskSetManager: Loss was due to
> java.lang.ClassNotFoundException: org.apache.avro.mapred.AvroInputFormat
> [duplicate 44]
>
> 14/05/31 21:36:28 ERROR scheduler.TaskSetManager: Task 5.0:0 failed 4
> times; aborting job
>
> 14/05/31 21:36:28 INFO scheduler.TaskSchedulerImpl: Remove TaskSet 5.0
> from pool
>
> 14/05/31 21:36:28 INFO scheduler.TaskSchedulerImpl: Ignoring update with
> state FAILED from TID 128 because its task set is gone
>
> 14/05/31 21:36:28 INFO scheduler.DAGScheduler: Failed to run first at
> <console>:41
>
> 14/05/31 21:36:28 INFO scheduler.TaskSchedulerImpl: Ignoring update with
> state RUNNING from TID 150 because its task set is gone
>
> 14/05/31 21:36:28 INFO scheduler.TaskSchedulerImpl: Ignoring update with
> state RUNNING from TID 151 because its task set is gone
>
> 14/05/31 21:36:28 INFO scheduler.TaskSchedulerImpl: Ignoring update with
> state FAILED from TID 146 because its task set is gone
>
> 14/05/31 21:36:28 INFO scheduler.TaskSchedulerImpl: Ignoring update with
> state FAILED from TID 129 because its task set is gone
>
> 14/05/31 21:36:28 INFO scheduler.TaskSchedulerImpl: Ignoring update with
> state FAILED from TID 130 because its task set is gone
>
> 14/05/31 21:36:28 INFO scheduler.TaskSchedulerImpl: Ignoring update with
> state FAILED from TID 147 because its task set is gone
>
> 14/05/31 21:36:28 INFO scheduler.TaskSchedulerImpl: Ignoring update with
> state FAILED from TID 144 because its task set is gone
>
> 14/05/31 21:36:28 INFO scheduler.TaskSchedulerImpl: Ignoring update with
> state FAILED from TID 141 because its task set is gone
>
> 14/05/31 21:36:28 INFO scheduler.TaskSchedulerImpl: Ignoring update with
> state FAILED from TID 140 because its task set is gone
>
> 14/05/31 21:36:28 INFO scheduler.TaskSchedulerImpl: Ignoring update with
> state FAILED from TID 150 because its task set is gone
>
> 14/05/31 21:36:28 INFO scheduler.TaskSchedulerImpl: Ignoring update with
> state FAILED from TID 151 because its task set is gone
>
> 14/05/31 21:36:28 INFO scheduler.TaskSchedulerImpl: Ignoring update with
> state FAILED from TID 148 because its task set is gone
>
> 14/05/31 21:36:28 INFO scheduler.TaskSchedulerImpl: Ignoring update with
> state FAILED from TID 145 because its task set is gone
>
> 14/05/31 21:36:28 INFO scheduler.TaskSchedulerImpl: Ignoring update with
> state FAILED from TID 149 because its task set is gone
>
> 14/05/31 21:36:28 INFO scheduler.TaskSchedulerImpl: Ignoring update with
> state FAILED from TID 143 because its task set is gone
>
> 14/05/31 21:36:28 INFO scheduler.TaskSchedulerImpl: Ignoring update with
> state FAILED from TID 139 because its task set is gone
>
> org.apache.spark.SparkException: Job aborted: Task 5.0:0 failed 4 times
> (most recent failure: Exception failure: java.lang.ClassNotFoundException:
> org.apache.avro.mapred.AvroInputFormat)
>
> at
> org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1028)
>
> at
> org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1026)
>
> at
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>
> at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>
> at
> org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$abortStage(DAGScheduler.scala:1026)
>
> at
> org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:619)
>
> at
> org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:619)
>
> at scala.Option.foreach(Option.scala:236)
>
> at
> org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:619)
>
> at
> org.apache.spark.scheduler.DAGScheduler$$anonfun$start$1$$anon$2$$anonfun$receive$1.applyOrElse(DAGScheduler.scala:207)
>
> at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
>
> at akka.actor.ActorCell.invoke(ActorCell.scala:456)
>
> at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
>
> at akka.dispatch.Mailbox.run(Mailbox.scala:219)
>
> at
> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
>
> at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>
> at
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>
> at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>
> at
> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>
>
>
>
> On Sat, May 31, 2014 at 5:16 PM, Russell Jurney <russell.jurney@gmail.com>
> wrote:
>
>> I'm running the following code to load an entire directory of Avros using
>> hadoopRDD.
>>
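>> import org.apache.avro.generic.GenericRecord
>> import org.apache.hadoop.mapred.{FileInputFormat, JobConf}
>>
>> val input = "hdfs://hivecluster2/securityx/web_proxy_mef/2014/05/29/22/*"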
>>
>> // Set up the input path for the job via a Hadoop JobConf
>> val jobConf = new JobConf(sc.hadoopConfiguration)
>> jobConf.setJobName("Test Scala Job")
>> FileInputFormat.setInputPaths(jobConf, input)
>>
>> val rdd = sc.hadoopRDD(
>>   jobConf,
>>   classOf[org.apache.avro.mapred.AvroInputFormat[GenericRecord]],
>>   classOf[org.apache.avro.mapred.AvroWrapper[GenericRecord]],
>>   classOf[org.apache.hadoop.io.NullWritable],
>>   1)
>>
>>
>> It successfully loads a single file, but when I load an entire directory,
>> I get this:
>>
>> scala> rdd.first
>>
>> 14/05/31 17:03:01 INFO mapred.FileInputFormat: Total input paths to
>> process : 17
>> 14/05/31 17:03:02 INFO spark.SparkContext: Starting job: first at
>> <console>:43
>> 14/05/31 17:03:02 INFO scheduler.DAGScheduler: Got job 0 (first at
>> <console>:43) with 1 output partitions (allowLocal=true)
>> 14/05/31 17:03:02 INFO scheduler.DAGScheduler: Final stage: Stage 0
>> (first at <console>:43)
>>
>> 14/05/31 17:03:02 INFO scheduler.DAGScheduler: Parents of final stage:
>> List()
>> 14/05/31 17:03:02 INFO scheduler.DAGScheduler: Missing parents: List()
>>
>> 14/05/31 17:03:02 INFO scheduler.DAGScheduler: Computing the requested
>> partition locally
>>
>> 14/05/31 17:03:02 INFO rdd.HadoopRDD: Input split:
>> hdfs://hivecluster2/securityx/web_proxy_mef/2014/05/29/22/part-m-00000.avro:0+3864
>>
>> 14/05/31 17:03:02 INFO spark.SparkContext: Job finished: first at
>> <console>:43, took 0.43242113 s
>>
>> 14/05/31 17:03:02 INFO spark.SparkContext: Starting job: first at
>> <console>:43
>> 14/05/31 17:03:02 INFO scheduler.DAGScheduler: Got job 1 (first at
>> <console>:43) with 16 output partitions (allowLocal=true)
>>
>> 14/05/31 17:03:02 INFO scheduler.DAGScheduler: Final stage: Stage 1
>> (first at <console>:43)
>>
>> 14/05/31 17:03:02 INFO scheduler.DAGScheduler: Parents of final stage:
>> List()
>>
>> 14/05/31 17:03:02 INFO scheduler.DAGScheduler: Missing parents: List()
>>
>> 14/05/31 17:03:02 INFO scheduler.DAGScheduler: Submitting Stage 1
>> (HadoopRDD[0] at hadoopRDD at <console>:40), which has no missing parents
>>
>> 14/05/31 17:03:02 INFO scheduler.DAGScheduler: Submitting 16 missing
>> tasks from Stage 1 (HadoopRDD[0] at hadoopRDD at <console>:40)
>>
>> 14/05/31 17:03:02 INFO scheduler.TaskSchedulerImpl: Adding task set 1.0
>> with 16 tasks
>>
>> 14/05/31 17:03:17 WARN scheduler.TaskSchedulerImpl: Initial job has not
>> accepted any resources; check your cluster UI to ensure that workers are
>> registered and have sufficient memory
>>
>> 14/05/31 17:03:32 WARN scheduler.TaskSchedulerImpl: Initial job has not
>> accepted any resources; check your cluster UI to ensure that workers are
>> registered and have sufficient memory
>>
>> ...<many times>...
>>
>>
>> And never finishes. What should I do?
>> --
>> Russell Jurney twitter.com/rjurney russell.jurney@gmail.com datasyndrome.com
>>
>
>
>
> --
> Russell Jurney twitter.com/rjurney russell.jurney@gmail.com datasyndrome.com
>
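
One way to confirm the ClassNotFoundException quoted above is a classpath problem on the Executors, rather than something in the job itself, is to ask each JVM whether it can resolve the Avro class. The following is only a minimal sketch: the object name ClasspathCheck and its isLoadable helper are hypothetical, not part of Spark or Avro, and it assumes the thread's context class loader is the one that would see any jars shipped with the job.

```scala
object ClasspathCheck {
  // Hypothetical helper: returns true if `className` can be resolved by the
  // current thread's context class loader (falling back to the loader that
  // loaded this object). The class is looked up without being initialized.
  def isLoadable(className: String): Boolean = {
    val loader = Option(Thread.currentThread.getContextClassLoader)
      .getOrElse(getClass.getClassLoader)
    try {
      Class.forName(className, false, loader)
      true
    } catch {
      // ClassNotFoundException: the class itself is missing.
      // LinkageError: the class was found but something it depends on is not.
      case _: ClassNotFoundException | _: LinkageError => false
    }
  }

  def main(args: Array[String]): Unit = {
    val avroFormat = "org.apache.avro.mapred.AvroInputFormat"
    println(s"$avroFormat loadable in this JVM: ${isLoadable(avroFormat)}")
  }
}
```

Run on the driver, this only reports on the driver's classpath. To probe the Executors from the shell, something along the lines of sc.parallelize(1 to 16, 16).map(_ => ClasspathCheck.isLoadable("org.apache.avro.mapred.AvroInputFormat")).collect() should return one Boolean per task, assuming the helper itself serializes cleanly from the REPL. Any false values would point to the Avro jar missing on those nodes.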
