spark-issues mailing list archives

From "Pulkit Bhuwalka (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-3274) Spark Streaming Java API reports java.lang.ClassCastException when calling collectAsMap on JavaPairDStream
Date Sun, 28 Sep 2014 21:41:33 GMT

    [ https://issues.apache.org/jira/browse/SPARK-3274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14151233#comment-14151233 ]

Pulkit Bhuwalka commented on SPARK-3274:
----------------------------------------

        SparkConf sparkConf = new SparkConf().setAppName("Page Rank").setMaster("local[4]");
        JavaSparkContext context = new JavaSparkContext(sparkConf);

        JavaPairRDD<String, String> transformedLinkMap =
                context.sequenceFile(pageRankOptions.getFileLocation(), String.class, String.class, 1)
                .mapToPair(new PairFunction<Tuple2<String, String>, String, String>() {
                    @Override
                    public Tuple2<String, String> call(Tuple2<String, String> urlAndLinks) throws Exception {
//                        return new Tuple2<String, String>(urlAndLinks._1(), urlAndLinks._2());
                        return new Tuple2<String, String>(
                                urlAndLinks._1(),
                                new LinkDetails(1.0, new LinkParser().parse(urlAndLinks._2())).toString()
                        );
                    }
                });

When I use the commented-out line above, which simply returns the strings, it works. However,
when I use the code after it, where LinkDetails simply parses the string into an object,
the job fails with a ClassCastException.

java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to java.lang.String
	at io.pulkit.cmu.acc.project1.phase2.PageRankSparkJob$1.call(PageRankSparkJob.java:28)
	at io.pulkit.cmu.acc.project1.phase2.PageRankSparkJob$1.call(PageRankSparkJob.java:24)
	at org.apache.spark.api.java.JavaPairRDD$$anonfun$pairFunToScalaFun$1.apply(JavaPairRDD.scala:926)
	at org.apache.spark.api.java.JavaPairRDD$$anonfun$pairFunToScalaFun$1.apply(JavaPairRDD.scala:926)
	at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
	at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1167)
	at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:904)
	at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:904)
	at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1121)
	at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1121)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
	at org.apache.spark.scheduler.Task.run(Task.scala:54)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
	at java.lang.Thread.run(Thread.java:695)

I looked at the other issue mentioned, but the pull request link there does not work, and
that issue is marked as resolved in 0.9, whereas I'm using 1.1.0.
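For reference, the workaround I have seen suggested (an assumption on my part, not verified against this exact build) is to read the sequence file with the Hadoop Text classes and convert explicitly, e.g. context.sequenceFile(path, Text.class, Text.class).mapToPair(...) calling toString() on each side, since the records Spark hands back are Text objects even when the RDD is declared over String. The sketch below reproduces the failure mechanism in plain Java, with no Spark or Hadoop dependency: CastDemo, SimpleEntry, and the StringBuilder stand-in for Text are all hypothetical names for illustration. A pair declared as <String, String> whose value is really another type only blows up at the first point the value is used as a String, which is why the cast surfaces inside call().

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

public class CastDemo {
    // Stand-in for a record from sequenceFile(..., String.class, String.class):
    // declared <String, String>, but the value is really another type at runtime
    // (org.apache.hadoop.io.Text in the report above).
    @SuppressWarnings("unchecked")
    static Map.Entry<String, String> record() {
        Object raw = new StringBuilder("outgoing links"); // plays the role of Text
        return (Map.Entry<String, String>) (Object) new SimpleEntry<>("url", raw);
    }

    public static void main(String[] args) {
        Map.Entry<String, String> rec = record();
        try {
            // The compiler inserts a checkcast to String here; it fails at runtime.
            rec.getValue().length();
        } catch (ClassCastException e) {
            System.out.println("ClassCastException, as in the report");
        }
        // Raw-typed access skips the inserted cast; toString() converts safely,
        // which is what reading with Text.class and converting explicitly does.
        String safe = ((Map.Entry) rec).getValue().toString();
        System.out.println(safe);
    }
}
```

The same reasoning explains why the commented-out line appears to work: it never forces the value through a String-typed code path before it is handed back.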

Thanks a lot.



> Spark Streaming Java API reports java.lang.ClassCastException when calling collectAsMap on JavaPairDStream
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-3274
>                 URL: https://issues.apache.org/jira/browse/SPARK-3274
>             Project: Spark
>          Issue Type: Bug
>          Components: Java API
>    Affects Versions: 1.0.2
>            Reporter: Jack Hu
>
> Reproduce code:
> scontext
> 		.socketTextStream("localhost", 18888)
> 		.mapToPair(new PairFunction<String, String, String>(){
> 			public Tuple2<String, String> call(String arg0)
> 					throws Exception {
> 				return new Tuple2<String, String>("1", arg0);
> 			}
> 		})
> 		.foreachRDD(new Function2<JavaPairRDD<String, String>, Time, Void>() {
> 			public Void call(JavaPairRDD<String, String> v1, Time v2) throws Exception {
> 				System.out.println(v2.toString() + ": " + v1.collectAsMap().toString());
> 				return null;
> 			}
> 		});
> Exception:
> java.lang.ClassCastException: [Ljava.lang.Object; cannot be cast to [Lscala.Tuple2;
>         at org.apache.spark.rdd.PairRDDFunctions.collectAsMap(PairRDDFunctions.scala:447)
>         at org.apache.spark.api.java.JavaPairRDD.collectAsMap(JavaPairRDD.scala:464)
>         at tuk.usecase.failedcall.FailedCall$1.call(FailedCall.java:90)
>         at tuk.usecase.failedcall.FailedCall$1.call(FailedCall.java:88)
>         at org.apache.spark.streaming.api.java.JavaDStreamLike$$anonfun$foreachRDD$2.apply(JavaDStreamLike.scala:282)
>         at org.apache.spark.streaming.api.java.JavaDStreamLike$$anonfun$foreachRDD$2.apply(JavaDStreamLike.scala:282)
>         at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply$mcV$sp(ForEachDStream.scala:41)
>         at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:40)
>         at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:40)
>         at scala.util.Try$.apply(Try.scala:161)
>         at org.apache.spark.streaming.scheduler.Job.run(Job.scala:32)
>         at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler.run(JobS



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

