spark-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Madhu <ma...@madhu.com>
Subject Sorting partitions in Java
Date Tue, 20 May 2014 13:25:33 GMT
I'm trying to sort data in each partition of an RDD.
I was able to do it successfully in Scala like this:

val sorted = rdd.mapPartitions(iter => {
  iter.toArray.sortWith((x, y) => x._2.compare(y._2) < 0).iterator
},
preservesPartitioning = true)

I used the same technique as in OrderedRDDFunctions.scala, so I assume it's
a reasonable way to do it.

This works well so far, but I can't seem to do the same thing in Java
because 'iter' in the Java APIs is an Iterator rather than an Iterable.
There may be an unattractive workaround, but I didn't pursue it.

Ideally, it would be nice to have an efficient, robust method in RDD to sort
each partition.
Does something like that exist?

Thanks!



-----
--
Madhu
https://www.linkedin.com/in/msiddalingaiah
--
View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/Sorting-partitions-in-Java-tp6715.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.

Mime
View raw message