spark-dev mailing list archives

From Niranda Perera <niranda.per...@gmail.com>
Subject Spark SQL sort by and collect by in multiple partitions
Date Thu, 03 Sep 2015 05:19:59 GMT
Hi all,

I have been using SORT BY and ORDER BY in Spark SQL, and I observed the
following.

When I use SORT BY and collect the results, the results come back sorted
partition by partition.
Example:
if we have 1, 2, ..., 12 in 4 partitions and I want to sort them in
descending order,
partition 0 (p0) would have 12, 8, 4
p1 = 11, 7, 3
p2 = 10, 6, 2
p3 = 9, 5, 1

so collect() would return 12, 8, 4, 11, 7, 3, 10, 6, 2, 9, 5, 1

But when I use ORDER BY and collect the results:
p0 = 12, 11, 10
p1 =  9, 8, 7
.....
so collect() would return 12, 11, ..., 1, which is the desired result.
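
To make the two cases concrete, here is a rough plain-Python model of what I am seeing. It does not call Spark at all. The function names and the rule that places a value in a partition are my own assumptions, picked only so that the layout matches the example above, and collect() is modelled as simply concatenating partitions in index order.

```python
# Plain-Python model of the example above; no Spark required.
# All names here are hypothetical and exist only for illustration.
NUM_PARTITIONS = 4
values = list(range(1, 13))  # 1, 2, ..., 12


def sort_by_desc(rows, num_partitions):
    """Model of SORT BY: each partition is sorted on its own."""
    partitions = [[] for _ in range(num_partitions)]
    for v in rows:
        # Assumed placement rule, chosen only to reproduce the layout above.
        partitions[(-v) % num_partitions].append(v)
    return [sorted(p, reverse=True) for p in partitions]


def order_by_desc(rows, num_partitions):
    """Model of ORDER BY: sort globally, then split into contiguous ranges."""
    ordered = sorted(rows, reverse=True)
    size = len(ordered) // num_partitions  # assumes the rows divide evenly
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]


def collect(partitions):
    """Model of collect(): concatenate partitions in index order."""
    return [v for p in partitions for v in p]


sort_by_result = collect(sort_by_desc(values, NUM_PARTITIONS))
order_by_result = collect(order_by_desc(values, NUM_PARTITIONS))
print(sort_by_result)   # sorted within each partition only
print(order_by_result)  # sorted across all partitions
```

In this model the only difference between the two cases is whether the sort happens before or after the rows are split into partitions.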

Is this the intended behavior of SORT BY and ORDER BY, or is there something
I'm missing?

cheers

-- 
Niranda
@n1r44 <https://twitter.com/N1R44>
https://pythagoreanscript.wordpress.com/
