hadoop-user mailing list archives

From max scalf <oracle.bl...@gmail.com>
Subject sorting in hive -- general
Date Sat, 07 Mar 2015 23:02:52 GMT
Hello all,

I am new to Hadoop and Hive in general, and I am reading "Hadoop: The
Definitive Guide" by Tom White. On page 504, in the Hive chapter, Tom
says the following about sorting:

*Sorting and Aggregating*
*Sorting data in Hive can be achieved by using a standard ORDER BY clause.
ORDER BY performs a parallel total sort of the input (like that described
in “Total Sort” on page 261). When a globally sorted result is not
required—and in many cases it isn’t—you can use Hive’s nonstandard
extension, SORT BY, instead. SORT BY produces a sorted file per reducer.*
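To make the difference in the quoted passage concrete, here is a minimal Python simulation (not Hive code). It assumes three reducers. It uses "key modulo reducer count" as a hypothetical stand-in for how SORT BY spreads rows over reducers, and hypothetical range boundaries 30 and 60 for the total-sort case. Real Hive/MapReduce partitioning and boundary selection work differently; the sketch only shows what "sorted per reducer" versus "globally sorted" means for the output files.

```python
import bisect

def partition_modulo(rows, num_reducers):
    """Hypothetical stand-in for how SORT BY spreads rows over reducers:
    send each key to reducer (key % num_reducers)."""
    buckets = [[] for _ in range(num_reducers)]
    for key in rows:
        buckets[key % num_reducers].append(key)
    return buckets

def partition_range(rows, boundaries):
    """Range partitioning, as in a total sort: reducer 0 gets keys below
    boundaries[0], reducer i gets keys in [boundaries[i-1], boundaries[i]),
    and the last reducer gets keys >= boundaries[-1]."""
    buckets = [[] for _ in range(len(boundaries) + 1)]
    for key in rows:
        buckets[bisect.bisect_right(boundaries, key)].append(key)
    return buckets

def sort_each(buckets):
    # Each reducer sorts only its own input: one sorted "file" per reducer.
    return [sorted(b) for b in buckets]

def concatenate(files):
    # Read the reducer output files back to back, in reducer order.
    return [key for f in files for key in f]

def is_sorted(keys):
    return all(a <= b for a, b in zip(keys, keys[1:]))

rows = [42, 7, 93, 15, 68, 3, 51, 27, 80, 36]

sort_by_files = sort_each(partition_modulo(rows, 3))
total_sort_files = sort_each(partition_range(rows, [30, 60]))

print(sort_by_files)                             # [[3, 15, 27, 36, 42, 51, 93], [7], [68, 80]]
print(is_sorted(concatenate(sort_by_files)))     # False
print(total_sort_files)                          # [[3, 7, 15, 27], [36, 42, 51], [68, 80, 93]]
print(is_sorted(concatenate(total_sort_files)))  # True
```

In both cases every individual reducer file is sorted. Only the range-partitioned output stays in order when the files are read one after another, which is what a globally sorted result means here.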

My question is: what exactly does he mean by a "globally sorted result"? If
the SORT BY operation produces a sorted file per reducer, does that mean
that at the end of the sort the outputs of all the reducers are put back
together to give the correct result?
