hive-user mailing list archives

From John Sichi <>
Subject RE: Is anybody working on the globally "order by" of hive ?
Date Fri, 11 Jun 2010 20:22:11 GMT
If someone is interested in adding parallel ORDER BY to Hive (using TotalOrderPartitioner),
here's a good starting point:

The goal would be to take that manual two-step sample-then-sort process and turn it into an
automatic plan within Hive.  I have a better example for the sampling query which I haven't
published yet.
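
The two-step sample-then-sort idea can be sketched outside Hive as follows. This is only an illustration in plain Python, not Hive or Hadoop code: the function names (`choose_split_points`, `range_partition`, `parallel_order_by`) are hypothetical, and the in-memory lists stand in for reducers. It assumes keys are mutually comparable.

```python
import bisect
import random


def choose_split_points(sample, num_partitions):
    """Pick up to num_partitions - 1 boundary keys from a sorted copy of the sample."""
    ordered = sorted(sample)
    if not ordered:
        return []  # nothing sampled: everything falls into a single partition
    return [ordered[len(ordered) * i // num_partitions]
            for i in range(1, num_partitions)]


def range_partition(keys, split_points):
    """Route each key to the partition whose key range contains it."""
    partitions = [[] for _ in range(len(split_points) + 1)]
    for key in keys:
        # Binary search over the boundaries, similar in spirit to what a
        # total-order partitioner does with its split points.
        partitions[bisect.bisect_right(split_points, key)].append(key)
    return partitions


def parallel_order_by(keys, num_partitions, sample_size, seed=0):
    # Step 1: sample the input to estimate the key distribution.
    rng = random.Random(seed)
    sample = rng.sample(keys, min(sample_size, len(keys)))
    # Step 2: range-partition on the sampled boundaries, then sort each
    # partition independently (the work that could run on separate reducers).
    split_points = choose_split_points(sample, num_partitions)
    return [sorted(part) for part in range_partition(keys, split_points)]
```

Because every key in partition i is less than or equal to every key in partition i + 1, concatenating the sorted partitions in index order yields the total order without a single-reducer bottleneck. The quality of the sample only affects how evenly the work is balanced, not correctness.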

We would also need to name the final output files in such a way that the total order could
be iterated via the filenames.
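
One way to make the total order recoverable from filenames is zero-padded partition indexes, resembling Hadoop's `part-00000` naming. The sketch below is an assumption about how that could look, not a description of what Hive does; `output_file_name` and the `width` parameter are hypothetical.

```python
def output_file_name(partition_index, width=5):
    """Zero-pad the index so lexicographic filename order equals partition order."""
    return f"part-{partition_index:0{width}d}"


# Names for 12 range partitions, generated in partition order.
names = [output_file_name(i) for i in range(12)]
```

With padding, sorting the filenames as strings gives the same sequence as the partition order, so a reader can walk the files in name order to iterate the full sorted result. Without padding, a name like `part-10` would sort before `part-2`.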


From: Ning Zhang []
Sent: Friday, June 11, 2010 12:40 PM
To: ''
Cc: ''
Subject: Re: Is anybody working on the globally "order by" of hive ?

Good idea, Edward. It would definitely be better if it is what it sounds like.

Btw Jeff, order by is supported in trunk with certain limitations in strict mode (it has to
have a limit). I will be able to update the wiki when I come back.

Sent from my blackberry

From: Edward Capriolo <>
To: <>
Cc: <>
Sent: Fri Jun 11 11:13:57 2010
Subject: Re: Is anybody working on the globally "order by" of hive ?

On Fri, Jun 11, 2010 at 5:24 AM, Jeff Zhang <> wrote:
Hi all,

According to the Hive wiki, Hive does not have a global "order by"
feature; "sort by" in Hive only sorts within each reducer. Our team thinks
a global "order by" is an important feature for users, so I am wondering
whether anybody is working on it. I would be very interested in getting involved.

Best Regards

Jeff Zhang


I was wondering if TotalOrderPartitioner in Hadoop 0.20 could play a role in this. As of now,
order by sets the number of reduce tasks to 1 :)
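
The difference between per-reducer sorting and a single-reducer order by can be modeled in a few lines. This is a toy Python model under stated assumptions, not Hive's actual execution: `sort_by` and `order_by_single_reducer` are hypothetical names, and `key % num_reducers` is only a stand-in for however rows get distributed to reducers.

```python
def sort_by(keys, num_reducers):
    """Model of per-reducer sorting: distribute keys, then sort each reducer's share."""
    reducers = [[] for _ in range(num_reducers)]
    for key in keys:
        reducers[key % num_reducers].append(key)  # stand-in for hash partitioning
    return [sorted(share) for share in reducers]


def order_by_single_reducer(keys):
    """Model of a global order obtained by forcing everything through one reducer."""
    return sort_by(keys, 1)[0]


keys = [5, 2, 9, 4, 7, 1]
per_reducer = sort_by(keys, 2)  # each list is sorted, but their key ranges overlap
global_order = order_by_single_reducer(keys)
```

Each reducer's output is sorted on its own, but the key ranges overlap across reducers, so concatenating the outputs does not give a global order. A single reducer does give one, at the cost of all parallelism, which is the gap a range partitioner such as TotalOrderPartitioner is meant to close.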

