hive-user mailing list archives

From Raghu Murthy <ra...@facebook.com>
Subject Re: Sorting data
Date Thu, 26 Mar 2009 18:10:55 GMT
Right now there is already a way to get total ordering. You can do a SORT BY
and specify one reducer.
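
A minimal sketch of that approach, reusing the table and column names from
Zheng's example below (tableA, c1). It assumes the reducer count is set from
the Hive CLI through the Hadoop property mapred.reduce.tasks; adjust if your
setup configures it differently.

```sql
-- Assumption: force a single reducer so SORT BY yields one fully sorted output.
set mapred.reduce.tasks=1;

-- With one reducer, the per-reducer sort is effectively a total ordering.
SELECT * FROM tableA SORT BY c1 DESC;
```

With a single reducer the rows come back in one sorted stream, so the client
can read just the first ten rows instead of fetching everything. The trade-off
is that one reducer sorts the whole table, which may be slow for large inputs.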

raghu

On 3/26/09 10:49 AM, "Jeff Hammerbacher" <hammer@cloudera.com> wrote:

> Hey Zheng,
> 
> What is the timeline and priority for doing a total ordering for ORDER BY
> support?
> 
> Thanks,
> Jeff
> 
> On Wed, Mar 25, 2009 at 9:02 PM, Suhail Doshi <digitalwarfare@gmail.com>
> wrote:
>> Ah okay, I guess I can simply skip fetchAll() when grabbing the global ten,
>> so I do not mistakenly pull back too much data.
>> 
>> Suhail
>> 
>> 
>> On Wed, Mar 25, 2009 at 6:43 PM, Zheng Shao <zshao9@gmail.com> wrote:
>>> There is a SORT BY.
>>> 
>>> You can do:
>>> SELECT * FROM tableA SORT BY c1 DESC;
>>> 
>>> Then each of the partitions will be sorted.
>>> 
>>> However, to get the global top 10, we would need to apply LIMIT 10 on top
>>> of that. Right now, LIMIT 10 combined with SORT BY does not work exactly
>>> as the user wants.
>>> 
>>> 
>>> Zheng
>>> 
>>> 
>>> On Wed, Mar 25, 2009 at 3:23 PM, Suhail Doshi <digitalwarfare@gmail.com>
>>> wrote:
>>>> Since Hive does not have an ORDER BY yet, what is the solution for getting
>>>> the top 10 rows based on a field without having your Thrift client get
>>>> too much data back? It seems easy to end up with too much data, because
>>>> you have to fetch all rows and sort them yourself.
>>>> 
>>>> Suhail
>>> 
>>> 
>>> 
>>> -- 
>>> Yours,
>>> Zheng
>> 
>> 
>> 
>> -- 
>> http://mixpanel.com
>> Blog: http://blog.mixpanel.com
> 

