hive-user mailing list archives

From Edward Capriolo <edlinuxg...@gmail.com>
Subject Re: Hive sort by using a single reducer
Date Sun, 02 Sep 2012 20:01:00 GMT
Sort by does not have the single-reducer restriction. I am not sure which
rank implementation you are using, but any of them should let you sort and
rank if the query is written correctly. Our implementation, on my GitHub
at github.com/edwardcapriolo, allows this.
On Sunday, September 2, 2012, Binesh Gummadi <binesh.gummadi@gmail.com>
wrote:
> I am trying to insert data into a table after selecting and sorting by a
> column. What I really want is to order by a column and select the top
> million rows. I am using Hive on Amazon EMR to process the data.
> Here is my query:
> INSERT INTO TABLE ddb_table SELECT * FROM data_dump sort by rank desc
> LIMIT 1000000;
> It creates two jobs. The first job runs rather quickly, but the second
> job's reduce phase runs forever because it uses a single reducer. Here is
> my question on Stack Overflow:
> http://stackoverflow.com/questions/12233343/why-is-sort-by-always-using-single-reducer
> According to the docs, the "order by" clause has a limitation of 1
> reducer. Does sort by have the same limitation? Are there any other ways
> of solving the above requirement?
> ________________________________
> Binesh Gummadi
>
>
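
For readers following the thread, here is a hedged sketch of why the quoted query can still end in a single-reducer job even though sort by itself is not restricted to one reducer. It reuses the table and column names from the question. The reducer count of 8 is an arbitrary value picked for illustration, and the stage breakdown in the comments is my understanding of how Hive usually plans sort by combined with LIMIT, not something confirmed in this thread.

```sql
-- Assumption: 8 reducers, chosen only for illustration.
SET mapred.reduce.tasks=8;

-- Job 1: SORT BY orders rows within each reducer only, so all 8 reducers
-- can work in parallel, and each one keeps its own top 1,000,000 rows.
-- Job 2: LIMIT is global, so the per-reducer results (at most
-- 8 x 1,000,000 rows) are merged and cut to the final 1,000,000 rows
-- by one reducer. This is likely the slow second job in the question.
INSERT INTO TABLE ddb_table
SELECT * FROM data_dump
SORT BY rank DESC
LIMIT 1000000;
```

Prefixing the statement with EXPLAIN prints the stage plan without running it, which is one way to check how many jobs a given Hive version generates for this query.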
