hive-user mailing list archives

From Binesh Gummadi <>
Subject Hive sort by using a single reducer
Date Sun, 02 Sep 2012 17:53:16 GMT
I am trying to insert data into a table after selecting and sorting by a
column. What I really want is to order by a column and select the top million
rows. I am using Hive on Amazon EMR to process the data.
Here is my query:

INSERT INTO TABLE ddb_table SELECT * FROM data_dump sort by rank desc LIMIT

It creates two jobs. The first job runs rather quickly, but the second job
runs forever because it uses a single reducer. Here is my question
on Stack Overflow(

According to the docs, the "order by" clause is limited to 1 reducer. Does
"sort by" have the same limitation? Are there any other ways of solving the above

Binesh Gummadi
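
One possible way to investigate the question above, sketched under assumptions: Hive's EXPLAIN statement prints the stages of a query plan without running the query, so applying it to both variants should show which stage is tied to a single reducer. The LIMIT value of 1000000 is an assumption taken from the "top million rows" wording, since the value in the original query is cut off. Only the SELECT part is shown.

```sql
-- Sketch only. 1000000 is an assumed stand-in for the LIMIT value,
-- which is truncated in the original query.

-- Plan for the "sort by" variant used in the message.
EXPLAIN
SELECT * FROM data_dump SORT BY rank DESC LIMIT 1000000;

-- Plan for the "order by" variant, for comparison.
EXPLAIN
SELECT * FROM data_dump ORDER BY rank DESC LIMIT 1000000;
```

Comparing the number of stages and the reducer settings in the two outputs is one way to check whether the single reducer comes from the sort clause or from the LIMIT.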
