hadoop-mapreduce-user mailing list archives

From Henry Helgen <hhel...@gmail.com>
Subject reduce stop after n records
Date Thu, 08 Mar 2012 23:02:20 GMT
I am using the Hadoop 0.20.2 mapreduce API. The program runs fine, just more
slowly than it could.

I sum values and then use
job.setSortComparatorClass(LongWritable.DecreasingComparator.class) to sort
descending by sum. I need to stop the reducer after outputting the first N
records. This would save the reducer from running over thousands of records
when it only needs the first few records. Is there a solution with the new
mapreduce 0.20.2 API?
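One possible approach, offered as an assumption-laden sketch rather than a confirmed answer: in the 0.20 `org.apache.hadoop.mapreduce` API, `Reducer.run(Context)` calls `setup`, then calls `reduce` once per key while `context.nextKey()` returns true, then calls `cleanup`. A subclass can therefore override `run` and leave the loop once N records have been written. The plain-Java sketch below imitates that loop so it runs without Hadoop on the classpath. The `SortedMap` built with `Collections.reverseOrder()` stands in for keys sorted by `LongWritable.DecreasingComparator`, and the names `TopNReduceSketch` and `firstN` are hypothetical. For a global top N the job would also need a single reduce task, because with several reducers each one emits its own first N.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical plain-Java stand-in for an overridden Reducer.run() loop.
// The SortedMap plays the role of the framework's sorted key stream.
class TopNReduceSketch {

    // Walks key groups in the map's sort order and returns as soon as
    // n records have been emitted; later keys are never visited.
    static List<String> firstN(SortedMap<Long, List<String>> groups, int n) {
        List<String> out = new ArrayList<>();
        if (n <= 0) {
            return out;
        }
        for (Map.Entry<Long, List<String>> group : groups.entrySet()) {
            for (String value : group.getValue()) {
                out.add(group.getKey() + "\t" + value); // sum, tab, record
                if (out.size() == n) {
                    return out; // early exit, the analogue of breaking out of run()
                }
            }
        }
        return out; // fewer than n records in total
    }

    public static void main(String[] args) {
        // Descending order, analogous to LongWritable.DecreasingComparator.
        SortedMap<Long, List<String>> groups = new TreeMap<>(Collections.<Long>reverseOrder());
        groups.put(5L, Arrays.asList("a"));
        groups.put(42L, Arrays.asList("b", "c"));
        groups.put(17L, Arrays.asList("d"));
        System.out.println(firstN(groups, 3));
    }
}
```

With n = 3 the sketch returns the two records under sum 42 and the one under 17, and it never visits key 5. In a real job the shuffle would still transfer all map output to the reducer. Only the reducer's own iteration is cut short.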

-------------------------------------------------------------------
I noticed messages from 2008 on this topic:
http://grokbase.com/t/hadoop/common-user/089420wvkx/stop-mr-jobs-after-n-records-have-been-produced

https://issues.apache.org/jira/browse/HADOOP-3973

The last statement in that thread follows, but its link is broken:
"You could do this pretty easily by implementing a custom MapRunnable.
There is no equivalent for reduces. The interface proposed in
HADOOP-1230 would support that kind of application. See:
http://svn.apache.org/repos/asf/hadoop/core/trunk/src/mapred/org/apache/hadoop/mapreduce/
Look at the new Mapper and Reducer interfaces."
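The quoted advice refers to the old `MapRunnable`. In the new 0.20 API the corresponding hook appears to be `Mapper.run(Context)`, which calls `map` once per record while `context.nextKeyValue()` returns true. The following is a sketch of capping a map task's input at N records. It again uses plain Java with a hypothetical `Iterator`-based stand-in (`LimitedRunSketch`, `runFirstN`) so it runs without Hadoop. The counter is tested before the source is advanced, so the loop never reads record N+1.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical plain-Java stand-in for an overridden Mapper.run() loop.
class LimitedRunSketch {

    // Feeds at most `limit` records to `map`. The limit is checked before the
    // source is advanced, mirroring `while (count < limit && context.nextKeyValue())`.
    static <T> int runFirstN(Iterator<T> records, int limit, Consumer<T> map) {
        int count = 0;
        while (count < limit && records.hasNext()) {
            map.accept(records.next());
            count++;
        }
        return count; // number of records actually processed
    }

    public static void main(String[] args) {
        List<String> seen = new ArrayList<>();
        Iterator<String> input = Arrays.asList("r1", "r2", "r3", "r4").iterator();
        int processed = runFirstN(input, 2, seen::add);
        // The third record is still unread after the limited run.
        System.out.println(processed + " " + seen + " next=" + input.next());
    }
}
```

In Hadoop each map task runs its own `run` loop, so a limit like this applies per task rather than to the job as a whole.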
