hadoop-mapreduce-user mailing list archives

From Henry Helgen <hhel...@gmail.com>
Subject reduce stop after n records
Date Thu, 08 Mar 2012 23:02:20 GMT
I am using the Hadoop 0.20.2 mapreduce API. The program runs correctly, just
more slowly than it could.

I sum values and then use
job.setSortComparatorClass(LongWritable.DecreasingComparator.class) to sort
descending by sum. I need to stop the reducer after it outputs the first N
records. This would save the reducer from iterating over thousands of records
when it only needs the first few. Is there a solution with the new 0.20.2
mapreduce API?
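A minimal sketch of one possible early-exit approach, under stated assumptions. In the 0.20.2 `org.apache.hadoop.mapreduce` API, `Reducer` has an overridable `run(Context)` method. Its default body calls `setup(context)`, loops `while (context.nextKey())` calling `reduce(context.getCurrentKey(), context.getValues(), context)`, and then calls `cleanup(context)`. Overriding `run` with a counter that leaves the loop once N records have been written is one way to stop early. Hadoop is not on a plain classpath, so the self-contained Java below substitutes a hypothetical `KeyGroups` interface that has the same three method names. It also uses an in-memory `SortedGroups` that yields keys in decreasing order, standing in for the shuffle plus `LongWritable.DecreasingComparator`. The names `runTopN`, `SortedGroups` and `keysConsumed` are illustrative and are not part of the Hadoop API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class TopNReduceSketch {

    // Hypothetical stand-in for the ReduceContext methods that Reducer.run() uses.
    interface KeyGroups<K, V> {
        boolean nextKey();
        K getCurrentKey();
        Iterable<V> getValues();
    }

    // In-memory key groups iterated in decreasing key order, standing in for
    // the sorted reduce input produced with LongWritable.DecreasingComparator.
    static final class SortedGroups implements KeyGroups<Long, String> {
        private final Iterator<Map.Entry<Long, List<String>>> it;
        private Map.Entry<Long, List<String>> current;
        int keysConsumed = 0; // how many key groups were actually pulled

        SortedGroups(TreeMap<Long, List<String>> groups) {
            this.it = groups.descendingMap().entrySet().iterator();
        }

        public boolean nextKey() {
            if (!it.hasNext()) return false;
            current = it.next();
            keysConsumed++;
            return true;
        }

        public Long getCurrentKey() { return current.getKey(); }

        public Iterable<String> getValues() { return current.getValue(); }
    }

    // Same loop shape as a run() override: the size check comes first, so
    // nextKey() is not called again once n records have been emitted.
    static List<String> runTopN(KeyGroups<Long, String> context, int n) {
        List<String> out = new ArrayList<>();
        while (out.size() < n && context.nextKey()) {
            Long sum = context.getCurrentKey();
            for (String word : context.getValues()) {
                if (out.size() >= n) break; // stop mid-group if the limit is hit
                out.add(word + "\t" + sum);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        TreeMap<Long, List<String>> groups = new TreeMap<>();
        groups.put(42L, Arrays.asList("foo"));
        groups.put(7L, Arrays.asList("bar", "baz"));
        groups.put(99L, Arrays.asList("qux"));
        groups.put(1L, Arrays.asList("quux"));

        SortedGroups ctx = new SortedGroups(groups);
        for (String line : runTopN(ctx, 3)) {
            System.out.println(line);
        }
        System.out.println("keys read: " + ctx.keysConsumed);
    }
}
```

Two caveats apply if the job is laid out as described above. First, each reduce task only sees its own partition, so a global top N needs either a single reducer (`job.setNumReduceTasks(1)`) or a second pass. Second, the map and shuffle phases still run in full, so this approach saves only the remaining reduce-side iteration.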

I noticed some messages from 2008 about this topic. The last one reads as
follows, but the link in it is broken:
"You could do this pretty easily by implementing a custom MapRunnable.
There is no equivalent for reduces. The interface proposed in
HADOOP-1230 would support that kind of application. See:
Look at the new Mapper and Reducer interfaces."
