hadoop-mapreduce-user mailing list archives

From Harsh J <ha...@cloudera.com>
Subject Re: reduce stop after n records
Date Fri, 09 Mar 2012 10:14:52 GMT
Hello Henry,

Per the older conversation, what Owen was pointing to were the new-API
Mapper/Reducer classes, specifically their overridable run(...) method.

You'll need to port your job to the new (still a bit unstable) API to
leverage this. Here are some slides to aid you in that task:
http://www.slideshare.net/sh1mmer/upgrading-to-the-new-map-reduce-api (The
first part, from Owen).
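
To illustrate the approach: in the new API (org.apache.hadoop.mapreduce), the default Reducer.run(Context) loops over all keys and calls reduce() for each. You can override run() to exit the loop early. A minimal sketch follows; the types, the limit of 100, and the class name TopNReducer are illustrative assumptions, not code from this thread:

```java
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Hypothetical reducer that stops after emitting the first N keys.
// With a descending sort comparator set on the job, the first N keys
// seen here are the top-N by sum.
public class TopNReducer extends Reducer<LongWritable, Text, LongWritable, Text> {

    private static final int N = 100; // assumed limit; tune for your job

    @Override
    public void run(Context context) throws IOException, InterruptedException {
        setup(context);
        int emitted = 0;
        // Same loop as the default run(), but with an early-exit condition
        // so the reducer does not iterate over the remaining keys.
        while (emitted < N && context.nextKey()) {
            reduce(context.getCurrentKey(), context.getValues(), context);
            emitted++;
        }
        cleanup(context);
    }

    @Override
    protected void reduce(LongWritable key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        // Pass values through; a real job would do its own aggregation here.
        for (Text value : values) {
            context.write(key, value);
        }
    }
}
```

Note this only short-circuits each reduce task's iteration over its assigned keys; with multiple reducers you still get up to N records per task, so a final top-N selection may be needed afterwards.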

On Fri, Mar 9, 2012 at 4:32 AM, Henry Helgen <hhelgen@gmail.com> wrote:

> I am using the hadoop 0.20.2 mapreduce API. The program is running fine, just
> slower than it could be.
> I sum values and then use
> job.setSortComparatorClass(LongWritable.DecreasingComparator.class) to sort
> descending by sum. I need to stop the reducer after outputting the first N
> records. This would save the reducer from running over thousands of records
> when it only needs the first few records. Is there a solution with the new
> mapreduce 0.20.2 API?
> -------------------------------------------------------------------
> I notice messages from 2008 about this topic:
> http://grokbase.com/t/hadoop/common-user/089420wvkx/stop-mr-jobs-after-n-records-have-been-produced
> https://issues.apache.org/jira/browse/HADOOP-3973
> The last statement follows, but the link is broken.
> "You could do this pretty easily by implementing a custom MapRunnable.
> There is no equivalent for reduces. The interface proposed in
> HADOOP-1230 would support that kind of application. See:
> http://svn.apache.org/repos/asf/hadoop/core/trunk/src/mapred/org/apache/
> hadoop/mapreduce/
> Look at the new Mapper and Reducer interfaces."

Harsh J
