hadoop-mapreduce-user mailing list archives

From Harsh J <ha...@cloudera.com>
Subject Re: how to find top N values using map-reduce ?
Date Sat, 02 Feb 2013 18:05:47 GMT
Note that a single reducer isn't always the solution. If you know the
boundaries of your key space, consider using a total-order partitioner
to scale the job and make use of all the nodes on the cluster.
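To make the total-order idea concrete, here is a minimal plain-Java sketch
of range partitioning. It is not Hadoop code: the class name, method names
and split points are hypothetical, and it only simulates what a
total-order partitioner does. Each key is routed to the partition that
owns its range, every partition sorts independently, and the top N is read
from the highest partition downward. It assumes the split points are
sorted and distinct.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration only; not part of the Hadoop API.
class RangePartitionSketch {

    // splitPoints must be sorted ascending and distinct (assumption).
    // Partition i holds keys k with splitPoints[i-1] <= k < splitPoints[i].
    static int partitionFor(long key, long[] splitPoints) {
        int pos = Arrays.binarySearch(splitPoints, key);
        // Found: the key equals a split point and belongs to the upper range.
        // Not found: binarySearch returns -(insertionPoint) - 1.
        return pos >= 0 ? pos + 1 : -(pos + 1);
    }

    // Simulates the job: partition, sort each range, read from the top.
    static List<Long> topN(List<Long> keys, long[] splitPoints, int n) {
        List<List<Long>> partitions = new ArrayList<>();
        for (int i = 0; i <= splitPoints.length; i++) {
            partitions.add(new ArrayList<>());
        }
        for (long k : keys) {
            partitions.get(partitionFor(k, splitPoints)).add(k);
        }

        List<Long> result = new ArrayList<>();
        // Walk from the highest range down; stop once n values are collected.
        for (int p = partitions.size() - 1; p >= 0 && result.size() < n; p--) {
            List<Long> part = partitions.get(p);
            part.sort(Collections.reverseOrder()); // each "reducer" sorts its own range
            for (long k : part) {
                if (result.size() == n) {
                    break;
                }
                result.add(k);
            }
        }
        return result;
    }
}
```

Because the ranges do not overlap, only the highest partitions need to be
read to obtain the top N, and the sort work is spread over all partitions
instead of a single reducer.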

On Sat, Feb 2, 2013 at 10:35 AM, praveenesh kumar <praveenesh@gmail.com> wrote:
> I am looking for a better solution for this.
>
> One way to do this would be to find the top N values in each mapper
> and then find the top N of those in one reducer. I am afraid that
> this won't work effectively if my N is larger than the number of
> values in my input split (or mapper input).
>
> Another way is to just sort all of them in one reducer and then take
> the top N from the sorted output.
>
> Is there a better approach to do this?
>
> Regards
> Praveenesh
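For reference, the first approach in the quoted message (a local top N
per mapper, merged by one reducer) can be sketched in plain Java as
follows. This is a simulation rather than Hadoop code, and the class and
method names are hypothetical. Each "mapper" keeps a min-heap bounded at N
entries, and the "reducer" applies the same selection to the combined
candidates. When a split holds fewer than N values, the mapper simply
emits all of them, so the result stays correct; the cost is that the
single reducer then receives more candidates.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical illustration only; not part of the Hadoop API.
class TopNSketch {

    // Returns the n largest values in descending order,
    // using a min-heap that never grows beyond n entries.
    static List<Long> topN(Iterable<Long> values, int n) {
        if (n <= 0) {
            return new ArrayList<>();
        }
        PriorityQueue<Long> heap = new PriorityQueue<>();
        for (long v : values) {
            if (heap.size() < n) {
                heap.add(v);
            } else if (v > heap.peek()) {
                heap.poll(); // drop the smallest retained value
                heap.add(v);
            }
        }
        List<Long> out = new ArrayList<>(heap);
        out.sort(Collections.reverseOrder());
        return out;
    }

    // Mapper stage: local top n per split. Reducer stage: top n of the candidates.
    static List<Long> topNAcrossSplits(List<List<Long>> splits, int n) {
        List<Long> candidates = new ArrayList<>();
        for (List<Long> split : splits) {
            candidates.addAll(topN(split, n));
        }
        return topN(candidates, n);
    }
}
```

With M mappers the reducer sees at most M * N candidates, which is the
point at which a large N makes the single reducer a bottleneck.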



--
Harsh J
