hadoop-common-user mailing list archives

From Alex Kozlov <ale...@cloudera.com>
Subject Re: Restricting number of records from map output
Date Fri, 14 Jan 2011 16:10:31 GMT
Hi Rakesh,

What do you mean by the top N? The first N records, or do you need to
sort them in memory? You can always output records in the cleanup() method
at the end of the mapper run.
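
To make the cleanup() idea concrete, here is a minimal sketch in plain Java with no Hadoop dependency. It only mirrors the map()/cleanup() lifecycle of a Mapper: records are buffered in a bounded min-heap during map() and emitted once in cleanup(). The class name TopNMapperSketch, the use of long values as records, and the getEmitted() helper are all assumptions made for illustration, not something from the job being discussed.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical, Hadoop-free sketch of the "buffer in map(), emit in cleanup()" pattern.
class TopNMapperSketch {
    private final int n;
    // Min-heap: the smallest retained value sits at the head, so it is cheap to evict.
    private final PriorityQueue<Long> heap = new PriorityQueue<>();
    // Stands in for the mapper output (what context.write() would receive in a real job).
    private final List<Long> emitted = new ArrayList<>();

    TopNMapperSketch(int n) {
        this.n = n;
    }

    // Called once per input record; nothing is emitted here.
    void map(long value) {
        heap.offer(value);
        if (heap.size() > n) {
            heap.poll(); // drop the smallest so at most n values are held in memory
        }
    }

    // Called once after the last record; emits the retained values, largest first.
    void cleanup() {
        List<Long> sorted = new ArrayList<>(heap);
        sorted.sort(Collections.reverseOrder());
        emitted.addAll(sorted);
        heap.clear();
    }

    List<Long> getEmitted() {
        return emitted;
    }

    public static void main(String[] args) {
        TopNMapperSketch mapper = new TopNMapperSketch(3);
        for (long v : new long[] {5, 1, 9, 3, 7, 2}) {
            mapper.map(v);
        }
        mapper.cleanup();
        System.out.println(mapper.getEmitted()); // prints [9, 7, 5]
    }
}
```

Note that this gives the top N per mapper only. With several map tasks and no reducer, the job output would hold up to N records from each task, so a global top N would still need a final merge step such as a single reducer.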

On Fri, Jan 14, 2011 at 7:05 AM, Hari Sreekumar <hsreekumar@clickable.com> wrote:

> Ideally, mappers should be independent of other mappers. Still, you can use
> counters and start skipping records once a counter exceeds some value to
> achieve similar behavior. It will not be very reliable if you want exact
> results, though.
>
> On Thu, Jan 13, 2011 at 12:43 AM, Anthony Urso <anthonyu@cs.ucla.edu>
> wrote:
>
> > Either use an instance variable or a Combiner.  The latter is correct
> > if you want the top-n per key from the mapper.
> >
> > On Wed, Jan 12, 2011 at 10:03 AM, Rakesh Davanum <rakeshdav@gmail.com>
> > wrote:
> > > Hi,
> > >
> > > I have a sort job consisting of only the Mapper (no Reducer) task. I
> > > want my results to contain only the top n records. Is there any way of
> > > restricting the number of records that are emitted by the Mappers?
> > >
> > > Basically, I am looking for something equivalent to the behavior of
> > > LIMIT in SQL queries.
> > >
> > > Thanks & Regards,
> > > Rakesh
> > >
> >
>
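
The instance-variable approach mentioned in the thread can be sketched in a similar way. The following is a hypothetical, Hadoop-free illustration of a LIMIT-like cutoff: a per-mapper count decides whether a record is emitted or skipped. The names LimitMapperSketch and getOutput() are assumptions for the example, and records are plain strings.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a per-mapper LIMIT using an instance variable.
class LimitMapperSketch {
    private final int limit;
    // Instance variable: counts records emitted by this mapper instance only.
    private int emittedCount = 0;
    // Stands in for the mapper output in a real job.
    private final List<String> output = new ArrayList<>();

    LimitMapperSketch(int limit) {
        this.limit = limit;
    }

    // Called once per input record; emits it only while under the limit.
    void map(String record) {
        if (emittedCount >= limit) {
            return; // limit reached: skip the remaining records
        }
        output.add(record);
        emittedCount++;
    }

    List<String> getOutput() {
        return output;
    }

    public static void main(String[] args) {
        LimitMapperSketch mapper = new LimitMapperSketch(2);
        for (String r : new String[] {"a", "b", "c", "d"}) {
            mapper.map(r);
        }
        System.out.println(mapper.getOutput()); // prints [a, b]
    }
}
```

Because each map task has its own instance, the limit applies per task. A job with several map tasks would emit up to the limit from each of them, which is in line with the caveat above that this kind of cutoff is not exact across mappers.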
