hadoop-common-user mailing list archives

From Owen O'Malley <omal...@apache.org>
Subject Re: Batching key/value pairs to map
Date Tue, 24 Feb 2009 00:16:11 GMT

On Feb 23, 2009, at 2:19 PM, Jimmy Wan wrote:

>  I'm not sure if this is
> possible, but it would certainly be nice to either:
> 1) pass the OutputCollector and Reporter to the close() method.
> 2) Provide accessors to the OutputCollector and the Reporter.

If you look at the 0.20 branch, which hasn't been released yet, there is a
new map/reduce api. That api does provide a lot more control. Take a
look at Mapper, which provides setup, map, and cleanup hooks:

http://tinyurl.com/bquvxq

The map method looks like:

   /**
    * Called once for each key/value pair in the input split. Most applications
    * should override this, but the default is the identity function.
    */
   @SuppressWarnings("unchecked")
   protected void map(KEYIN key, VALUEIN value,
                      Context context) throws IOException, InterruptedException {
     context.write((KEYOUT) key, (VALUEOUT) value);
   }

But there is also a run method that drives the task. The default is  
given below, but it can be overridden by the application.

   /**
    * Expert users can override this method for more complete control over the
    * execution of the Mapper.
    * @param context
    * @throws IOException
    */
   public void run(Context context) throws IOException, InterruptedException {
     setup(context);
     while (context.nextKeyValue()) {
       map(context.getCurrentKey(), context.getCurrentValue(), context);
     }
     cleanup(context);
   }

Clearly, in your application you could override run to buffer up a list of
100 key/value pairs at a time before processing them.
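To make the batching idea concrete, here is a minimal sketch of the loop shape such an overridden run would have. It deliberately leaves the Hadoop classes out and drives a plain Iterator of pairs instead, so the buffering logic stands alone; the names BatchingRun, run, batchSize, and mapBatch are all illustrative, not part of the Hadoop API.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

public class BatchingRun {
  // Hypothetical stand-in for an overridden Mapper.run(): instead of calling
  // map() once per pair, buffer pairs and hand them to mapBatch() in groups
  // of batchSize, flushing any partial batch left over at the end (the spot
  // where the real run() would call cleanup()).
  public static <K, V> void run(Iterator<Map.Entry<K, V>> input,
                                int batchSize,
                                Consumer<List<Map.Entry<K, V>>> mapBatch) {
    List<Map.Entry<K, V>> buffer = new ArrayList<>(batchSize);
    while (input.hasNext()) {                // analogous to context.nextKeyValue()
      buffer.add(input.next());
      if (buffer.size() == batchSize) {
        mapBatch.accept(buffer);             // process one full batch
        buffer = new ArrayList<>(batchSize);
      }
    }
    if (!buffer.isEmpty()) {
      mapBatch.accept(buffer);               // flush the partial final batch
    }
  }
}
```

In the real Mapper you would put this loop body inside run(Context), collect pairs from context.getCurrentKey()/getCurrentValue(), and emit results through context.write() when each batch is processed.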

-- Owen
