hadoop-common-user mailing list archives

From Owen O'Malley <omal...@apache.org>
Subject Re: Set variables in mapper
Date Tue, 03 Aug 2010 15:34:17 GMT

On Aug 3, 2010, at 6:12 AM, Erik Test wrote:

> Really? This seems pretty nice.
>
> In the future, with your implementation, would the value always have
> to be wrapped in a MyMapper instance? How would parameters be removed
> if necessary?

Sorry, I wasn't clear. I meant that if you make the subclasses of
Mapper serializable, the framework will serialize them for you and
deserialize them on the cluster.

So a fuller example would look like:

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapreduce.Mapper;

public class MyMapper extends Mapper<IntWritable,Text,IntWritable,Text>
    implements Writable {
   int param;

   // The no-argument constructor is what the framework would use to
   // create the instance before calling readFields to restore its state.
   public MyMapper() { param = 0; }
   public MyMapper(int param) { this.param = param; }

   public void map(IntWritable key, Text value, Context context)
       throws IOException, InterruptedException {...}

   // Writable's methods take DataInput/DataOutput.
   public void readFields(DataInput in) throws IOException {
     param = in.readInt();
   }

   public void write(DataOutput out) throws IOException {
      out.writeInt(param);
   }
}
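
Under that proposal, job setup might then look something like the
sketch below. Note that a setMapper method taking an instance is
hypothetical here; it is part of the proposed API, not the released
one, which only accepts a class via job.setMapperClass(MyMapper.class):

Job job = new Job();
// Hypothetical instance-based setter from the proposal above; the
// parameter value travels inside the serialized mapper instance.
job.setMapper(new MyMapper(42));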

You wouldn't have to use Writable; you could use Protocol Buffers,
Thrift, or Avro instead. Where this comes in really handy is in
places like the InputFormats and OutputFormats. It enables you to
replace the current:

job.setInputFormatClass(SequenceFileInputFormat.class);
FileInputFormat.addInputPath(job, inDir);
job.setOutputFormatClass(SequenceFileOutputFormat.class);
FileOutputFormat.setOutputPath(job, outDir);

with the more natural:

job.setInputFormat(new SequenceFileInputFormat(inDir));
job.setOutputFormat(new SequenceFileOutputFormat(outDir));
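
For comparison, here is a minimal sketch of how such a parameter is
usually passed with the released API, through the job Configuration
(the key name "my.param" is made up for the example):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// At submission time, stash the value in the job's Configuration:
Configuration conf = job.getConfiguration();
conf.setInt("my.param", 42);

// In the mapper, read it back once in setup():
public class ConfiguredMapper
    extends Mapper<IntWritable,Text,IntWritable,Text> {
  private int param;

  @Override
  protected void setup(Context context) {
    param = context.getConfiguration().getInt("my.param", 0);
  }
}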

Is that clearer now?

-- Owen
