hadoop-common-user mailing list archives

From Erik Test <erik.shi...@gmail.com>
Subject Re: Set variables in mapper
Date Tue, 03 Aug 2010 21:14:49 GMT
Oh, OK. Yes, this is clear now. Thanks for the explanation.
Erik


On 3 August 2010 11:34, Owen O'Malley <omalley@apache.org> wrote:

>
> On Aug 3, 2010, at 6:12 AM, Erik Test wrote:
>
>> Really? This seems pretty nice.
>>
>> In the future, with your implementation, would the value always have to be
>> wrapped in a MyMapper instance? How would parameters be removed if
>> necessary?
>>
>
> Sorry, I wasn't clear. I mean that if you make the sub-classes of Mapper
> serializable, the framework will serialize them for you and deserialize them
> on the cluster.
>
> So a fuller example would look like:
>
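> public class MyMapper extends Mapper<IntWritable,Text,IntWritable,Text>
>     implements Writable {
>  // State to be serialized by the client and restored on the cluster.
>  int param;
>
>  // No-arg constructor: needed so an instance can be created before readFields().
>  public MyMapper() { param = 0; }
>  public MyMapper(int param) { this.param = param; }
>
>  public void map(IntWritable key, Text value, Context context) {...}
>
>  // Writable declares readFields(DataInput) and write(DataOutput),
>  // not the concrete stream classes.
>  public void readFields(DataInput in) throws IOException {
>    param = in.readInt();
>  }
>
>  public void write(DataOutput out) throws IOException {
>    out.writeInt(param);
>  }
> }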
>
> You won't need to use Writable; you can use ProtocolBuffers, Thrift, or
> Avro. Where this comes in really handy is places like the InputFormats and
> OutputFormats. It enables you to replace the current:
>
> job.setInputFormatClass(SequenceFileInputFormat.class);
> FileInputFormat.setInputPaths(job, inDir);
> job.setOutputFormatClass(SequenceFileOutputFormat.class);
> FileOutputFormat.setOutputPath(job, outDir);
>
> with the more natural:
>
> job.setInputFormat(new SequenceFileInputFormat(inDir));
> job.setOutputFormat(new SequenceFileOutputFormat(outDir));
>
> Is that clearer now?
>
> -- Owen
>
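
The write/readFields pair in the quoted MyMapper example can be tried out without a cluster. The following is only a minimal sketch, assuming plain java.io streams: ParamHolder and ParamRoundTrip are hypothetical names that stand in for the mapper and the framework, and no Hadoop classes are used. It mirrors the shape of Writable's write(DataOutput) and readFields(DataInput), serializing the parameter on one side and restoring it into a fresh instance on the other.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in for MyMapper: only the serialized state, no Hadoop types.
class ParamHolder {
    int param;

    // No-arg constructor, used to create an empty instance before readFields().
    ParamHolder() { param = 0; }
    ParamHolder(int param) { this.param = param; }

    // Same shape as Writable.write(DataOutput).
    void write(DataOutput out) throws IOException {
        out.writeInt(param);
    }

    // Same shape as Writable.readFields(DataInput).
    void readFields(DataInput in) throws IOException {
        param = in.readInt();
    }
}

// Hypothetical driver: plays the role of "client serializes, task deserializes".
class ParamRoundTrip {
    static int roundTrip(int value) {
        try {
            // Client side: write the configured instance to bytes.
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(bytes)) {
                new ParamHolder(value).write(out);
            }
            // Task side: create an empty instance and restore its state.
            ParamHolder copy = new ParamHolder();
            try (DataInputStream in = new DataInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                copy.readFields(in);
            }
            return copy.param;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(42)); // prints 42
    }
}
```

The no-arg constructor matters in this pattern: the receiving side has to build an empty object first and only then fill it from the stream.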
