hadoop-mapreduce-user mailing list archives

From Decimus Phostle <decimusphos...@gmail.com>
Subject Using MultipleTextOutputFormat to control output filename in MapReduce
Date Mon, 08 Aug 2011 16:06:42 GMT
Hello Folks,

I needed some help with using MultipleTextOutputFormat to control the
output filename in MapReduce.

Currently I am using it as shown below (or at
http://pastebin.com/gJxkdwRd), and it seems to work fine. However, what
I am trying to change is how the fields that determine the filename
are picked.

Instead of hardcoding them to field[0] or field[3] (as is the case in
the sample), I would like to pick them up (in some dynamic fashion)
from, say, JobConf as field[jobConf.get("id.offset")] or
field[jobConf.get("date.offset")]. Does anyone here know how I could
go about doing this (or something to this effect, i.e. it doesn't have
to be JobConf per se)?

Any pointers/suggestions/tips et al. would be most appreciated. Thanks.


1) Current usage of MultipleTextOutputFormat:

public class FooBarMultipleTextOutputFormat
extends MultipleTextOutputFormat<NullWritable, Text> {
	protected String generateFileNameForKeyValue(NullWritable key,
										 Text value,
										 String name) {
		String line = value.toString();

		//TODO: I would like to parameterize the field that is picked
		//here. Something akin to using JobConf in Mapper.
		//i.e. instead of hard-coding [3] or [0], I would like
		//to get it from JobConf(or some other configuration) in some fashion

		// Split the line once and reuse the fields.
		String[] fields = line.split("\t");
		String date = fields[3].substring(0, 10);
		String id = fields[0];

		String partitionNumber = String.format("%05d", ID.getPartitionNumber(id));

		return date + "/pn_" + partitionNumber;
	}
}
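
To make the goal concrete, here is a rough, Hadoop-free sketch of the naming logic with the two offsets read from configuration rather than hard-coded. Everything in it is an assumption for illustration: the class name FieldOffsetFileNamer is hypothetical, a plain Map stands in for JobConf, the defaults 0 and 3 simply mirror the hard-coded indexes above, and the partitioner argument stands in for ID.getPartitionNumber. In the old mapred API, MultipleOutputFormat.getRecordWriter is passed the JobConf, so one possible approach (not verified here) is to override it, read the offsets there, and then delegate to super.getRecordWriter.

```java
import java.util.Map;
import java.util.function.ToIntFunction;

/**
 * Builds output file names from tab-separated records, with the field
 * positions read from configuration instead of being hard-coded.
 * Hypothetical helper: not part of Hadoop.
 */
final class FieldOffsetFileNamer {
    private final int idOffset;
    private final int dateOffset;
    // Stand-in for ID.getPartitionNumber(id) from the original code.
    private final ToIntFunction<String> partitioner;

    FieldOffsetFileNamer(int idOffset, int dateOffset,
                         ToIntFunction<String> partitioner) {
        this.idOffset = idOffset;
        this.dateOffset = dateOffset;
        this.partitioner = partitioner;
    }

    /** Reads "id.offset" and "date.offset"; the Map stands in for JobConf. */
    static FieldOffsetFileNamer fromConfig(Map<String, String> conf,
                                           ToIntFunction<String> partitioner) {
        // Assumed defaults: the indexes that were hard-coded originally.
        int idOffset = Integer.parseInt(conf.getOrDefault("id.offset", "0"));
        int dateOffset = Integer.parseInt(conf.getOrDefault("date.offset", "3"));
        return new FieldOffsetFileNamer(idOffset, dateOffset, partitioner);
    }

    /** Returns "<date>/pn_<zero-padded partition number>" for one record. */
    String fileNameFor(String line) {
        // A limit of -1 keeps trailing empty fields, so offsets stay stable.
        String[] fields = line.split("\t", -1);
        String date = fields[dateOffset].substring(0, 10);
        String id = fields[idOffset];
        String partitionNumber =
                String.format("%05d", partitioner.applyAsInt(id));
        return date + "/pn_" + partitionNumber;
    }

    public static void main(String[] args) {
        FieldOffsetFileNamer namer = fromConfig(
                Map.of("id.offset", "0", "date.offset", "3"),
                id -> Math.floorMod(id.hashCode(), 100)); // hypothetical partitioner
        System.out.println(namer.fileNameFor("abc\tx\ty\t2011-08-08 16:06:42"));
    }
}
```

With something like this, generateFileNameForKeyValue would only need to call namer.fileNameFor(value.toString()), provided the namer was built from the job configuration before the first record is written.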

2) This is also an SO question here: http://goo.gl/FX7QN, if you would
rather answer it there. TIA.
