hadoop-mapreduce-user mailing list archives

From Decimus Phostle <decimusphos...@gmail.com>
Subject Re: Using MultipleTextOutputFormat to control output filename in MapReduce
Date Tue, 23 Aug 2011 23:20:18 GMT
This was answered on SO (at http://goo.gl/FX7QN) - replying here in
case someone has a similar question at a future date.
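
For the archive, here is a minimal sketch of the parameterized naming
logic asked about below. It is not necessarily what the linked answer
proposes. It is plain Java with no Hadoop dependency, so the class name
FieldOffsetFileNamer, the numPartitions parameter, and the hash-based
partitionNumber() are all hypothetical stand-ins (the last one replaces
ID.getPartitionNumber, whose implementation is not shown in this thread).

```java
import java.util.Locale;

/**
 * Sketch: builds "date/pn_NNNNN" output names from tab-separated lines,
 * with the field offsets supplied at construction time instead of being
 * hard-coded to [0] and [3].
 */
final class FieldOffsetFileNamer {
    private final int idOffset;
    private final int dateOffset;
    private final int numPartitions;

    FieldOffsetFileNamer(int idOffset, int dateOffset, int numPartitions) {
        if (idOffset < 0 || dateOffset < 0 || numPartitions <= 0) {
            throw new IllegalArgumentException(
                "offsets must be >= 0 and numPartitions must be > 0");
        }
        this.idOffset = idOffset;
        this.dateOffset = dateOffset;
        this.numPartitions = numPartitions;
    }

    /** Returns the relative output name for one tab-separated record. */
    String fileNameFor(String line) {
        // Split once; the -1 limit keeps trailing empty fields.
        String[] fields = line.split("\t", -1);
        if (Math.max(idOffset, dateOffset) >= fields.length) {
            throw new IllegalArgumentException(
                "record has only " + fields.length + " fields");
        }
        String dateField = fields[dateOffset];
        // Keep at most the first 10 characters, e.g. "2011-08-08".
        String date = dateField.substring(0, Math.min(10, dateField.length()));
        String id = fields[idOffset];
        return date + "/pn_"
            + String.format(Locale.ROOT, "%05d", partitionNumber(id));
    }

    /** Assumption: stand-in for ID.getPartitionNumber(id), not the real helper. */
    int partitionNumber(String id) {
        return Math.floorMod(id.hashCode(), numPartitions);
    }
}
```

One place to wire this in, assuming the old org.apache.hadoop.mapred API:
MultipleOutputFormat.getRecordWriter(FileSystem, JobConf, String,
Progressable) receives the JobConf, so an override could read the offsets
there (for example with job.getInt("id.offset", 0) and
job.getInt("date.offset", 3); the defaults are my assumption), keep the
namer in a field, and then delegate to super.getRecordWriter(...).
generateFileNameForKeyValue would then just call fileNameFor(value.toString()).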

On Mon, Aug 8, 2011 at 12:06 PM, Decimus Phostle
<decimusphostle@gmail.com> wrote:
> Hello Folks,
>
> I needed some help with using MultipleTextOutputFormat to control the
> output filename in MapReduce.
>
> Currently I am using it as shown below (or at
> http://pastebin.com/gJxkdwRd), and it seems to work fine. However, what
> I am trying to change is the usage of the fields that get picked to
> determine the filename.
>
> Instead of hardcoding them to field[0] or field[3] (as is the case in
> the sample), I would like to pick this up (in some dynamic fashion)
> from, say, JobConf as field[jobConf.get("id.offset")] or
> field[jobConf.get("date.offset")]. Does anyone here know how I could
> go about doing this (or something to this effect, i.e. it doesn't have
> to be JobConf per se)?
>
> Any pointers/suggestions/tips et al. would be most appreciated. Thanks.
>
> PS:
>
> 1) Current usage of MultipleTextOutputFormat:
>
> public class FooBarMultipleTextOutputFormat
> extends MultipleTextOutputFormat<NullWritable, Text> {
>
>        protected String generateFileNameForKeyValue(NullWritable key,
>                                                     Text value,
>                                                     String name) {
>                String line = value.toString();
>
>                //TODO: I would like to parameterize the field that is picked
>                //here. Something akin to using JobConf in Mapper.
>                //i.e. instead of hard-coding [3] or [0], I would like
>                //to get it from JobConf (or some other configuration) in some fashion
>
>                String[] fields = line.split("\t");
>                String date = fields[3].substring(0,10);
>                String id = fields[0];
>
>                String partitionNumber = String.format("%05d", ID.getPartitionNumber(id));
>
>                return date + "/pn_" + partitionNumber;
>        }
> }
>
> 2) This is also an SO question here: http://goo.gl/FX7QN, if you would
> rather answer it there. TIA.
>
