hadoop-mapreduce-user mailing list archives

From Michael Parker <michael.g.par...@gmail.com>
Subject Can't access filename in mapper?
Date Thu, 14 Jun 2012 02:54:45 GMT
Hi all,

I'm new to Hadoop MR and decided to make a go at using only the new
API. I have a series of log files (who doesn't?), where a different
date is encoded in each filename. The log files are so few that I'm
not using HDFS. In my main method, I accept the input directory
containing all the log files as the first command line argument:

  Configuration conf = new Configuration();
  String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
  Path inputDir = new Path(otherArgs[0]);
  ...
  Job job1 = new Job(conf, "job1");
  FileInputFormat.addInputPath(job1, inputDir);

I actually have two jobs chained using a JobControl, but I think
that's irrelevant. The problem is that the Mapper of this job cannot
get the filename by reading the key "mapred.input.file" from the
configuration of its Context, whether that is the Context passed to
the mapper's setup method or the one passed to each call to map.
Dumping the configuration like so:

  StringWriter writer = new StringWriter();
  Configuration.dumpConfiguration(context.getConfiguration(), writer);
  System.out.println("configuration=" + writer.toString());

reveals that there is a "mapred.input.dir" key containing the path
passed as the first command line argument (and assigned to inputDir in
my main method), but the name of the file actually being processed
within that path is nowhere to be found. Any ideas how to get it?
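Once the mapper does have the name of the file it is processing (with a
FileInputFormat-based input format this typically comes from the input
split, e.g. ((FileSplit) context.getInputSplit()).getPath().getName()),
pulling the date out of it is plain Java. A minimal sketch, assuming a
hypothetical naming scheme of <name>-YYYY-MM-DD.log, since the real log
filenames aren't shown here:

```java
import java.time.LocalDate;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LogFileDate {
    // Hypothetical naming scheme (an assumption): anything, then
    // "-YYYY-MM-DD.log" at the very end of the filename.
    private static final Pattern DATE_IN_NAME =
            Pattern.compile("-(\\d{4})-(\\d{2})-(\\d{2})\\.log$");

    /** Returns the date encoded in a log filename, or null if the name doesn't match. */
    static LocalDate dateFromFilename(String filename) {
        Matcher m = DATE_IN_NAME.matcher(filename);
        if (!m.find()) {
            return null;
        }
        // Groups 1..3 hold year, month and day; LocalDate.of rejects impossible dates.
        return LocalDate.of(Integer.parseInt(m.group(1)),
                            Integer.parseInt(m.group(2)),
                            Integer.parseInt(m.group(3)));
    }

    public static void main(String[] args) {
        System.out.println(dateFromFilename("access-2012-06-13.log")); // prints 2012-06-13
        System.out.println(dateFromFilename("notes.txt"));             // prints null
    }
}
```

The pattern and example filenames are illustrative only; the regex
would have to be adjusted to whatever scheme the logs actually use.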

Thanks,
Mike
