hadoop-mapreduce-user mailing list archives

From Michael Parker <michael.g.par...@gmail.com>
Subject Re: Can't access filename in mapper?
Date Thu, 14 Jun 2012 06:59:05 GMT
Thanks for the prompt reply, this worked like a charm!

- Mike

On Wed, Jun 13, 2012 at 10:51 PM, Harsh J <harsh@cloudera.com> wrote:
> Hey Mike,
> There is a much easier way to do this. We've answered a very similar
> question in detail before at: http://search-hadoop.com/m/ZOmmJ1PZJqt1
> (the question there shows the way for the stable/old API, and my
> response shows the way for the new API). Does this help?
> On Thu, Jun 14, 2012 at 8:24 AM, Michael Parker
> <michael.g.parker@gmail.com> wrote:
>> Hi all,
>> I'm new to Hadoop MR and decided to make a go at using only the new
>> API. I have a series of log files (who doesn't?), where a different
>> date is encoded in each filename. The log files are so few that I'm
>> not using HDFS. In my main method, I accept the input directory
>> containing all the log files as the first command line argument:
>>  Configuration conf = new Configuration();
>>  String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
>>  Path inputDir = new Path(otherArgs[0]);
>>  ...
>>  Job job1 = new Job(conf, "job1");
>>  FileInputFormat.addInputPath(job1, inputDir);
>> I actually have two jobs chained using a JobControl, but I think
>> that's irrelevant. The problem is that the Mapper of this job cannot
>> get the filename by accessing key "mapred.input.file" of the Context
>> object that is either passed to the setup method of the mapper, or
>> available through the Context object in the call to map. Dumping the
>> configuration like so:
>>  StringWriter writer = new StringWriter();
>>  Configuration.dumpConfiguration(context.getConfiguration(), writer);
>>  System.out.println("configuration=" + writer.toString());
>> reveals that there is a "mapred.input.dir" key that contains the path
>> passed as a command line argument and assigned to inputDir in my main
>> method, but the processed filename within that path is still
>> inaccessible. Any ideas how to get this?
>> Thanks,
>> Mike
> --
> Harsh J
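
For readers who cannot follow the link: one commonly used approach in the new API (not necessarily the exact one in the linked answer) is to ask the Context for its input split in setup() or map(), cast it to org.apache.hadoop.mapreduce.lib.input.FileSplit, and read the path, roughly ((FileSplit) context.getInputSplit()).getPath().getName(). The sketch below only mirrors the last step (taking the final path component) in plain Java so that it runs without Hadoop on the classpath. The class name SplitFileNames, the method fileNameOf, and the sample log filename are hypothetical, chosen for illustration.

```java
// Minimal sketch, assuming the mapper has already obtained the split path as a
// string, e.g. via
//   ((FileSplit) context.getInputSplit()).getPath().toString()
// inside setup() or map(). In a real mapper, Path.getName() does this directly;
// this helper (hypothetical name) only imitates that step with the JDK alone.
final class SplitFileNames {
    private SplitFileNames() {}

    /** Returns the last component of a split path such as "file:/logs/a.log". */
    static String fileNameOf(String splitPath) {
        // Ignore any trailing slashes so "/logs/dir/" yields "dir".
        int end = splitPath.length();
        while (end > 0 && splitPath.charAt(end - 1) == '/') {
            end--;
        }
        // Everything after the last remaining slash is the file name.
        int slash = splitPath.lastIndexOf('/', end - 1);
        return splitPath.substring(slash + 1, end);
    }

    public static void main(String[] args) {
        // Hypothetical local log file whose name encodes a date.
        System.out.println(fileNameOf("file:/data/logs/access-2012-06-13.log"));
    }
}
```

Since each split of a FileInputFormat job covers a single file, the name can be read once in setup() and kept in a field, and the date can then be parsed out of it in whatever format the log filenames use.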
