hadoop-mapreduce-issues mailing list archives

From "luo Yi (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-1743) conf.get("map.input.file") returns null when using MultipleInputs in Hadoop 0.20
Date Fri, 14 May 2010 05:08:43 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-1743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12867397#action_12867397 ]

luo Yi commented on MAPREDUCE-1743:
-----------------------------------

The following code can recover the real file name from the TaggedInputSplit. Because TaggedInputSplit
is an internal (non-public) Hadoop class, you have to put your own class in the org.apache.hadoop.mapred.lib package:

{code:title=TaggedInputSplitGetName.java|borderStyle=solid}
// Runs inside map(); reporter, output, word, and one belong to the enclosing mapper.
InputSplit is = reporter.getInputSplit();
String name = is.getClass().getName();
if ( name.equals("org.apache.hadoop.mapred.FileSplit") ) {
    // Plain FileSplit: the path is available directly.
    FileSplit fs = (FileSplit)is;
    String path = fs.getPath().toString();
    word.set(path);
    output.collect(word, one);
}
if ( name.equals("org.apache.hadoop.mapred.lib.TaggedInputSplit") ) {
    // MultipleInputs wraps the real split, so unwrap it first.
    TaggedInputSplit tis = (TaggedInputSplit)is;
    InputSplit iis = tis.getInputSplit();
    String iname = iis.getClass().getName();
    // Emit the class name of the wrapped split.
    word.set(iname);
    output.collect(word, one);
    if ( iname.equals("org.apache.hadoop.mapred.FileSplit") ) {
        FileSplit fs = (FileSplit)iis;
        // Paths recovered from the TaggedInputSplit are prefixed with "convert: ".
        String path = "convert: " + fs.getPath().toString();
        word.set(path);
        output.collect(word, one);
    }
}
{code}

The output file gives me:

{noformat}
$ grep 'convert' testout/part-00000 |head -n 5
convert: hdfs://myowndir/pt=20100513000000/attempt_201003291206_327196_r_000000_0    1
convert: hdfs://myowndir/pt=20100513000000/attempt_201003291206_327196_r_000001_0    1
convert: hdfs://myowndir/pt=20100513000000/attempt_201003291206_327196_r_000002_0    1
convert: hdfs://myowndir/pt=20100513000000/attempt_201003291206_327196_r_000003_0    1
convert: hdfs://myowndir/pt=20100513000000/attempt_201003291206_327196_r_000004_0    1
{noformat}

You may give it a try.
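
If placing your own class inside a Hadoop package is not an option, one possible alternative is to call getInputSplit() reflectively. This is a different technique from the one in the comment above, and only a minimal sketch: the helper name SplitUnwrapper is hypothetical, it assumes Java 7 or later, and it assumes the wrapper class declares a no-argument getInputSplit() method, as the TaggedInputSplit used above does. It has no Hadoop dependency and simply returns any object that does not declare such a method.

```java
import java.lang.reflect.Method;

/** Hypothetical helper (not part of Hadoop): unwraps a split via reflection. */
final class SplitUnwrapper {
    private SplitUnwrapper() {}

    /**
     * Returns the result of split.getInputSplit() if the runtime class of
     * split declares such a no-arg method; otherwise returns split itself.
     */
    static Object unwrap(Object split) {
        Method getter;
        try {
            getter = split.getClass().getDeclaredMethod("getInputSplit");
        } catch (NoSuchMethodException e) {
            // Not a wrapper (for example, a plain FileSplit): nothing to unwrap.
            return split;
        }
        try {
            // The declaring class may be non-public, so lift the access check.
            getter.setAccessible(true);
            return getter.invoke(split);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(
                "could not unwrap " + split.getClass().getName(), e);
        }
    }
}
```

In a mapper, the value returned by SplitUnwrapper.unwrap(reporter.getInputSplit()) could then be tested with instanceof FileSplit and cast, as in the code above. Whether reflective access is permitted depends on the JVM's security settings, so treat this as something to try rather than a guaranteed fix.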

> conf.get("map.input.file") returns null when using MultipleInputs in Hadoop 0.20
> --------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1743
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1743
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 0.20.2
>            Reporter: Yuanyuan Tian
>
> There is a problem getting the input file name in the mapper when using MultipleInputs
> in Hadoop 0.20. I need MultipleInputs to support different input formats in my MapReduce
> job, and inside each mapper I also need to know the exact input file that the mapper is
> processing. However, conf.get("map.input.file") returns null. Can anybody help me solve
> this problem? Thanks in advance.
> public class Test extends Configured implements Tool{
> 	static class InnerMapper extends MapReduceBase implements Mapper<Writable, Writable, NullWritable, Text>
> 	{
> 		................
> 		................
> 		public void configure(JobConf conf)
> 		{	
> 			String inputName=conf.get("map.input.file");
> 			.......................................
> 		}
> 		
> 	}
> 	
> 	public int run(String[] arg0) throws Exception {
> 		JobConf job;
> 		job = new JobConf(Test.class);
> 		...........................................
> 		
> 		MultipleInputs.addInputPath(job, new Path("A"), TextInputFormat.class);
> 		MultipleInputs.addInputPath(job, new Path("B"), SequenceFileInputFormat.class);
> 		...........................................
> 	}
> }

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

