hadoop-common-user mailing list archives

From maha <m...@umail.ucsb.edu>
Subject Deprecated ... damaged?
Date Wed, 15 Dec 2010 10:13:59 GMT
Hi everyone,

  I'm using Hadoop 0.20.2 and trying to use MultiFileInputFormat, which is supposed to put
each file from the input directory in a SEPARATE split, so that the number of map tasks equals
the number of input files. What I get instead is that each split contains the paths of several
input files, so the number of maps is less than the number of input files. Is this because
MultiFileInputFormat is deprecated?

  My own class, myMultiFileInputFormat, contains only the following method:

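    // The only method I define; split creation is left to MultiFileInputFormat.
    @Override
    public RecordReader<LongWritable, Text> getRecordReader(InputSplit split, JobConf job,
            Reporter reporter) throws IOException {
        return new myRecordReader((MultiFileSplit) split);
    }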

Yet in myRecordReader, a single split contains, for example, the following:

  " /tmp/input/file1:0+300
    /tmp/input/file2:0+199  "

  instead of each file being in its own split.
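
  To make the difference concrete, here is a minimal plain-Java sketch (no Hadoop classes) of the two behaviours. The names SplitSketch, onePerFile and packInto are hypothetical, and the packing rule is only my assumption about how several files could end up in one split; it is not taken from the Hadoop source. The file lengths are the ones from the output above.

```java
import java.util.ArrayList;
import java.util.List;

class SplitSketch {

    // What I expected: every input file becomes its own split.
    static List<List<String>> onePerFile(String[] paths) {
        List<List<String>> splits = new ArrayList<>();
        for (String p : paths) {
            List<String> split = new ArrayList<>();
            split.add(p);
            splits.add(split);
        }
        return splits;
    }

    // What I seem to get: consecutive files packed into at most numSplits groups.
    // ASSUMPTION (hypothetical rule): a group is closed once the cumulative
    // length reaches that group's share of the total length.
    static List<List<String>> packInto(String[] paths, long[] lengths, int numSplits) {
        long total = 0;
        for (long len : lengths) {
            total += len;
        }
        double avgPerSplit = (double) total / numSplits;

        List<List<String>> splits = new ArrayList<>();
        List<String> current = new ArrayList<>();
        long cumulative = 0;
        for (int i = 0; i < paths.length; i++) {
            current.add(paths[i]);
            cumulative += lengths[i];
            boolean reachedShare = cumulative >= avgPerSplit * (splits.size() + 1);
            boolean moreGroupsAllowed = splits.size() < numSplits - 1;
            if (reachedShare && moreGroupsAllowed) {
                splits.add(current);
                current = new ArrayList<>();
            }
        }
        // Whatever is left over forms the last group.
        if (!current.isEmpty()) {
            splits.add(current);
        }
        return splits;
    }

    public static void main(String[] args) {
        String[] paths = {"/tmp/input/file1", "/tmp/input/file2"};
        long[] lengths = {300, 199};
        System.out.println(onePerFile(paths));
        System.out.println(packInto(paths, lengths, 1));
    }
}
```

  In this sketch, packInto(paths, lengths, 1) puts both paths into a single group, which looks like the split I am seeing, while packInto(paths, lengths, 2) gives one file per group. If something like this is going on, the grouping would depend on the requested number of splits rather than on the number of files.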

    Why? Any clues?

          Thank you,
              Maha
  