hadoop-common-user mailing list archives

From maha <m...@umail.ucsb.edu>
Subject Re: Deprecated ... damaged?
Date Wed, 15 Dec 2010 19:09:54 GMT
Actually, I just realized that numSplits can't be set definitively. Even if I write
numSplits = 5, it's just a hint.

Then how come MultiFileInputFormat claims to use MultiFileSplit to hold one file per split?
Or is that also just a hint?

Maha

On Dec 15, 2010, at 2:13 AM, maha wrote:

> Hi everyone,
> 
>  Using Hadoop-0.20.2, I'm trying to use MultiFileInputFormat, which is supposed to put
> each file from the input directory in a SEPARATE split, so that the number of maps equals
> the number of input files. Yet what I get is that each split contains multiple input file
> paths, hence # of maps < # of input files. Is it because "MultiFileInputFormat" is deprecated?
> 
>  In my implemented myMultiFileInputFormat I have only the following:
> 
> public RecordReader<LongWritable, Text> getRecordReader(InputSplit split,
>         JobConf job, Reporter reporter) {
>     return new myRecordReader((MultiFileSplit) split);
> }
> 
> Yet in myRecordReader, one split, for example, contains the following:
> 
>  " /tmp/input/file1:0+300
>    /tmp/input/file2:0+199  "
> 
>  instead of each of those paths being in its own split.
> 
>    Why? Any clues?
> 
>          Thank you,
>              Maha
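
To illustrate the behaviour described in this thread, here is a minimal, dependency-free Java sketch. It is a simplified model and not Hadoop's actual code: the class SplitSketch and its methods packBySize and onePerFile are hypothetical names, and the packing rule is an assumption chosen only to show how a numSplits hint can place several files in one split. In a real job, one split per file would presumably be obtained by overriding getSplits(JobConf, int) in the MultiFileInputFormat subclass so that it ignores the hint.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of split assignment; it does not use any Hadoop classes.
final class SplitSketch {

    // Packs files, in order, into at most numSplits groups of roughly equal
    // total size. This mimics the idea of numSplits as a target, not a guarantee
    // of one file per split.
    static List<List<String>> packBySize(String[] paths, long[] lengths, int numSplits) {
        int wanted = Math.max(1, numSplits);
        long total = 0;
        for (long len : lengths) {
            total += len;
        }
        double targetBytes = (double) total / wanted;

        List<List<String>> splits = new ArrayList<>();
        List<String> current = new ArrayList<>();
        long currentBytes = 0;
        for (int i = 0; i < paths.length; i++) {
            current.add(paths[i]);
            currentBytes += lengths[i];
            // The last allowed group absorbs all remaining files.
            boolean lastGroup = splits.size() == wanted - 1;
            if (!lastGroup && currentBytes >= targetBytes) {
                splits.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
        }
        if (!current.isEmpty()) {
            splits.add(current);
        }
        return splits;
    }

    // Ignores any hint and places every file in its own group.
    static List<List<String>> onePerFile(String[] paths) {
        List<List<String>> splits = new ArrayList<>();
        for (String path : paths) {
            List<String> single = new ArrayList<>();
            single.add(path);
            splits.add(single);
        }
        return splits;
    }

    public static void main(String[] args) {
        // File names and sizes taken from the split shown in the message above.
        String[] paths = {"/tmp/input/file1", "/tmp/input/file2"};
        long[] lengths = {300L, 199L};
        System.out.println("packed with hint 1: " + packBySize(paths, lengths, 1));
        System.out.println("one per file:       " + onePerFile(paths));
    }
}
```

With a hint of 1, both files from the example end up in the same group, which matches the split reported above; onePerFile returns two groups regardless of any hint.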

