hadoop-hdfs-user mailing list archives

From Ravindra <ravindra.baj...@gmail.com>
Subject Re: CombineInputFormat for mix of small and large files.
Date Fri, 24 Feb 2017 09:28:53 GMT
Also to add: my test input file has fewer records than what I see going to
the mappers (i.e. Map Input Records), and the input file is more than
double the block size.
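A Map Input Records count higher than the number of records actually in the file is the classic symptom of a record reader that double-reads records at split boundaries: every split except the first should skip the partial record at its front, and every split should read one record past its end to finish the record it started. A plain-Java sketch of that rule over an in-memory newline-delimited buffer (names and the `readSplit` helper are illustrative, not the Hadoop API):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: splitting a newline-delimited buffer the way a line-oriented
// record reader does. Each split skips the partial record at its front
// (unless it is the first split) and reads past its end byte to finish
// the record that crosses the boundary, so each record is read exactly once.
public class BoundaryRead {
    static List<String> readSplit(byte[] data, int start, int end) {
        int pos = start;
        if (start != 0) {
            // Skip forward to the first record boundary; the previous
            // split owns the record we landed in the middle of.
            while (pos < data.length && data[pos - 1] != '\n') pos++;
        }
        List<String> records = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        while (pos < data.length) {
            if (data[pos] == '\n') {
                records.add(cur.toString());
                cur.setLength(0);
                pos++;
                if (pos > end) break; // finished the record crossing the boundary
            } else {
                cur.append((char) data[pos++]);
            }
        }
        if (cur.length() > 0 && pos >= data.length) records.add(cur.toString());
        return records;
    }

    public static void main(String[] args) {
        byte[] data = "aa\nbbbb\ncc\n".getBytes();
        // Split boundary at byte 5, i.e. inside the record "bbbb".
        System.out.println(readSplit(data, 0, 5));   // [aa, bbbb]
        System.out.println(readSplit(data, 5, 11));  // [cc]
    }
}
```

If a custom reader skips the front-skip step, the record straddling the boundary is emitted by both splits, which inflates the record counters exactly as described above.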

On Fri, Feb 24, 2017 at 4:25 PM Ravindra <ravindra.bajpai@gmail.com> wrote:

> Hi All,
> I have implemented CombineInputFormat for my job, and it works well for
> small files, i.e. it combines them up to the block boundary. But the input
> source also delivers a few very large files along with the small ones, so
> the mapper assigned to such a large file becomes a laggard.
> I had overridden isSplitable to return false. I guessed that was the
> reason, so I removed the override (i.e. let Hadoop keep its default
> behaviour). Hadoop now splits the big files, which is fine, but then I see
> inconsistencies in the output records.
> Is there anything related to my CustomRecordReader that I need to take
> care of? I am not sure.
> Please advise!
> Thanks.
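For reference, the arithmetic behind the laggard mapper: with isSplitable returning false, a file larger than the block size still goes to a single map task; with the default behaviour it is split at roughly block-size boundaries. A minimal plain-Java sketch (the 128 MB block size and 300 MB file are made-up numbers for illustration; real split sizes come from the InputFormat and configuration):

```java
// Sketch: how many map tasks a file produces depending on splitability.
// Numbers are illustrative; real split sizes come from the InputFormat.
public class SplitCount {
    static long splits(long fileSize, long blockSize, boolean splitable) {
        if (!splitable) return 1;                       // whole file -> one mapper
        return (fileSize + blockSize - 1) / blockSize;  // ceiling division
    }

    public static void main(String[] args) {
        long block = 128L * 1024 * 1024;    // 128 MB block size (assumption)
        long bigFile = 300L * 1024 * 1024;  // > 2x block size, as in the thread
        System.out.println(splits(bigFile, block, false)); // 1 -> laggard mapper
        System.out.println(splits(bigFile, block, true));  // 3 -> parallel mappers
    }
}
```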
