hadoop-hdfs-user mailing list archives

From Ravindra <ravindra.baj...@gmail.com>
Subject CombineInputFormat for mix of small and large files.
Date Fri, 24 Feb 2017 09:25:04 GMT
Hi All,

I have implemented CombineInputFormat for my job, and it works well for
small files, i.e. it combines them up to the block boundary. However, the
input source also contains a few very large files alongside the small ones,
and the mapper that gets one of these large files becomes a laggard.

I had overridden isSplitable to return false. I guessed that was the reason,
so I removed the override (i.e. let Hadoop use its default behaviour).
Hadoop now splits the big files, which is fine, but I see inconsistencies
in the output records.

Is there anything in my CustomRecordReader that I need to take care of?
I am not sure.

Please advise!
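
For reference, inconsistent records after enabling splitting are often a sign
that the record reader does not handle split boundaries. Below is a minimal
sketch of the boundary convention followed by Hadoop's LineRecordReader,
assuming newline-delimited records (the actual record format is not stated in
this message). It uses plain Java with no Hadoop API; the names
SplitBoundaryDemo, readSplit and endOfLine are hypothetical and exist only
for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class SplitBoundaryDemo {

    // Index just past the next '\n' at or after pos, or data.length if none.
    static int endOfLine(byte[] data, int pos) {
        while (pos < data.length && data[pos] != '\n') {
            pos++;
        }
        return Math.min(pos + 1, data.length);
    }

    // Records owned by the split [start, start + length) of a single file.
    static List<String> readSplit(byte[] data, int start, int length) {
        int end = start + length;
        int pos = start;
        // Every split except the first discards its first line, whole or
        // partial: the previous split reads it (see the loop condition below).
        if (start != 0) {
            pos = endOfLine(data, pos);
        }
        List<String> records = new ArrayList<>();
        // Read every line that STARTS at or before 'end', even if it finishes
        // beyond the split. This mirrors the skip above, so each line is
        // emitted exactly once across all splits.
        while (pos <= end && pos < data.length) {
            int next = endOfLine(data, pos);
            boolean hasNewline = next > pos && data[next - 1] == '\n';
            int lineEnd = hasNewline ? next - 1 : next;
            records.add(new String(data, pos, lineEnd - pos, StandardCharsets.UTF_8));
            pos = next;
        }
        return records;
    }

    public static void main(String[] args) {
        byte[] data = "aa\nbbbb\ncc\n".getBytes(StandardCharsets.UTF_8);
        // Three splits that cut through the middle of lines.
        System.out.println(readSplit(data, 0, 4));
        System.out.println(readSplit(data, 4, 4));
        System.out.println(readSplit(data, 8, 3));
    }
}
```

In a combine-style input format, each chunk of a CombineFileSplit carries its
own offset and length (getOffset(i) and getLength(i)), so a custom reader would
need to apply this kind of rule per chunk rather than assume every chunk starts
at byte 0 of its file.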

