hadoop-user mailing list archives

From Bejoy Ks <bejoy.had...@gmail.com>
Subject Re: One file per mapper?
Date Mon, 08 Oct 2012 14:28:56 GMT
Hi Terry

If your files are smaller than the HDFS block size and you are using the
default TextInputFormat with the default split-size properties, there will
be exactly one file per mapper.
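The split-size behaviour mentioned above is controlled by a pair of job
properties. A minimal illustrative configuration fragment, using the Hadoop
1.x property names current at the time of this thread (Hadoop 2+ renames
them to mapreduce.input.fileinputformat.split.minsize / .maxsize); the
values shown are the defaults, under which a split never spans more than
one file:

```xml
<!-- Illustrative job configuration fragment; values are the defaults. -->
<property>
  <name>mapred.min.split.size</name>
  <value>1</value>
</property>
<property>
  <name>mapred.max.split.size</name>
  <!-- default is Long.MAX_VALUE, i.e. effectively unbounded -->
  <value>9223372036854775807</value>
</property>
```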

If your files are larger than an HDFS block, please take a look at the
sample implementation of 'WholeFileInputFormat' from
'Hadoop - The Definitive Guide' by Tom White.
http://books.google.co.in/books?id=Nff49D7vnJcC&pg=PA206&lpg=PA206&dq=wholefileinputformat&source=bl&ots=IifzWlbwQs&sig=9CDmS45S8pGDOaCYl6xGXnyDFE8&hl=en&sa=X&ei=VeJyUKfEE4rMrQe654G4DA&ved=0CCsQ6AEwAg#v=onepage&q=wholefileinputformat&f=false
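For reference, a minimal sketch of that pattern against the Hadoop
new (org.apache.hadoop.mapreduce) API: override isSplitable() to return
false so each file becomes a single split, and have the record reader
emit the whole file as one record. This is an illustrative sketch, not
the book's exact code; class names and the choice of
NullWritable/BytesWritable follow the book's approach.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

public class WholeFileInputFormat
    extends FileInputFormat<NullWritable, BytesWritable> {

  // Never split a file: each file becomes exactly one input split,
  // and therefore one mapper.
  @Override
  protected boolean isSplitable(JobContext context, Path file) {
    return false;
  }

  @Override
  public RecordReader<NullWritable, BytesWritable> createRecordReader(
      InputSplit split, TaskAttemptContext context)
      throws IOException, InterruptedException {
    WholeFileRecordReader reader = new WholeFileRecordReader();
    reader.initialize(split, context);
    return reader;
  }
}

// Emits a single record whose value is the entire file's contents.
class WholeFileRecordReader
    extends RecordReader<NullWritable, BytesWritable> {

  private FileSplit fileSplit;
  private Configuration conf;
  private BytesWritable value = new BytesWritable();
  private boolean processed = false;

  @Override
  public void initialize(InputSplit split, TaskAttemptContext context) {
    this.fileSplit = (FileSplit) split;
    this.conf = context.getConfiguration();
  }

  @Override
  public boolean nextKeyValue() throws IOException {
    if (processed) {
      return false;  // the one and only record was already emitted
    }
    byte[] contents = new byte[(int) fileSplit.getLength()];
    Path file = fileSplit.getPath();
    FileSystem fs = file.getFileSystem(conf);
    FSDataInputStream in = null;
    try {
      in = fs.open(file);
      IOUtils.readFully(in, contents, 0, contents.length);
      value.set(contents, 0, contents.length);
    } finally {
      IOUtils.closeStream(in);
    }
    processed = true;
    return true;
  }

  @Override
  public NullWritable getCurrentKey() { return NullWritable.get(); }

  @Override
  public BytesWritable getCurrentValue() { return value; }

  @Override
  public float getProgress() { return processed ? 1.0f : 0.0f; }

  @Override
  public void close() { /* nothing held open between records */ }
}
```

Note that loading the whole file into one BytesWritable is fine for
files in the 1.5 MB range the original poster mentions, but would not
suit files approaching mapper heap size.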



On Mon, Oct 8, 2012 at 7:51 PM, Terry Healy <thealy@bnl.gov> wrote:

> Hello-
>
> I know that it is contrary to normal Hadoop operation, but how can I
> configure my M/R job to send one complete file to each mapper task? This
> is intended to be used on many files in the 1.5 MB range as the first
> step in a chain of processes.
>
> thanks.
>
