hadoop-user mailing list archives

From Terry Healy <the...@bnl.gov>
Subject Re: One file per mapper?
Date Mon, 08 Oct 2012 15:29:50 GMT
thanks Bejoy.

...Feeling a bit foolish as Tom White's book was 2 feet away....

On 10/08/2012 10:28 AM, Bejoy Ks wrote:
> Hi Terry
> 
> If your files are smaller than the HDFS block size and you are using
> the default TextInputFormat with the default split-size properties,
> each mapper will process exactly one file.
> 
> If your files are larger than an HDFS block, please take a look at the
> sample implementation of 'WholeFileInputFormat' in 'Hadoop - The
> Definitive Guide' by Tom White:
> http://books.google.co.in/books?id=Nff49D7vnJcC&pg=PA206&lpg=PA206&dq=wholefileinputformat&source=bl&ots=IifzWlbwQs&sig=9CDmS45S8pGDOaCYl6xGXnyDFE8&hl=en&sa=X&ei=VeJyUKfEE4rMrQe654G4DA&ved=0CCsQ6AEwAg#v=onepage&q=wholefileinputformat&f=false
> 
> 
> 
> On Mon, Oct 8, 2012 at 7:51 PM, Terry Healy <thealy@bnl.gov
> <mailto:thealy@bnl.gov>> wrote:
> 
>     Hello-
> 
>     I know that it is contrary to normal Hadoop operation, but how can I
>     configure my M/R job to send one complete file to each mapper task? This
>     is intended to be used on many files in the 1.5 MB range as the first
>     step in a chain of processes.
> 
>     thanks.
> 
> 
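[For readers of the archive: the WholeFileInputFormat referenced above can be sketched roughly as below, modeled on the example in "Hadoop: The Definitive Guide" and using the new (org.apache.hadoop.mapreduce) API. The key idea is overriding isSplitable() to return false, so the framework generates exactly one split, and therefore one mapper, per input file. Class and field names here are illustrative, not a definitive implementation.]

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

// Delivers each input file whole, as a single (NullWritable, BytesWritable)
// record, to a single mapper.
public class WholeFileInputFormat
        extends FileInputFormat<NullWritable, BytesWritable> {

    @Override
    protected boolean isSplitable(JobContext context, Path file) {
        return false; // never split: each file becomes one split, one mapper
    }

    @Override
    public RecordReader<NullWritable, BytesWritable> createRecordReader(
            InputSplit split, TaskAttemptContext context)
            throws IOException, InterruptedException {
        WholeFileRecordReader reader = new WholeFileRecordReader();
        reader.initialize(split, context);
        return reader;
    }
}

// Reads the entire file backing the split into one BytesWritable value.
class WholeFileRecordReader extends RecordReader<NullWritable, BytesWritable> {

    private FileSplit fileSplit;
    private Configuration conf;
    private final BytesWritable value = new BytesWritable();
    private boolean processed = false;

    @Override
    public void initialize(InputSplit split, TaskAttemptContext context) {
        this.fileSplit = (FileSplit) split;
        this.conf = context.getConfiguration();
    }

    @Override
    public boolean nextKeyValue() throws IOException {
        if (processed) {
            return false; // only one record per file
        }
        byte[] contents = new byte[(int) fileSplit.getLength()];
        Path file = fileSplit.getPath();
        FileSystem fs = file.getFileSystem(conf);
        FSDataInputStream in = null;
        try {
            in = fs.open(file);
            IOUtils.readFully(in, contents, 0, contents.length);
            value.set(contents, 0, contents.length);
        } finally {
            IOUtils.closeStream(in);
        }
        processed = true;
        return true;
    }

    @Override
    public NullWritable getCurrentKey() { return NullWritable.get(); }

    @Override
    public BytesWritable getCurrentValue() { return value; }

    @Override
    public float getProgress() { return processed ? 1.0f : 0.0f; }

    @Override
    public void close() { /* nothing to close */ }
}
```

[In the driver, wire it in with job.setInputFormatClass(WholeFileInputFormat.class); note that loading whole files into memory is only sensible for small files, such as the ~1.5 MB files mentioned above.]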

-- 
Terry Healy / thealy@bnl.gov
Cyber Security Operations
Brookhaven National Laboratory
Building 515, Upton N.Y. 11973
