hadoop-common-user mailing list archives

From Ryan Tabora <ratab...@gmail.com>
Subject Re: WholeFileInputFormat in hadoop
Date Sun, 29 Jun 2014 17:57:05 GMT
Try reading this blog post; I think it's a pretty good overview.

http://hadoopi.wordpress.com/2013/05/27/understand-recordreader-inputsplit/


If you set a whole file's contents to be either the key or the value in the
mapper, then yes, you will load the whole file into memory. This is why it is
up to the user to define what a key/value pair is in the input format. You
could always set the key/value pair to some metadata about the file (file
path, file length) if you don't want to load the whole thing in the mapper.
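
To make the trade-off concrete, here is a minimal sketch in plain Java. It
deliberately does not use the Hadoop API: the class FileRecordSketch and its
methods are hypothetical names made up for illustration, and a real job would
put the equivalent logic in a custom RecordReader. It only contrasts a
metadata-only key/value pair with one that holds the full file contents.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

// Hypothetical illustration only; not part of Hadoop.
public class FileRecordSketch {

    // Metadata-only record: key = file path, value = file length in bytes.
    // Only file system metadata is read; the contents are never loaded.
    static Map.Entry<String, Long> metadataRecord(Path file) {
        try {
            return new SimpleEntry<>(file.toString(), Long.valueOf(Files.size(file)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Whole-file record: key = file path, value = every byte of the file.
    // Heap usage grows with the file size, which is the risk described above.
    static Map.Entry<String, byte[]> wholeFileRecord(Path file) {
        try {
            return new SimpleEntry<>(file.toString(), Files.readAllBytes(file));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Demo helper: writes the given text to a temporary file and returns its path.
    static Path writeSample(String text) {
        try {
            Path p = Files.createTempFile("record-sketch", ".txt");
            Files.write(p, text.getBytes(StandardCharsets.UTF_8));
            p.toFile().deleteOnExit();
            return p;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path p = writeSample("hello");
        // The first record carries a path and a long; the second carries the data itself.
        System.out.println(metadataRecord(p).getKey() + " -> " + metadataRecord(p).getValue());
        System.out.println(wholeFileRecord(p).getValue().length + " bytes held in memory");
    }
}
```

With the metadata-only record, the mapper can still open the file itself and
stream through it, so memory use stays bounded no matter how large the file is.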

Regards,
Ryan Tabora
http://ryantabora.com


On Sun, Jun 29, 2014 at 9:28 AM, unmesha sreeveni <unmeshabiju@gmail.com>
wrote:

> But how is that different from normal execution and parallel MR?
> After all, MapReduce is a parallel execution framework where the data going
> into a map is a single input.
>
> If the whole-file input were just an entire input split instead of the
> entire input file, it would be useful, right?
> If it is the whole file, it can cause heap space errors.
>
> Please correct me if I am wrong.
>
> --
> *Thanks & Regards *
>
>
> *Unmesha Sreeveni U.B*
> *Hadoop, Bigdata Developer*
> *Center for Cyber Security | Amrita Vishwa Vidyapeetham*
> http://www.unmeshasreeveni.blogspot.in/
>
>
>
