hadoop-mapreduce-user mailing list archives

From Raj K Singh <rajkrrsi...@gmail.com>
Subject Re: Are mapper classes re-instantiated for each record?
Date Tue, 06 May 2014 07:47:04 GMT
Point 2 is right. The framework first calls setup(), then map() for
each key/value pair in the InputSplit. Finally, cleanup() is called once,
regardless of the number of records in the input split.
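
To make the call order concrete, here is a minimal plain-Java sketch of that
lifecycle. It does not use any Hadoop classes: LifecycleSketch, its calls
list, and the record names are hypothetical stand-ins, and run() only mirrors
the setup / map-per-record / cleanup shape described above, not the actual
framework code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a Mapper; no Hadoop classes are involved.
class LifecycleSketch {
    // Records the order in which the lifecycle methods are invoked.
    final List<String> calls = new ArrayList<>();

    void setup() { calls.add("setup"); }

    void map(String record) { calls.add("map:" + record); }

    void cleanup() { calls.add("cleanup"); }

    // Assumed shape of the lifecycle: setup once, map once per record,
    // cleanup once, with the same instance reused for every record.
    void run(Iterable<String> split) {
        setup();
        try {
            for (String record : split) {
                map(record);
            }
        } finally {
            cleanup();
        }
    }

    public static void main(String[] args) {
        LifecycleSketch mapper = new LifecycleSketch();
        // Five records in one (simulated) input split.
        mapper.run(Arrays.asList("r1", "r2", "r3", "r4", "r5"));
        System.out.println(mapper.calls);
    }
}
```

With five records in one simulated split, setup and cleanup each appear once
in the recorded calls and map appears five times, which matches option 2 in
the original question.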

::::::::::::::::::::::::::::::::::::::::
Raj K Singh
http://in.linkedin.com/in/rajkrrsingh
http://www.rajkrrsingh.blogspot.com
Mobile  Tel: +91 (0)9899821370


On Tue, May 6, 2014 at 11:21 AM, Sergey Murylev <sergeymurylev@gmail.com> wrote:

>  Hi Jeremy,
>
> According to the official documentation
> <http://hadoop.apache.org/docs/r2.2.0/api/org/apache/hadoop/mapreduce/Mapper.html>,
> setup and cleanup calls are performed for each InputSplit. In this case your
> variant 2 is more correct. But actually a single mapper can be used for
> processing multiple InputSplits. In your case, if you have 5 files with 1
> record each, it can call setup/cleanup 5 times. But if your records are in
> a single file, I think that setup/cleanup should be called once.
>
> --
> Thanks,
> Sergey
>
>
> On 06/05/14 02:49, jeremy p wrote:
>
> Let's say I have a TaskTracker that receives 5 records to process for a
> single job.  When the TaskTracker processes the first record, it will
> instantiate my Mapper class and execute my setup() function.  It will then
> run the map() method on that record.  My question is this : what happens
> when the map() method has finished processing the first record?  I'm
> guessing it will do one of two things :
>
>  1) My cleanup() function will execute.  After the cleanup() method has
> finished, this instance of the Mapper object will be destroyed.  When it is
> time to process the next record, a new Mapper object will be instantiated.
>  Then my setup() method will execute, the map() method will execute, the
> cleanup() method will execute, and then the Mapper instance will be
> destroyed.  When it is time to process the next record, a new Mapper object
> will be instantiated.  This process will repeat itself until all 5 records
> have been processed.  In other words, my setup() and cleanup() methods will
> have been executed 5 times each.
>
>  or
>
>  2) When the map() method has finished processing my first record, the
> Mapper instance will NOT be destroyed.  It will be reused for all 5
> records.  When the map() method has finished processing the last record, my
> cleanup() method will execute.  In other words, my setup() and cleanup()
> methods will only execute 1 time each.
>
>  Thanks for the help!
>
>
>
