hadoop-mapreduce-user mailing list archives

From jeremy p <athomewithagroove...@gmail.com>
Subject Re: Are mapper classes re-instantiated for each record?
Date Wed, 07 May 2014 01:15:35 GMT
Thank you!  This has helped me immensely.


On Tue, May 6, 2014 at 12:47 AM, Raj K Singh <rajkrrsingh@gmail.com> wrote:

> Point 2 is right. The framework first calls setup(), then map() for
> each key/value pair in the InputSplit. Finally, cleanup() is called,
> regardless of the number of records in the input split.
>
> ::::::::::::::::::::::::::::::::::::::::
> Raj K Singh
> http://in.linkedin.com/in/rajkrrsingh
> http://www.rajkrrsingh.blogspot.com
> Mobile  Tel: +91 (0)9899821370
>
>
> On Tue, May 6, 2014 at 11:21 AM, Sergey Murylev <sergeymurylev@gmail.com> wrote:
>
>>  Hi Jeremy,
>>
>> According to the official documentation
>> <http://hadoop.apache.org/docs/r2.2.0/api/org/apache/hadoop/mapreduce/Mapper.html>,
>> the setup and cleanup calls are performed for each InputSplit. In this case
>> your variant 2 is more correct. But a single mapper can actually be used for
>> processing multiple InputSplits. In your case, if you have 5 files with 1
>> record each, it can call setup/cleanup 5 times. But if your records are in
>> a single file, I think that setup/cleanup should be called once.
>>
>> --
>> Thanks,
>> Sergey
>>
>>
>> On 06/05/14 02:49, jeremy p wrote:
>>
>> Let's say I have a TaskTracker that receives 5 records to process for a
>> single job.  When the TaskTracker processes the first record, it will
>> instantiate my Mapper class and execute my setup() function.  It will then
>> run the map() method on that record.  My question is this: what happens
>> when the map() method has finished processing the first record?  I'm
>> guessing it will do one of two things:
>>
>>  1) My cleanup() function will execute.  After the cleanup() method has
>> finished, this instance of the Mapper object will be destroyed.  When it is
>> time to process the next record, a new Mapper object will be instantiated.
>>  Then my setup() method will execute, the map() method will execute, the
>> cleanup() method will execute, and then the Mapper instance will be
>> destroyed.  When it is time to process the next record, a new Mapper object
>> will be instantiated.  This process will repeat itself until all 5 records
>> have been processed.  In other words, my setup() and cleanup() methods will
>> have been executed 5 times each.
>>
>>  or
>>
>>  2) When the map() method has finished processing my first record, the
>> Mapper instance will NOT be destroyed.  It will be reused for all 5
>> records.  When the map() method has finished processing the last record, my
>> cleanup() method will execute.  In other words, my setup() and cleanup()
>> methods will only execute 1 time each.
>>
>>  Thanks for the help!
>>
>>
>>
>
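
The lifecycle discussed in this thread can be illustrated with a small plain-Java sketch that needs no Hadoop dependency. It is only a simulation under stated assumptions: LifecycleSketch, CountingMapper, and runSplits are hypothetical names invented for illustration, not Hadoop classes, and the sketch assumes one mapper instance per input split, with setup() once, map() once per record, and cleanup() once at the end.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical simulation of the mapper lifecycle; none of these names are Hadoop APIs.
class LifecycleSketch {

    /** Stand-in for a user Mapper that only counts how often each hook fires. */
    static class CountingMapper {
        int setupCalls = 0;
        int mapCalls = 0;
        int cleanupCalls = 0;

        void setup() { setupCalls++; }
        void map(String record) { mapCalls++; }
        void cleanup() { cleanupCalls++; }

        // Order described in the thread: setup once, map per record, cleanup once.
        void run(List<String> split) {
            setup();
            try {
                for (String record : split) {
                    map(record);
                }
            } finally {
                cleanup(); // runs even if map() throws
            }
        }
    }

    /** Simulates one task per split and returns {setup, map, cleanup} totals. */
    static int[] runSplits(List<List<String>> splits) {
        int[] totals = new int[3];
        for (List<String> split : splits) {
            // Assumption: a fresh mapper instance is created for each split.
            CountingMapper mapper = new CountingMapper();
            mapper.run(split);
            totals[0] += mapper.setupCalls;
            totals[1] += mapper.mapCalls;
            totals[2] += mapper.cleanupCalls;
        }
        return totals;
    }

    public static void main(String[] args) {
        // One split holding all five records (for example, one small file).
        List<List<String>> oneSplit = Collections.singletonList(
                Arrays.asList("r1", "r2", "r3", "r4", "r5"));
        System.out.println(Arrays.toString(runSplits(oneSplit)));   // [1, 5, 1]

        // Five splits holding one record each (for example, five files).
        List<List<String>> fiveSplits = Arrays.asList(
                Arrays.asList("r1"), Arrays.asList("r2"), Arrays.asList("r3"),
                Arrays.asList("r4"), Arrays.asList("r5"));
        System.out.println(Arrays.toString(runSplits(fiveSplits))); // [5, 5, 5]
    }
}
```

With all five records in one split, the counts match variant 2 from the original question. With five one-record splits, setup() and cleanup() each run five times, which is the case Sergey describes.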
