From: unmesha sreeveni <unmeshabiju@gmail.com>
Date: Wed, 7 May 2014 10:06:09 +0530
Subject: Re: Are mapper classes re-instantiated for each record?
To: user@hadoop.apache.org

setup() is called once per mapper task, before any call to map(); cleanup() is called once per task, after the last call to map().

On Tue, May 6, 2014 at 1:17 PM, Raj K Singh wrote:

> Point 2 is right. The framework first calls setup(), then map() for
> each key/value pair in the InputSplit. Finally, cleanup() is called,
> irrespective of the number of records in the input split.
>
> ::::::::::::::::::::::::::::::::::::::::
> Raj K Singh
> http://in.linkedin.com/in/rajkrrsingh
> http://www.rajkrrsingh.blogspot.com
> Mobile Tel: +91 (0)9899821370
>
>
> On Tue, May 6, 2014 at 11:21 AM, Sergey Murylev wrote:
>
>> Hi Jeremy,
>>
>> According to the official documentation, setup() and cleanup() are
>> called once for each InputSplit, so your variant 2 is the correct
>> one. Note, though, that the framework runs one map task per
>> InputSplit: if your 5 records arrive as 5 files (and therefore 5
>> splits), setup() and cleanup() can each be called 5 times, once per
>> task. If all 5 records are in a single split, setup() and cleanup()
>> should each be called once.
>>
>> --
>> Thanks,
>> Sergey
>>
>> On 06/05/14 02:49, jeremy p wrote:
>>
>> Let's say I have a TaskTracker that receives 5 records to process
>> for a single job. When the TaskTracker processes the first record,
>> it will instantiate my Mapper class and execute my setup() function.
>> It will then run the map() method on that record. My question is
>> this: what happens when the map() method has finished processing the
>> first record? I'm guessing it will do one of two things:
>>
>> 1) My cleanup() function will execute. After the cleanup() method
>> has finished, this instance of the Mapper object will be destroyed.
>> When it is time to process the next record, a new Mapper object will
>> be instantiated, my setup() method will execute, the map() method
>> will execute, the cleanup() method will execute, and the Mapper
>> instance will be destroyed again. This process repeats until all 5
>> records have been processed. In other words, my setup() and
>> cleanup() methods will each have executed 5 times.
>>
>> or
>>
>> 2) When the map() method has finished processing my first record,
>> the Mapper instance will NOT be destroyed. It will be reused for all
>> 5 records. When the map() method has finished processing the last
>> record, my cleanup() method will execute. In other words, my setup()
>> and cleanup() methods will each execute only once.
>>
>> Thanks for the help!

--
Thanks & Regards
Unmesha Sreeveni U.B
Hadoop, Bigdata Developer
Center for Cyber Security | Amrita Vishwa Vidyapeetham
http://www.unmeshasreeveni.blogspot.in/
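The per-task lifecycle described in the replies (setup() once, map() per record, cleanup() once) is what Hadoop's default Mapper.run() method encodes. As a minimal self-contained sketch, here is a stand-in class (not the real org.apache.hadoop.mapreduce.Mapper, which takes a Context) whose run() mirrors that structure, with counters to make the call counts visible:

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class MapperLifecycle {

    // Simplified stand-in for a Hadoop Mapper: counts lifecycle calls.
    static class CountingMapper {
        int setupCalls = 0;
        int mapCalls = 0;
        int cleanupCalls = 0;

        void setup() { setupCalls++; }

        void map(String record) { mapCalls++; }

        void cleanup() { cleanupCalls++; }

        // Mirrors the shape of Mapper.run(Context): setup once,
        // map for every record in the split, cleanup once at the end.
        void run(Iterator<String> records) {
            setup();
            try {
                while (records.hasNext()) {
                    map(records.next());
                }
            } finally {
                cleanup();
            }
        }
    }

    public static void main(String[] args) {
        // One InputSplit containing all 5 records (Jeremy's variant 2):
        List<String> split = Arrays.asList("r1", "r2", "r3", "r4", "r5");
        CountingMapper mapper = new CountingMapper();
        mapper.run(split.iterator());
        // setup and cleanup each ran once; map ran 5 times.
        System.out.println(mapper.setupCalls + " " + mapper.mapCalls
                + " " + mapper.cleanupCalls);
    }
}
```

The 5-files case Sergey mentions corresponds to calling run() on 5 separate CountingMapper instances, one per split, which is why setup()/cleanup() would then fire 5 times across the job.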