From: unmesha sreeveni <unmeshabiju@gmail.com>
Date: Wed, 7 May 2014 10:06:09 +0530
Subject: Re: Are mapper classes re-instantiated for each record?
To: user@hadoop.apache.org

setup() is called once per mapper task, before any call to map(); cleanup() is called once per task, after the last call to map().

On Tue, May 6, 2014 at 1:17 PM, Raj K Singh wrote:

> Point 2 is right. The framework first calls setup(), then map() for
> each key/value pair in the InputSplit. Finally, cleanup() is called,
> irrespective of the number of records in the input split.
>
> ::::::::::::::::::::::::::::::::::::::::
> Raj K Singh
> http://in.linkedin.com/in/rajkrrsingh
> http://www.rajkrrsingh.blogspot.com
> Mobile Tel: +91 (0)9899821370
>
>
> On Tue, May 6, 2014 at 11:21 AM, Sergey Murylev wrote:
>
>> Hi Jeremy,
>>
>> According to the official documentation, setup() and cleanup() are
>> called once for each InputSplit, so your variant 2 is the correct
>> one. Note, though, that the framework runs one map task per
>> InputSplit: if your 5 records arrive as 5 files (and therefore 5
>> splits), setup() and cleanup() can each be called 5 times, once per
>> task. If all 5 records are in a single split, setup() and cleanup()
>> should each be called once.
>>
>> --
>> Thanks,
>> Sergey
>>
>> On 06/05/14 02:49, jeremy p wrote:
>>
>> Let's say I have a TaskTracker that receives 5 records to process
>> for a single job. When the TaskTracker processes the first record,
>> it will instantiate my Mapper class and execute my setup() function.
>> It will then run the map() method on that record. My question is
>> this: what happens when the map() method has finished processing the
>> first record? I'm guessing it will do one of two things:
>>
>> 1) My cleanup() function will execute. After the cleanup() method
>> has finished, this instance of the Mapper object will be destroyed.
>> When it is time to process the next record, a new Mapper object will
>> be instantiated, my setup() method will execute, the map() method
>> will execute, the cleanup() method will execute, and the Mapper
>> instance will be destroyed again. This process repeats until all 5
>> records have been processed. In other words, my setup() and
>> cleanup() methods will each have executed 5 times.
>>
>> or
>>
>> 2) When the map() method has finished processing my first record,
>> the Mapper instance will NOT be destroyed. It will be reused for all
>> 5 records. When the map() method has finished processing the last
>> record, my cleanup() method will execute. In other words, my setup()
>> and cleanup() methods will each execute only once.
>>
>> Thanks for the help!

--
Thanks & Regards
Unmesha Sreeveni U.B
Hadoop, Bigdata Developer
Center for Cyber Security | Amrita Vishwa Vidyapeetham
http://www.unmeshasreeveni.blogspot.in/
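The per-task lifecycle described in the replies (setup() once, map() per record, cleanup() once) is what Hadoop's default Mapper.run() method encodes. As a minimal self-contained sketch, here is a stand-in class (not the real org.apache.hadoop.mapreduce.Mapper, which takes a Context) whose run() mirrors that structure, with counters to make the call counts visible:

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class MapperLifecycle {

    // Simplified stand-in for a Hadoop Mapper: counts lifecycle calls.
    static class CountingMapper {
        int setupCalls = 0;
        int mapCalls = 0;
        int cleanupCalls = 0;

        void setup() { setupCalls++; }

        void map(String record) { mapCalls++; }

        void cleanup() { cleanupCalls++; }

        // Mirrors the shape of Mapper.run(Context): setup once,
        // map for every record in the split, cleanup once at the end.
        void run(Iterator<String> records) {
            setup();
            try {
                while (records.hasNext()) {
                    map(records.next());
                }
            } finally {
                cleanup();
            }
        }
    }

    public static void main(String[] args) {
        // One InputSplit containing all 5 records (Jeremy's variant 2):
        List<String> split = Arrays.asList("r1", "r2", "r3", "r4", "r5");
        CountingMapper mapper = new CountingMapper();
        mapper.run(split.iterator());
        // setup and cleanup each ran once; map ran 5 times.
        System.out.println(mapper.setupCalls + " " + mapper.mapCalls
                + " " + mapper.cleanupCalls);
    }
}
```

The 5-files case Sergey mentions corresponds to calling run() on 5 separate CountingMapper instances, one per split, which is why setup()/cleanup() would then fire 5 times across the job.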