From: Eyal Golan
Date: Mon, 2 Jan 2012 12:17:06 +0200
Subject: Re: instantiation of classes in MR
To: mapreduce-user@hadoop.apache.org

Thank you very much for the help.

I am going to start working on it soon (a few days) and will probably have more questions :)


Eyal Golan
egolan74@gmail.com

Visit: http://jvdrums.sourceforge.net/
LinkedIn: http://www.linkedin.com/in/egolan74
Skype: egolan74

Save a tree. Please don't print this e-mail unless it's really necessary



On Mon, Jan 2, 2012 at 2:01 AM, Anirudh <techie.anirudh@gmail.com> wrote:
Is there any specific reason why setup is called for every task attempt? From an optimization point of view, wouldn't it be better if setup were called only once in the case of JVM reuse?
I have not yet looked at the implementation: in the case of JVM reuse, is the application Mapper instance reused, or is a new instance created for every task attempt?

My suggestion for Eyal would be to use a static field initializer expression in the Mapper to create the helper class instance. This ensures that the helper class is instantiated when the Mapper class is loaded.
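To make the difference between the static-field suggestion and the setup() approach concrete, here is a plain-Java simulation (no Hadoop classes; `Helper`, `MyMapper` and `ReusedJvmSimulation` are hypothetical names). It assumes, as Harsh states in his reply, that every task attempt gets its own Mapper object and its own setup() call, and it models a reused JVM as one loop running several attempts. A static field initializer runs once, when the class is initialized in a JVM, so its object is shared by every task that JVM runs; a field assigned in setup() is created once per attempt.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stateless helper; counts how often it is constructed.
class Helper {
    private static final AtomicInteger created = new AtomicInteger();

    Helper() { created.incrementAndGet(); }

    static int createdCount() { return created.get(); }
}

// Stand-in for a Mapper class (plain Java, not Hadoop's Mapper).
class MyMapper {
    // Static field initializer: runs once, when MyMapper is initialized in this JVM.
    static final Helper SHARED = new Helper();

    // Instance field: assigned once per task attempt, in setup().
    private Helper perTask;

    void setup() { perTask = new Helper(); }

    Helper perTask() { return perTask; }
}

class ReusedJvmSimulation {
    // One JVM running `attempts` task attempts, each with a fresh MyMapper instance.
    static int helpersCreated(int attempts) {
        int before = Helper.createdCount();
        for (int i = 0; i < attempts; i++) {
            MyMapper mapper = new MyMapper(); // first use triggers the static initializer
            mapper.setup();
        }
        return Helper.createdCount() - before;
    }
}
```

The first call to `helpersCreated(3)` in a JVM should report 4 (one static helper plus three per-task helpers); a second call reports 3, because the static initializer does not run again. A shared static helper is only safe if it is stateless and thread-safe.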



On Sun, Jan 1, 2012 at 7:05 AM, Harsh J <harsh@cloudera.com> wrote:
You are guaranteed one setup call for every single task attempt. This
is regardless of JVM reuse being on or off. JVM reuse will cause no
issues with what Eyal is attempting to do.
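Harsh's guarantee matches how the new-API `org.apache.hadoop.mapreduce.Mapper.run()` drives a task: roughly, `setup()` once, then `map()` for every record in the split, then `cleanup()`. The sketch below imitates that template method in plain Java so it runs without Hadoop; `SimpleMapper` and `RecordingMapper` are hypothetical, and the real methods also take key/value and `Context` arguments.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java imitation of the per-task-attempt lifecycle (not Hadoop's Mapper class).
abstract class SimpleMapper {
    protected void setup() {}

    protected abstract void map(String record);

    protected void cleanup() {}

    // Called once per task attempt: setup, one map() per record, then cleanup.
    final void run(List<String> split) {
        setup();
        for (String record : split) {
            map(record);
        }
        cleanup();
    }
}

// Records the order of lifecycle calls so it can be inspected.
class RecordingMapper extends SimpleMapper {
    final List<String> events = new ArrayList<>();

    @Override protected void setup() { events.add("setup"); }

    @Override protected void map(String record) { events.add("map:" + record); }

    @Override protected void cleanup() { events.add("cleanup"); }
}
```

Running one attempt over a three-record split yields `setup`, three `map:` events and `cleanup`, so anything built in `setup()` is built once per attempt, whether or not the JVM is reused.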

On Sun, Jan 1, 2012 at 5:49 PM, Anirudh <techie.anirudh@gmail.com> wrote:
> No problem, Eyal.
>
> On second thought, with JVM reuse the Mapper/Reducer instances should
> be re-used, and setup should be called only once. This makes sense too,
> as JVM reuse only happens within the same job.
> You should be fine with class instantiation even if JVM reuse is
> enabled.
>
>
> On Sat, Dec 31, 2011 at 11:39 PM, Eyal Golan <egolan74@gmail.com> wrote:
>>
>> Thank you very much for the detailed explanation, Anirudh.
>>
>> I think that my question about node / VM was due to some lack of knowledge
>> (I'm just starting to learn the Hadoop environment).
>> Regarding configuration of the nodes and clusters:
>> this is something that I am not doing myself. We have a dedicated team
>> for managing the Hadoop cluster and I'll ask them.
>>
>> I think that my question should have been: how many instances of the
>> 'helper' class will be created in a single VM?
>> And, as I understand it, since I am creating the helper in the setup /
>> configure method, there would be one.
>> And as long as it's stateless, I'm good.
>>
>> Thanks again,
>>
>> Eyal
>>
>> On Sat, Dec 31, 2011 at 1:36 PM, Anirudh <techie.anirudh@gmail.com> wrote:
>>>
>>> I just wanted to confirm where exactly you were planning to put the
>>> instantiation code, as it was not mentioned in your previous post. The
>>> location makes a difference. As you are doing it in the setup of the
>>> mapper/reducer, you are good.
>>>
>>> I was referring to the Task JVM Reuse option:
>>>
>>> http://hadoop.apache.org/common/docs/current/mapred_tutorial.html#Task+JVM+Reuse
>>>
>>> It states that if the option to reuse JVMs is enabled, the same task JVM
>>> will execute multiple tasks (i.e. map/reduce). I am not sure how this is
>>> implemented, whether a new Mapper/Reducer is created for each task or they
>>> too are re-used.
>>> If a new instance is created each time, then the mapper/reducer and all
>>> its references become eligible for garbage collection and you would be good.
>>> If the Mapper/Reducer instances are re-used, then setup would be
>>> called again, creating another instance of your helper class.
>>>
>>> In my opinion the latter does not make sense, and the implementation
>>> would follow the former approach, i.e. creation of a new
>>> Mapper/Reducer for each task. But it would be interesting to check.
>>>
>>> As the classes in question are (stateless) helper classes, you should not be
>>> affected in terms of functionality.
>>>
>>> I am not clear on one of your statements:
>>>
>>> How many map tasks will be created? One per split or one per VM (node)?
>>> Are you suggesting that although there would be one Mapper in the node...
>>>
>>> Have you configured your node to have a single slot for map/reduce tasks?
>>> If yes, then there will be one Mapper/Reducer task on the node. If not, there
>>> could be more than one mapper/reducer on the node, depending on lots of other
>>> parameters, e.g. the number of mapper/reducer slots allocated on the node, the
>>> number of input splits, etc. If the node is configured to run more than one
>>> Mapper/Reducer task, the scheduler may choose to run more than one task on
>>> the same node. The default is 2 Map & 2 Reduce tasks per node. And for each
>>> task a new JVM is launched unless the JVM reuse option is enabled.
>>>
>>> Thanks,
>>> Anirudh
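On Eyal's question of one map task per split or per VM: Hadoop creates one map task per input split, and each node runs at most as many map tasks at once as it has map slots (in Hadoop 1.x, `mapred.tasktracker.map.tasks.maximum`, default 2; the reduce counterpart is `mapred.tasktracker.reduce.tasks.maximum`). The back-of-the-envelope model below is a simplification that ignores data locality, speculative execution, failed attempts and other jobs sharing the cluster; `MapWaveEstimate` is a hypothetical name.

```java
// Simplified model of map-task scheduling (ignores locality, speculation, failures, other jobs).
class MapWaveEstimate {
    // One map task per input split.
    static int mapTasks(int inputSplits) {
        return inputSplits;
    }

    // Upper bound on map tasks running at the same time across the cluster.
    static int maxConcurrentMapTasks(int nodes, int mapSlotsPerNode) {
        return nodes * mapSlotsPerNode;
    }

    // Number of "waves" needed if every slot stays busy: ceil(splits / slots).
    static int waves(int inputSplits, int nodes, int mapSlotsPerNode) {
        int slots = maxConcurrentMapTasks(nodes, mapSlotsPerNode);
        if (slots <= 0) {
            throw new IllegalArgumentException("cluster needs at least one map slot");
        }
        return (inputSplits + slots - 1) / slots; // integer ceiling division
    }
}
```

For example, 25 splits on 4 nodes with 2 map slots each give 25 map tasks, at most 8 running at once, in 4 waves. Without JVM reuse each of those 25 tasks gets its own JVM; `mapred.job.reuse.jvm.num.tasks` (default 1, -1 for no limit) lets one JVM run several tasks of the same job.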
>>>
>>>
>>> On Sat, Dec 31, 2011 at 1:28 AM, Eyal Golan <egolan74@gmail.com> wrote:
>>>>
>>>> My idea is to create that class in the setup / configure method (depending on
>>>> which Mapper / Reducer I inherit from).
>>>>
>>>> I don't understand the 'reuse' option you are referring to.
>>>> How many map tasks will be created? One per split or one per VM (node)?
>>>> Are you suggesting that although there would be one Mapper in the node,
>>>> each new operator (or reflection call) will create a new instance,
>>>> thus creating lots of instances?
>>>>
>>>> BTW,
>>>> these helper classes I want to create are of course not going to be
>>>> stateful. They are definitely 'helper' classes that contain some logic.
>>>>
>>>> Thanks,
>>>>
>>>> Eyal
>>>>
>>>> On Sat, Dec 31, 2011 at 6:50 AM, Anirudh <techie.anirudh@gmail.com>
>>>> wrote:
>>>>>
>>>>> Where are you creating this new class? If it is in the map function,
>>>>> then it will create a new object for each record in the split.
>>>>>
>>>>> Also, you may need to see how the JVM reuse option works. I am not too
>>>>> sure of this and you may want to look at the code. If the option for JVM
>>>>> reuse is set, then my understanding is that for every task a new Mapper would
>>>>> be created, and in that case the "new" operator will create another instance
>>>>> even if this statement is not in the map function.
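Anirudh's first point, that creating the helper inside map() produces one object per record, can be checked with a small plain-Java simulation. `LineSplitter`, `PerRecordMapper`, `PerTaskMapper` and `PlacementDemo` are hypothetical names; the two mapper classes only mimic the shape of a Mapper.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stateless helper that counts how many times it is constructed.
class LineSplitter {
    static final AtomicInteger CREATED = new AtomicInteger();

    LineSplitter() { CREATED.incrementAndGet(); }

    String[] words(String line) { return line.trim().split("\\s+"); }
}

// Helper created inside map(): one new object per record.
class PerRecordMapper {
    int wordCount;

    void map(String line) { wordCount += new LineSplitter().words(line).length; }
}

// Helper created in setup(): one object per task attempt.
class PerTaskMapper {
    private LineSplitter splitter;
    int wordCount;

    void setup() { splitter = new LineSplitter(); }

    void map(String line) { wordCount += splitter.words(line).length; }
}

class PlacementDemo {
    // Helpers created by one simulated task attempt when the helper lives in map().
    static int helpersPerRecordStyle(List<String> split) {
        int before = LineSplitter.CREATED.get();
        PerRecordMapper mapper = new PerRecordMapper();
        for (String line : split) mapper.map(line);
        return LineSplitter.CREATED.get() - before;
    }

    // Same split, with the helper created once in setup().
    static int helpersSetupStyle(List<String> split) {
        int before = LineSplitter.CREATED.get();
        PerTaskMapper mapper = new PerTaskMapper();
        mapper.setup();
        for (String line : split) mapper.map(line);
        return LineSplitter.CREATED.get() - before;
    }
}
```

For a three-record split, the map() placement creates three splitters and the setup() placement creates one. With a stateless helper both give the same results; the per-record version only produces more short-lived garbage.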
>>>>>
>>>>>
>>>>> On Fri, Dec 30, 2011 at 6:22 AM, Eyal Golan <egolan74@gmail.com> wrote:
>>>>>>
>>>>>> Great news!!
>>>>>> Thanks for the info.
>>>>>>
>>>>>> So using reflection, I can inject different implementations of
>>>>>> interfaces (services) into the mapper (or reducer).
>>>>>> And this way I can test a mapper (or reducer)
>>>>>> just by instantiating a stub via reflection instead of a real implementation.
>>>>>>
>>>>>> Thanks,
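Eyal's plan of injecting service implementations via reflection, so that a test can swap in a stub, might look like the plain-Java sketch below. The interface, both implementations and the configuration key `example.score.service.class` are hypothetical, and a `Properties` object stands in for the job configuration. In a real Hadoop job, `Configuration.getClass(name, defaultValue, xface)` together with `ReflectionUtils.newInstance(theClass, conf)` serves the same purpose.

```java
import java.util.Properties;

// Hypothetical service the mapper depends on.
interface ScoreService {
    int score(String record);
}

// Illustrative "real" implementation.
class LengthScoreService implements ScoreService {
    public int score(String record) { return record.length(); }
}

// Stub for unit tests: always returns a fixed value.
class StubScoreService implements ScoreService {
    public int score(String record) { return 42; }
}

class ScoreServiceFactory {
    // Hypothetical configuration key naming the implementation class.
    static final String KEY = "example.score.service.class";

    // Instantiates the configured class reflectively; defaults to the real implementation.
    static ScoreService create(Properties conf) {
        String className = conf.getProperty(KEY, LengthScoreService.class.getName());
        try {
            Class<? extends ScoreService> cls =
                    Class.forName(className).asSubclass(ScoreService.class);
            // Needs an accessible no-arg constructor.
            return cls.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot instantiate " + className, e);
        }
    }
}
```

Calling `create(...)` once in setup() keeps the instance count the same as with `new`: reflection builds an ordinary object, one per call.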
>>>>>>
>>>>>>
>>>>>> On Fri, Dec 30, 2011 at 2:50 PM, Harsh J <harsh@cloudera.com> wrote:
>>>>>>>
>>>>>>> Eyal,
>>>>>>>
>>>>>>> Yes, it is right to think of each task attempt as one individual
>>>>>>> JVM running independently on any added node. Multiple slots would mean
>>>>>>> multiple VMs in parallel as well. Yes, your use of reflection to build your
>>>>>>> objects will work just fine -- it's all user-side Java code that is executed.
>>>>>>>
>>>>>>> On 30-Dec-2011, at 4:42 PM, Eyal Golan wrote:
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I want to understand a basic concept in MR.
>>>>>>>
>>>>>>> If a mapper creates an instance of some class (using the 'new'
>>>>>>> operator), then the created instance exists ONCE in the VM of this node,
>>>>>>> for each node.
>>>>>>> Correct?
>>>>>>>
>>>>>>> Now,
>>>>>>> what if instead of using the 'new' operator, the class is instantiated
>>>>>>> using reflection?
>>>>>>> Is that valid in MR?
>>>>>>> Will only one instance of the created class exist on that node?
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>>
>>>>>>> Eyal
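On Eyal's original question: neither `new` nor reflection makes an object exist "once per node". Each evaluation creates one new object in the JVM where the code runs, and (per Harsh's reply) each task attempt runs in its own JVM, so the count depends on where the creation code sits and how many tasks run. The plain-Java sketch below (hypothetical `StatelessHelper` and `CreationDemo`) counts distinct objects produced by `new`, by `Constructor.newInstance()`, and by a field cached once per JVM.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;
import java.util.function.Supplier;

// Hypothetical stateless helper.
class StatelessHelper {}

class CreationDemo {
    private static StatelessHelper cached;

    static StatelessHelper viaNew() {
        return new StatelessHelper();
    }

    static StatelessHelper viaReflection() {
        try {
            return StatelessHelper.class.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Creates the helper on first use and reuses it afterwards (one per JVM; not thread-safe).
    static StatelessHelper cachedPerJvm() {
        if (cached == null) {
            cached = viaReflection();
        }
        return cached;
    }

    // Counts distinct objects (by identity) returned by calling the factory `calls` times.
    static int distinctInstances(Supplier<StatelessHelper> factory, int calls) {
        Set<StatelessHelper> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        for (int i = 0; i < calls; i++) {
            seen.add(factory.get());
        }
        return seen.size();
    }
}
```

Five calls through `viaNew` or `viaReflection` give five distinct objects; five calls through `cachedPerJvm` give one. Reflection changes how the object is built, not how many are built.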
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>



--
Harsh J

