hadoop-mapreduce-user mailing list archives

From Eyal Golan <egola...@gmail.com>
Subject Re: instantiation of classes in MR
Date Mon, 02 Jan 2012 10:17:06 GMT
Thank you very much for the help.

I am going to start working on it soon (a few days) and will probably have
more questions :)


Eyal Golan
egolan74@gmail.com

Visit: http://jvdrums.sourceforge.net/
LinkedIn: http://www.linkedin.com/in/egolan74
Skype: egolan74

Save a tree. Please don't print this e-mail unless it's really necessary



On Mon, Jan 2, 2012 at 2:01 AM, Anirudh <techie.anirudh@gmail.com> wrote:

> Is there any specific reason why setup is called for every task attempt? From
> an optimization point of view, wouldn't it be better if setup were called only
> once when JVM reuse is enabled?
> I have not yet looked at the implementation. With JVM reuse, is the
> application's Mapper instance reused, or is a new instance created for every
> task attempt?
>
> My suggestion for Eyal would be to have a static field initializer
> expression in the Mapper to create the helper class instance. This will
> ensure that the helper class will be instantiated when the Mapper class is
> loaded.
>
>
>
> On Sun, Jan 1, 2012 at 7:05 AM, Harsh J <harsh@cloudera.com> wrote:
>
>> You are guaranteed one setup call for every single task attempt. This
>> is regardless of JVM reuse being on or off. JVM reuse will cause no
>> issues with what Eyal is attempting to do.
>>
>> On Sun, Jan 1, 2012 at 5:49 PM, Anirudh <techie.anirudh@gmail.com> wrote:
>> > No problems Eyal.
>> >
>> > On second thought, with JVM reuse the Mapper/Reducer instances should
>> > be reused, and setup should be called only once. This makes sense too,
>> > as JVM reuse applies to tasks of the same job.
>> > You should be fine with class instantiation even if JVM reuse is
>> > enabled.
>> >
>> >
>> > On Sat, Dec 31, 2011 at 11:39 PM, Eyal Golan <egolan74@gmail.com> wrote:
>> >>
>> >> Thank you very much for the detailed explanation, Anirudh.
>> >>
>> >> I think that my question about node / VM was due to a lack of knowledge
>> >> (I'm just starting to learn the Hadoop environment).
>> >> Regarding the configuration of the nodes and clusters:
>> >> this is something that I am not doing myself. We have a dedicated team
>> >> for managing the Hadoop cluster and I'll ask them.
>> >>
>> >> I think that my question should have been: how many instances of the
>> >> 'helper' class will be created in a single VM?
>> >> As I understand it, given that I am creating the helper in the setup /
>> >> configure method, there would be one.
>> >> And as long as it's stateless, I'm good.
>> >>
>> >> Thanks again,
>> >>
>> >> Eyal
>> >>
>> >>
>> >> On Sat, Dec 31, 2011 at 1:36 PM, Anirudh <techie.anirudh@gmail.com> wrote:
>> >>>
>> >>> I just wanted to confirm where exactly you were planning to put the
>> >>> instantiation code, as it was not mentioned in your previous post. The
>> >>> location would have made a difference. As you are doing it in the setup
>> >>> of the mapper/reducer, you are good.
>> >>>
>> >>> I was referring to the Task JVM Reuse option:
>> >>>
>> >>>
>> >>> http://hadoop.apache.org/common/docs/current/mapred_tutorial.html#Task+JVM+Reuse
>> >>>
>> >>> It states that if the option to reuse the JVM is enabled, the same task
>> >>> JVM will execute multiple tasks (i.e. map/reduce). I am not sure how this
>> >>> is implemented: whether a new Mapper/Reducer is created for each task or
>> >>> whether they too are reused.
>> >>> If a new instance is created each time, then the mapper/reducer and all
>> >>> its references will become eligible for garbage collection and you would
>> >>> be good.
>> >>> If the Mapper/Reducer instances are reused, then setup should be
>> >>> called again, creating another instance of your helper class.
>> >>>
>> >>> In my opinion the latter does not make sense, and the implementation
>> >>> would follow the former approach, i.e. creation of a new
>> >>> Mapper/Reducer for each task. But it would be interesting to check.
>> >>>
>> >>> As the classes in question are stateless helper classes, you may not be
>> >>> affected in terms of functionality.
>> >>>
>> >>> I am not clear on one of your statements:
>> >>>
>> >>> How many map tasks will be created? One per split or one per VM (node)?
>> >>> Are you suggesting that although there would be one Mapper in the node...
>> >>>
>> >>> Have you configured your node to have a single slot for map/reduce tasks?
>> >>> If yes, then there will be one Mapper/Reducer task on the node. If not,
>> >>> there could be more than one mapper/reducer on the node, depending on
>> >>> several other parameters, e.g. the number of mapper/reducer slots allocated
>> >>> on the node and the number of input splits. If the node is configured to
>> >>> run more than one Mapper/Reducer task, the scheduler may choose to run more
>> >>> than one task on the same node. The default is 2 map and 2 reduce tasks per
>> >>> node. For each task a new JVM is launched unless the JVM reuse option is
>> >>> enabled.
>> >>>
>> >>> Thanks,
>> >>> Anirudh
>> >>>
>> >>>
>> >>> On Sat, Dec 31, 2011 at 1:28 AM, Eyal Golan <egolan74@gmail.com> wrote:
>> >>>>
>> >>>> My idea is to create that class in the setup / configure method
>> >>>> (depending on which Mapper / Reducer I inherit from).
>> >>>>
>> >>>> I don't understand the 'reuse' option you are referring to.
>> >>>> How many map tasks will be created? One per split or one per VM (node)?
>> >>>> Are you suggesting that although there would be one Mapper in the node,
>> >>>> each new operator (or reflective call) will create a new instance,
>> >>>> thus creating lots of instances?
>> >>>>
>> >>>> BTW,
>> >>>> these helper classes I want to create are of course not going to be
>> >>>> stateful. They are definitely 'helper' classes that have some logic.
>> >>>>
>> >>>> Thanks,
>> >>>>
>> >>>> Eyal
>> >>>>
>> >>>>
>> >>>> On Sat, Dec 31, 2011 at 6:50 AM, Anirudh <techie.anirudh@gmail.com> wrote:
>> >>>>>
>> >>>>> Where are you creating this new class? If it is in the map function,
>> >>>>> then a new object will be created for each record in the split.
>> >>>>>
>> >>>>> Also, you may need to see how the JVM reuse option works. I am not too
>> >>>>> sure of this and you may want to look at the code. If the option for JVM
>> >>>>> reuse is set, then my understanding is that for every task a new Map task
>> >>>>> would be created, and in that case the "new" operator will create another
>> >>>>> instance even if this statement is not in the map function.
>> >>>>>
>> >>>>>
>> >>>>> On Fri, Dec 30, 2011 at 6:22 AM, Eyal Golan <egolan74@gmail.com> wrote:
>> >>>>>>
>> >>>>>> Great news!
>> >>>>>> Thanks for the info.
>> >>>>>>
>> >>>>>> So using reflection, I can inject different implementations of
>> >>>>>> interfaces (services) for the mapper (or reducer).
>> >>>>>> This way I can test a mapper (or reducer)
>> >>>>>> just by reflecting a stub instead of a real implementation.
>> >>>>>>
>> >>>>>> Thanks,
>> >>>>>>
>> >>>>>>
>> >>>>>> On Fri, Dec 30, 2011 at 2:50 PM, Harsh J <harsh@cloudera.com> wrote:
>> >>>>>>>
>> >>>>>>> Eyal,
>> >>>>>>>
>> >>>>>>> Yes, it is right to think of each Task attempt being one individual
>> >>>>>>> JVM running individually on any added Node. Multiple slots would mean
>> >>>>>>> multiple VMs in parallel as well. Yes, your use of reflection to build
>> >>>>>>> your objects will work just fine -- it's all user-side Java code that
>> >>>>>>> is executed.
>> >>>>>>>
>> >>>>>>> On 30-Dec-2011, at 4:42 PM, Eyal Golan wrote:
>> >>>>>>>
>> >>>>>>> Hi,
>> >>>>>>>
>> >>>>>>> I want to understand a basic concept in MR.
>> >>>>>>>
>> >>>>>>> If a mapper creates an instance of some class (using the 'new'
>> >>>>>>> operator), then the created class exists ONCE in the VM of this node.
>> >>>>>>> For each node.
>> >>>>>>> Correct?
>> >>>>>>>
>> >>>>>>> Now,
>> >>>>>>> what if instead of using the 'new' operator, the class is created
>> >>>>>>> using reflection?
>> >>>>>>> Is it valid in MR?
>> >>>>>>> Will only one instance of the created class exist in that node?
>> >>>>>>> Thanks,
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> Eyal
>> >>>>>>>
>> >>>>>>>
>> >>>>>>
>> >>>>>
>> >>>>
>> >>>
>> >>
>> >
>>
>>
>>
>> --
>> Harsh J
>>
>
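
To make the lifecycle discussed in this thread concrete, here is a minimal sketch in plain Java. It does not use the Hadoop API: SetupMapper, StaticFieldMapper and Helper are hypothetical stand-ins, and the loop only simulates one reused JVM running several task attempts in sequence. It assumes a fresh mapper object per task attempt, which the thread does not confirm; the static case further assumes the mapper class is loaded once per JVM, which a reused task JVM would be expected to satisfy but is worth verifying on your cluster.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class SetupLifecycleSketch {

    /** Hypothetical stateless helper; bumps the given counter once per instance. */
    static final class Helper {
        Helper(AtomicInteger counter) { counter.incrementAndGet(); }
        String normalize(String s) { return s.trim().toLowerCase(); }
    }

    /** Stand-in for a Mapper that builds its helper in setup(), not in map(). */
    static final class SetupMapper {
        static final AtomicInteger HELPERS = new AtomicInteger();
        private Helper helper;
        void setup() { helper = new Helper(HELPERS); }
        String map(String record) { return helper.normalize(record); }
    }

    /** Stand-in for a Mapper that uses a static field initializer instead. */
    static final class StaticFieldMapper {
        static final AtomicInteger HELPERS = new AtomicInteger();
        private static final Helper SHARED = new Helper(HELPERS);
        String map(String record) { return SHARED.normalize(record); }
    }

    /** Helpers built while one JVM runs the given number of task attempts. */
    public static int helpersBuiltViaSetup(int taskAttempts, int recordsPerTask) {
        int before = SetupMapper.HELPERS.get();
        for (int t = 0; t < taskAttempts; t++) {
            SetupMapper mapper = new SetupMapper(); // assumption: new instance per attempt
            mapper.setup();                         // one setup call per task attempt
            for (int r = 0; r < recordsPerTask; r++) {
                mapper.map("  Record  ");
            }
        }
        return SetupMapper.HELPERS.get() - before;
    }

    /** Total helpers ever built through the static field in this JVM. */
    public static int helpersBuiltViaStaticField(int taskAttempts, int recordsPerTask) {
        for (int t = 0; t < taskAttempts; t++) {
            StaticFieldMapper mapper = new StaticFieldMapper();
            for (int r = 0; r < recordsPerTask; r++) {
                mapper.map("  Record  ");
            }
        }
        return StaticFieldMapper.HELPERS.get();
    }

    public static void main(String[] args) {
        System.out.println("setup():      " + helpersBuiltViaSetup(3, 5));       // prints 3
        System.out.println("static field: " + helpersBuiltViaStaticField(3, 5)); // prints 1
    }
}
```

In this simulation the helper count follows the number of task attempts, never the number of records, so a stateless helper created in setup stays cheap either way.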
>
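
The reflection idea from the start of the thread (swapping a stub for a real service so a mapper can be tested) might look roughly like the sketch below. Everything here is hypothetical: the LookupService interface, both implementations, and the "sketch.lookup.service.class" key are invented for illustration, and a plain Map stands in for the job configuration.

```java
import java.util.HashMap;
import java.util.Map;

public class ReflectiveInjectionSketch {

    /** Hypothetical service interface that a mapper would depend on. */
    public interface LookupService {
        String lookup(String key);
    }

    /** Hypothetical production implementation. */
    public static final class RealLookupService implements LookupService {
        public String lookup(String key) { return "real:" + key; }
    }

    /** Hypothetical stub used in tests. */
    public static final class StubLookupService implements LookupService {
        public String lookup(String key) { return "stub:" + key; }
    }

    /** Hypothetical configuration key naming the implementation class. */
    public static final String SERVICE_KEY = "sketch.lookup.service.class";

    /** Builds the configured service reflectively; defaults to the real one. */
    public static LookupService createService(Map<String, String> conf) {
        String className = conf.getOrDefault(SERVICE_KEY, RealLookupService.class.getName());
        try {
            return Class.forName(className)
                    .asSubclass(LookupService.class) // rejects classes of the wrong type
                    .getDeclaredConstructor()        // requires a no-argument constructor
                    .newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate " + className, e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        System.out.println(createService(conf).lookup("k")); // prints real:k

        conf.put(SERVICE_KEY, StubLookupService.class.getName());
        System.out.println(createService(conf).lookup("k")); // prints stub:k
    }
}
```

In a real job the class name would more likely travel in the job configuration, with createService called once from the mapper's setup method, so that a test can point the same key at the stub.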
