hadoop-mapreduce-user mailing list archives

From Eyal Golan <egola...@gmail.com>
Subject Re: instantiation of classes in MR
Date Sun, 01 Jan 2012 07:39:31 GMT
Thank you very much for the detailed explanation Anirudh.

I think my question about node / VM came from a lack of knowledge
(I'm just starting to learn the Hadoop environment).
As for the configuration of the nodes and cluster, that is not something
I do myself. We have a dedicated team for managing the Hadoop cluster,
and I'll ask them.

I think my question should have been: how many instances of the
'helper' class will be created in a single VM?
As I understand it, if I create the helper in the setup / configure
method, there will be one per task.
And as long as it's stateless, I'm good.
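
A minimal sketch of that lifecycle, in plain Java with no Hadoop dependency. `RecordHelper`, `SketchMapper`, and `HelperLifecycleDemo` are hypothetical names, and the sketch assumes the framework builds a fresh mapper for each task, which this thread leaves unconfirmed. The static counter exists only to count constructions.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper. The static counter is for illustration only;
// a real stateless helper would not carry it.
final class RecordHelper {
    static int created = 0;

    RecordHelper() {
        created++;
    }

    String normalize(String record) {
        return record.trim().toLowerCase();
    }
}

// Stand-in for a Mapper: setup() runs once per task, map() once per record.
final class SketchMapper {
    private RecordHelper helper;

    void setup() {
        helper = new RecordHelper();
    }

    String map(String record) {
        return helper.normalize(record);
    }
}

final class HelperLifecycleDemo {
    // Mimics the framework running one task over one input split.
    // Returns the number of records processed.
    static int runTask(List<String> split) {
        SketchMapper mapper = new SketchMapper(); // assumed: new mapper per task
        mapper.setup();
        int processed = 0;
        for (String record : split) {
            mapper.map(record);
            processed++;
        }
        return processed;
    }

    public static void main(String[] args) {
        // Two tasks in the same JVM, as could happen with JVM reuse enabled.
        runTask(Arrays.asList(" A ", "b", "C "));
        runTask(Arrays.asList("d"));
        System.out.println("helpers created: " + RecordHelper.created);
    }
}
```

Under these assumptions the helper is built once per task no matter how many records the split holds. Moving the `new RecordHelper()` call into `map()` would build one per record instead.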

Thanks again,

Eyal



Eyal Golan
egolan74@gmail.com

Visit: http://jvdrums.sourceforge.net/
LinkedIn: http://www.linkedin.com/in/egolan74
Skype: egolan74

Save a tree. Please don't print this e-mail unless it's really necessary



On Sat, Dec 31, 2011 at 1:36 PM, Anirudh <techie.anirudh@gmail.com> wrote:

> I just wanted to confirm where exactly you were planning to put the
> instantiation code, as it was not mentioned in your previous post. The
> location would have made a difference. As you are doing it in the setup of
> the mapper/reducer, you are good.
>
> I was referring to the Task JVM Reuse option:
>
> http://hadoop.apache.org/common/docs/current/mapred_tutorial.html#Task+JVM+Reuse
>
> It states that if the option to reuse the JVM is enabled, the same Task JVM
> will execute multiple tasks (i.e. map/reduce). I am not sure how this is
> implemented, whether a new Mapper/Reducer is created for each task or
> whether they too are reused.
> If a new instance is created each time, then the mapper/reducer and all
> its references will be eligible for garbage collection and you would be good.
> If the Mapper/Reducer instances are reused, then setup should be
> called again, creating another instance of your helper class.
>
> In my opinion the latter does not make sense, and the implementation would
> follow the former approach, i.e. creation of a new Mapper/Reducer
> for each Task. But it would be interesting to check.
>
> As the classes in question are stateless helper classes, you should not be
> affected in terms of functionality.
>
> I am not clear on one of your statements:
>
> *How many map tasks will be created? One per split or one per VM (node)?*
> *Are you suggesting that although there would be one Mapper in the node*
> ...
>
> Have you configured your node to have a single slot for map/reduce tasks?
> If yes, then there will be one Mapper/Reducer task on the node at a time. If
> no, there could be more than one mapper/reducer on the node, depending on
> several other parameters, i.e. the number of mapper/reducer slots allocated
> on the node, the number of input splits, etc. If the node is configured to
> run more than one Mapper/Reducer task, the scheduler may choose to run more
> than one task on the same node. The default is 2 Map & 2 Reduce tasks per
> node. And for each task a new JVM is launched unless the JVM reuse option is
> enabled.
>
> Thanks,
> Anirudh
>
>
> On Sat, Dec 31, 2011 at 1:28 AM, Eyal Golan <egolan74@gmail.com> wrote:
>
>> My idea is to create that class in the setup / configure method (depending
>> on which Mapper / Reducer I inherit from).
>>
>> I don't understand the 'reuse' option you are referring to.
>> How many map tasks will be created? One per split or one per VM (node)?
>> Are you suggesting that although there would be one Mapper in the node,
>> each new operator (or reflecting) will create a new instance?
>> Thus making lots of that instance?
>>
>> BTW,
>> the helper classes I want to create are of course not going to be
>> stateful. They are definitely 'helper' classes that contain some logic.
>>
>> Thanks,
>>
>> Eyal
>>
>>
>> On Sat, Dec 31, 2011 at 6:50 AM, Anirudh <techie.anirudh@gmail.com> wrote:
>>
>>> Where are you creating this new class? If it is in the map function,
>>> then a new object will be created for each record in the split.
>>>
>>> Also, you may need to see how the JVM reuse option works. I am not too
>>> sure of this and you may want to look at the code. If the option for JVM
>>> reuse is set, then my understanding is that for every task, a new Map task
>>> would be created, and in that case the "new" operator will create another
>>> instance even if this statement is not in the map function.
>>>
>>>
>>> On Fri, Dec 30, 2011 at 6:22 AM, Eyal Golan <egolan74@gmail.com> wrote:
>>>
>>>> Great News !!
>>>> Thanks for the info.
>>>>
>>>> So using reflection, I can inject different implementations of
>>>> interfaces (services) for the mapper (or reducer).
>>>> And this way I can test a mapper (or reducer).
>>>> Just by reflecting a stub instead of a real implementation.
>>>>
>>>> Thanks,
>>>>
>>>>
>>>> On Fri, Dec 30, 2011 at 2:50 PM, Harsh J <harsh@cloudera.com> wrote:
>>>>
>>>>> Eyal,
>>>>>
>>>>> Yes, it is right to think of each Task attempt as one individual
>>>>> JVM running individually on any added Node. Multiple slots would mean
>>>>> multiple VMs in parallel as well. Yes, your use of reflection to build
>>>>> your objects will work just fine -- it's all user-side Java code that is
>>>>> executed.
>>>>>
>>>>> On 30-Dec-2011, at 4:42 PM, Eyal Golan wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> I want to understand a basic concept in MR.
>>>>>
>>>>> If a mapper creates an instance of some class (using the 'new'
>>>>> operator), then the created class exists ONCE in the VM of this node.
>>>>> For each node.
>>>>> Correct?
>>>>>
>>>>> Now,
>>>>> what if instead of using the 'new' operator, the class is created
>>>>> using reflection.
>>>>> Is it valid in a MR?
>>>>> Will only one instance of the created class be existing in that node?
>>>>>
>>>>> Thanks,
>>>>>
>>>>>
>>>>> Eyal
>>>>
>>>
>>
>
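
The reflection idea discussed in this thread (building a service by class name so a test can swap in a stub) can be sketched roughly as follows. This is plain Java with no Hadoop dependency; `LookupService`, `StubLookupService`, and `ReflectiveFactory` are hypothetical names, and in a real job the class name would presumably come from the job configuration rather than being passed in directly.

```java
// Hypothetical service interface that a mapper or reducer would depend on.
interface LookupService {
    String lookup(String key);
}

// Stub implementation for tests; a real one would sit alongside it.
final class StubLookupService implements LookupService {
    public StubLookupService() {}

    @Override
    public String lookup(String key) {
        return "stub:" + key;
    }
}

final class ReflectiveFactory {
    // Builds a LookupService from a class name via its no-argument constructor.
    static LookupService create(String className) {
        try {
            return Class.forName(className)
                    .asSubclass(LookupService.class)
                    .getDeclaredConstructor()
                    .newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot create " + className, e);
        }
    }

    public static void main(String[] args) {
        LookupService service = create(StubLookupService.class.getName());
        System.out.println(service.lookup("k1"));
    }
}
```

Calling `create` from the setup / configure method keeps it to one reflective instantiation per task, the same as using `new` there.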
