hadoop-hdfs-user mailing list archives

From Rahul Bhattacharjee <rahul.rec....@gmail.com>
Subject Re: VM reuse!
Date Tue, 16 Apr 2013 11:33:24 GMT
Ok, Thanks Bejoy.

So it only helps in certain scenarios, like the one you describe: many more
map tasks than map slots.


On Tue, Apr 16, 2013 at 2:40 PM, Bejoy Ks <bejoy.hadoop@gmail.com> wrote:

> Hi Rahul
> Consider a larger cluster and jobs that involve larger input data
> sets. The data would be spread across the whole cluster, and a single node
> might hold various blocks of that data set. Imagine you have a
> cluster with 100 map slots and your job has 500 map tasks. In that case
> multiple map tasks will run on a single TaskTracker, based on slot
> availability.
> Here, if you enable JVM reuse, tasks of the same job on a single
> TaskTracker can reuse a JVM rather than each starting a new one. The
> benefit is just the time you save in spawning and cleaning up a JVM for
> each individual task.
> On Tue, Apr 16, 2013 at 2:04 PM, Rahul Bhattacharjee <
> rahul.rec.dgp@gmail.com> wrote:
>> Hi,
>> I have a question related to VM reuse in Hadoop. I now understand the
>> purpose of VM reuse, but I am wondering how it is useful.
>> For example, for VM reuse to be effective or to kick in, we need more than
>> one map task to be submitted to a single node (for the same job). Hadoop
>> tries to spawn mappers on the nodes that actually contain the data,
>> so it might rarely happen that multiple mappers are allocated to a single
>> TaskTracker. And even if a single node gets to run multiple mappers,
>> they might as well run in parallel in multiple VMs rather than
>> sequentially in a single VM.
>> I am sure I am missing some link here; please help me find it.
>> Thanks,
>> Rahul
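
The arithmetic behind Bejoy's example (100 map slots, 500 map tasks) can be sketched as below. This is only a rough model, not Hadoop code: the class and method names are hypothetical, and it assumes tasks are spread evenly over the slots, each slot runs its tasks one after another, and a reused JVM stays within one slot. The `tasksPerJvm` parameter is modeled on the Hadoop 1.x job setting `mapred.job.reuse.jvm.num.tasks`, where, as far as I know, 1 means no reuse and -1 means no limit.

```java
// Rough model of JVM launches with and without JVM reuse.
// Assumptions (not from Hadoop itself): even spread of tasks over slots,
// sequential execution within a slot, one JVM chain per slot.
class JvmReuseSketch {

    // Number of sequential "waves" needed to run all tasks on the given slots.
    static int waves(int tasks, int slots) {
        return (tasks + slots - 1) / slots;
    }

    // Total JVM launches for the job.
    // tasksPerJvm: how many tasks one JVM may run; -1 means unlimited reuse.
    static int jvmLaunches(int tasks, int slots, int tasksPerJvm) {
        if (tasksPerJvm == 0 || tasksPerJvm < -1) {
            throw new IllegalArgumentException("tasksPerJvm must be -1 or positive");
        }
        int base = tasks / slots;   // tasks every slot receives
        int extra = tasks % slots;  // first 'extra' slots receive one more
        int launches = 0;
        for (int slot = 0; slot < slots; slot++) {
            int tasksInSlot = base + (slot < extra ? 1 : 0);
            if (tasksInSlot == 0) {
                continue; // idle slot, no JVM started
            }
            launches += (tasksPerJvm == -1)
                    ? 1
                    : (tasksInSlot + tasksPerJvm - 1) / tasksPerJvm;
        }
        return launches;
    }

    public static void main(String[] args) {
        int tasks = 500;
        int slots = 100;
        System.out.println("waves: " + waves(tasks, slots));
        System.out.println("launches, no reuse: " + jvmLaunches(tasks, slots, 1));
        System.out.println("launches, unlimited reuse: " + jvmLaunches(tasks, slots, -1));
    }
}
```

Under these assumptions the saving is one JVM start-up and tear-down per reused task, which matters mostly when tasks are short and far outnumber the slots, the scenario discussed in this thread.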
