hadoop-mapreduce-user mailing list archives

From Subroto <ssan...@datameer.com>
Subject Re: JVM reuse in Map Tasks
Date Mon, 04 Jun 2012 12:18:44 GMT
Hi Arpit,

A point to mention from http://www.cloudera.com/blog/2009/12/7-tips-for-improving-mapreduce-performance/:

If each task takes less than 30-40 seconds, reduce the number of tasks. The task setup and
scheduling overhead is a few seconds, so if tasks finish very quickly, you’re wasting time
while not doing work. JVM reuse can also be enabled to solve this problem.
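
To make the overhead argument above concrete, here is a rough Java sketch. The 3-second overhead and the task durations are assumed numbers chosen only for illustration; they are not measurements from the blog post or from any cluster, and the class name is hypothetical.

```java
// Rough estimate of how much of a task's wall-clock time goes to
// setup and scheduling rather than to useful work.
final class TaskOverheadSketch {

    // Fraction of total task time lost to the fixed per-task overhead.
    static double overheadFraction(double overheadSeconds, double workSeconds) {
        return overheadSeconds / (overheadSeconds + workSeconds);
    }

    public static void main(String[] args) {
        double overhead = 3.0; // assumed value standing in for "a few seconds"
        // Assumed task durations: very short, around the 30-40 s guideline, and long.
        for (double work : new double[] {5.0, 40.0, 300.0}) {
            System.out.printf("work=%.0fs overhead share=%.1f%%%n",
                    work, 100.0 * overheadFraction(overhead, work));
        }
    }
}
```

The shorter the task, the larger the share of time taken by the fixed overhead, which is why fewer, longer tasks (or JVM reuse) help.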

Beyond that, I can think of another benefit: if the mapper has to build a huge tree in the
Child JVM (let's say the implementation needs one), the same tree can be reused by the later
tasks that run in that JVM, rather than being created again and again for every task.
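
A minimal sketch of that idea follows. It deliberately avoids Hadoop classes so it runs standalone: `TreeCachingMapperSketch` and its `setup()` are hypothetical stand-ins for a real mapper that would extend `org.apache.hadoop.mapreduce.Mapper` and override `setup(Context)`. As far as I understand, `setup()` is still called once for every task even when the JVM is reused (in Hadoop 1.x, reuse is controlled by `mapred.job.reuse.jvm.num.tasks`), so the expensive work is guarded by a static field instead of trying to suppress the call.

```java
import java.util.TreeMap;

// Stand-in for a mapper; not a real Hadoop class.
final class TreeCachingMapperSketch {

    // Static state lives as long as the JVM, so it survives across
    // tasks when the JVM is reused.
    private static TreeMap<Integer, String> sharedTree;
    static int buildCount = 0;

    // Called once per task; builds the tree only the first time in this JVM.
    void setup() {
        synchronized (TreeCachingMapperSketch.class) {
            if (sharedTree == null) {
                sharedTree = buildTree();
                buildCount++;
            }
        }
    }

    String lookup(int key) {
        return sharedTree.get(key);
    }

    // Placeholder for the expensive construction the job actually needs.
    private static TreeMap<Integer, String> buildTree() {
        TreeMap<Integer, String> tree = new TreeMap<>();
        for (int i = 0; i < 1000; i++) {
            tree.put(i, "value-" + i);
        }
        return tree;
    }

    public static void main(String[] args) {
        // Simulate three tasks running one after another in the same JVM.
        for (int task = 0; task < 3; task++) {
            TreeCachingMapperSketch mapper = new TreeCachingMapperSketch();
            mapper.setup();
            System.out.println("task " + task + " lookup(42)=" + mapper.lookup(42));
        }
        System.out.println("tree built " + buildCount + " time(s)");
    }
}
```

Without JVM reuse each task gets a fresh JVM, so the static field starts out empty and the tree is rebuilt every time; the guard only pays off when reuse is enabled.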

Subroto Sanyal

On Jun 4, 2012, at 2:12 PM, Arpit Wanchoo wrote:

> Hi
> I wanted to check what exactly we gain when JVM reuse is enabled in a MapReduce job.
> My doubt is about the setup() method of the mapper. Is it called for a mapper even if
> it runs in a JVM that was used by a previously run mapper?
> If yes, is there any way I can control it or stop it from being called more than once?
> Regards,
> Arpit Wanchoo | Sr. Software Engineer
> Guavus Network Systems.
> 6th Floor, Enkay Towers, Tower B & B1,Vanijya Nikunj, Udyog Vihar Phase - V, Gurgaon,Haryana.
> Mobile Number +91-9899949788
