hadoop-mapreduce-user mailing list archives

From Harsh J <ha...@cloudera.com>
Subject Re: Task JVM Reuse for MapReduce Jobs in 0.20.2
Date Fri, 29 Jul 2011 19:03:12 GMT
Brandon,

JVMs are never shared across jobs, so each job will spawn new JVMs in
the slots it uses. Within a single job this shouldn't happen once reuse
is enabled. Are you seeing a new JVM for every task of the same job?
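
One way to answer that question from the `ps` and log observations is to group task attempts by job and by process ID. The following is only a minimal Python sketch: the `(attempt_id, pid)` pairs are assumed to be collected by hand on a single node, the helper names are hypothetical, and it assumes the usual `attempt_<start>_<job>_<m|r>_<task>_<attempt>` ID layout. It also ignores the possibility of the OS recycling a PID.

```python
import re
from collections import defaultdict

# Assumed attempt ID layout: attempt_<jtStart>_<jobSeq>_<m|r>_<task>_<attempt>
ATTEMPT_RE = re.compile(r"^attempt_(\d+)_(\d+)_[mr]_\d+_\d+$")


def job_of(attempt_id):
    """Return the job ID (job_<jtStart>_<jobSeq>) an attempt belongs to."""
    match = ATTEMPT_RE.match(attempt_id)
    if match is None:
        raise ValueError("not a task attempt ID: %r" % attempt_id)
    return "job_%s_%s" % match.groups()


def tasks_per_jvm(observations):
    """Group (attempt_id, pid) pairs by (job ID, pid)."""
    grouped = defaultdict(list)
    for attempt_id, pid in observations:
        grouped[(job_of(attempt_id), pid)].append(attempt_id)
    return dict(grouped)


def reuse_seen(observations):
    """True if some JVM (pid) ran more than one attempt of the same job."""
    return any(len(attempts) > 1
               for attempts in tasks_per_jvm(observations).values())
```

If `reuse_seen` is false for attempts that all belong to one job, the JVMs really are being restarted per task; if the only distinct PIDs belong to different jobs, that is the expected behaviour.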

Also, since your question may be specific to CDH, I've moved the
discussion to cdh-user@cloudera.org (mapreduce-user@ bcc'd).

On Fri, Jul 29, 2011 at 11:50 PM, Brandon Vargo <brandon@fullcontact.com> wrote:
> Hello,
>
> I am trying to set up a MapReduce job so that the task JVMs are reused on
> each cluster node. Libraries used by my MapReduce job have a significant
> initialization time, mainly creating singletons, and it would be nice if
> I could make it so that these singletons are only created once per slot,
> rather than once per task. The input for the job is HBase, so for a
> large row scan, the initialization time is proving to be quite
> significant, as the processing done on each row is rather small and the
> number of tasks is high.
>
> I am setting mapred.job.reuse.jvm.num.tasks to -1 in the job
> configuration, as stated in the documentation ([1]), yet I am still
> seeing a different JVM start for each task. This is visible both by
> watching the processes executing on each node using ps, as well as
> watching the debugging logs from the job. Otherwise, the job is working
> as expected.
>
> I have tried switching to the deprecated JobConf class and using
> setNumTasksToExecutePerJvm, but to no avail. I also tried setting
> mapreduce.job.jvm.numtasks, the equivalent setting in Hadoop 0.21, in
> case the documentation was out of date, though this did not help either.
>
> I have confirmed that mapred.job.reuse.jvm.num.tasks is being
> transferred to the copy of the job configuration on the task tracker, by
> looking at the task tracker's copy of job.xml ([2]).
>
> I am running Cloudera's cdh3u0 (Hadoop 0.20.2, full version string:
> 0.20.2-cdh3u0, r81256ad0f2e4ab2bd34b04f53d25a6c23686dd14) and HBase
> 0.90.1.
>
> Thank you in advance if anyone may be able to shed light on this issue.
>
> [1] - http://hadoop.apache.org/common/docs/r0.20.2/mapred_tutorial.html#Task+JVM+Reuse
>
> [2] - The following appears in the property file (split across multiple
> lines by me for readability):
> <property>
>  <!--Loaded from /mnt/mapred/jt/jobTracker/job_201107281409_0028.xml-->
>  <name>mapred.job.reuse.jvm.num.tasks</name>
>  <value>-1</value>
> </property>
>
> Regards,
>
> Brandon Vargo
>
>
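
The job.xml check described in [2] above can also be scripted. This is a rough Python sketch, not part of Hadoop: it assumes the file is the usual `<configuration>` document of `<property>` entries, and the function names are made up for illustration. The interpretation of the value follows the 0.20.2 tutorial linked in [1], where -1 means no limit and the default is 1.

```python
import xml.etree.ElementTree as ET

REUSE_KEY = "mapred.job.reuse.jvm.num.tasks"


def read_property(job_xml_text, name):
    """Return the value of the named property in job.xml text, or None."""
    root = ET.fromstring(job_xml_text)
    for prop in root.iter("property"):
        found = prop.findtext("name")
        if found is not None and found.strip() == name:
            value = prop.findtext("value")
            return value.strip() if value is not None else None
    return None


def describe_reuse(value):
    """Describe what a mapred.job.reuse.jvm.num.tasks value asks for."""
    if value is None:
        return "not set (default is 1: one task per JVM)"
    count = int(value)
    if count == -1:
        return "unlimited tasks of the same job per JVM"
    if count == 1:
        return "one task per JVM (no reuse)"
    return "up to %d tasks of the same job per JVM" % count
```

For example, `describe_reuse(read_property(open("job.xml").read(), REUSE_KEY))` on the task tracker's copy would report the setting the task tracker actually received, which is a separate question from whether it is honoured.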



-- 
Harsh J
