hadoop-mapreduce-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Aaron Kimball <aa...@cloudera.com>
Subject Re: Memory intensive jobs and JVM reuse
Date Thu, 29 Apr 2010 15:52:39 GMT
* JVM reuse only applies within the same job. Different jobs are always
different JVMs
* JVM reuse is serial; you'll only get task B in a JVM after task A has
already completed -- never both at the same time. If you configure Hadoop to
run 4 tasks per TT concurrently, you'll still have 4 JVMs up.

There isn't an easy way to play tricks to get this large data structure of
yours to be shared between tasks or jobs, using Hadoop's own mechanisms.
 Instead, you need to look to an external system. e.g., populate memcached
with your hashmap ahead of time and have all your jobs query that. Or use a
BDB or TokyoCabinet or some other disk-backed key-value database and have
each task demand-load only the (k, v) pairs that are relevant to the
particular task.

- Aaron

On Thu, Apr 29, 2010 at 8:43 AM, David Rosenstrauch <darose@darose.net>wrote:

> On 04/29/2010 11:08 AM, Danny Leshem wrote:
>> David,
>> DistributedCache distributes files across the cluster - it is not a shared
>> memory cache.
>> My problem is not distributing the HashMap across machines, but the fact
>> that it is replicated in memory for each task (or each job, for that
>> matter).
> OK, sorry for the misunderstanding.
> Hmmm ... well ... I thought there was a config parm to control whether the
> task gets launched in a new VM, but I can't seem to find it.  A quick look
> at the list of map/reduce parms turned up this, though:
> mapred.job.reuse.jvm.num.tasks  default: 1      How many tasks to run per
> jvm. If set to -1, there is no limit.
> Perhaps that might help?  I'm speculating here, but in theory, if you set
> it to -1, then all task attempts per job per node would run in the same VM
> ... and so be able to have access to the same static variables.
> You might also want to poke around in the full list of map/reduce config
> parms and see if there's anything else in there that might help solve this:
> http://hadoop.apache.org/common/docs/current/mapred-default.html
> HTH,
> DR

View raw message