hadoop-mapreduce-user mailing list archives

From: Arun C Murthy <...@hortonworks.com>
Subject: Re: config for high memory jobs does not work, please help.
Date: Fri, 18 Jan 2013 21:18:19 GMT
Take a look at the CapacityScheduler and 'High RAM' jobs, whereby you configure M map slots per
node and a job can request that each of its tasks be given N of those slots (where 1 <= N <= M).

Some more info:
http://hadoop.apache.org/docs/stable/capacity_scheduler.html#Resource+based+scheduling
http://hortonworks.com/blog/understanding-apache-hadoops-capacity-scheduler/
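
As a rough sketch (property names are from the capacity scheduler docs linked
above; the values are only illustrative, assuming you want roughly one big map
per 24GB node):

In mapred-site.xml on the cluster:

  <property>
    <name>mapred.jobtracker.taskScheduler</name>
    <value>org.apache.hadoop.mapred.CapacityTaskScheduler</value>
  </property>
  <property>
    <!-- size of one map slot, in MB -->
    <name>mapred.cluster.map.memory.mb</name>
    <value>4000</value>
  </property>
  <property>
    <!-- largest per-map-task memory a job may request, in MB -->
    <name>mapred.cluster.max.map.memory.mb</name>
    <value>24000</value>
  </property>

Then, per job (e.g. with streaming):

  hadoop jar hadoop-streaming.jar \
    -Dmapred.job.map.memory.mb=24000 \
    -Dmapred.child.java.opts=-Xmx8000m \
    -mapper ... -reducer ... -input ... -output ...

With a 4000MB slot and a 24000MB per-task request, the scheduler charges each
map task 6 slots, so a TaskTracker with 6 map slots runs only one such map at
a time.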

hth,
Arun

On Jan 18, 2013, at 12:05 PM, Shaojun Zhao wrote:

> Dear all,
> 
> I know it is best to use a small amount of memory in the mapper and
> reducer. However, sometimes that is hard to do. For example, in machine
> learning algorithms it is common to load the model into memory in the
> mapper step. When the model is big, I have to allocate a lot of memory
> for the mapper.
> 
> Here is my question: how can I configure Hadoop so that it does not
> fork too many mappers and run out of physical memory?
> 
> My machines have 24GB each, and I have 100 of them. Every time, Hadoop
> forks 6 mappers on each machine, no matter what config I use. I really
> want to reduce that to whatever number I want, for example just 1
> mapper per machine.
> 
> Here are the configs I tried. (I use streaming, and I pass the config
> on the command line.)
> 
> -Dmapred.child.java.opts=-Xmx8000m  <-- did not bring down the number of mappers
> 
> -Dmapred.cluster.map.memory.mb=32000 <-- did not bring down the number
> of mappers
> 
> Am I missing something here?
> I use Hadoop 0.20.205
> 
> Thanks a lot in advance!
> -Shaojun

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/


