hadoop-mapreduce-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Arun C Murthy <...@hortonworks.com>
Subject Re: Idle nodes with terasort and MRv2/YARN (0.23.1)
Date Tue, 05 Jun 2012 12:35:27 GMT
Trevor,

On May 30, 2012, at 1:38 PM, Trevor Robinson wrote:

> Jeff:
> 
> Thanks for the corroboration and advice. I can't retreat to 0.20, and
> must forge ahead with 2.0, so I'll share any progress.
> 
> Arun:
> 
> I haven't set the minimum container size. Do you know the default? Is
> there a way to easily find defaults (more complete/reliable than
> docs)?

The default is way too low (128M) which is non-optimal.

I've opened https://issues.apache.org/jira/browse/MAPREDUCE-4316 to fix it.

> 
> Thanks, I'll give that a try. How does minimum container size relate
> to settings mapreduce.map.memory.mb? Would it essentially raise my
> 768M map memory allocation to 1G? Does CapacityScheduler need any
> additional configuration to function optimally? BTW, what is the
> default scheduler?

The min. container size should be >= mapreduce.map.memory.mb.

If you have a heap size of 768M for your maps, it should be perfectly ok to have min-container
size as 1024M.

Hmm... I'd also look into bumping mapreduce.reduce.memory.mb to be 2*min-container-size i.e.
use 2 containers.

The default scheduler is FifoScheduler which isn't as mature as CS (which is what we use for
tuning etc., see http://hortonworks.com/blog/delivering-on-hadoop-next-benchmarking-performance/
for more details).

> 
> I should have MAPREDUCE-3641. I'm using 0.23.1 with CDH4b2 patches
> (and a few Java 7/Ubuntu 12.04 build patches). How does 2.0.0-alpha
> compare to 0.23.1?
> 

I'm not familiar with patchsets on CDH, but hadoop-2.0.0-alpha is a significant improvement
on hadoop-0.23.1...

> If there's anything I can do to assist with the issue of spreading out
> map tasks, please let me know. Is there a JIRA issue for it (or if
> not, should there be)?
> 

For smaller clusters https://issues.apache.org/jira/browse/MAPREDUCE-3210 could help.

> Incidentally, my current benchmarking work on x86 is only a training
> ground and baseline before moving onto ARM-based systems, which have
> 4GB RAM and generally fewer, smaller (2.5" form factor) disks per
> node. It sounds like the smaller RAM will force better distribution,
> but the disk capacity/utilization situation will be more severe.
> 

Right, smaller RAM should force better distribution.

Love to get more f/b from your ARM work, thanks!

Arun

> Thanks,
> Trevor
> 
> On Tue, May 29, 2012 at 6:21 PM, Arun C Murthy <acm@hortonworks.com> wrote:
>> What is the minimum container size? i.e.
>> yarn.scheduler.minimum-allocation-mb.
>> 
>> I'd bump it up to at least 1G and use the CapacityScheduler for performance
>> tests:
>> http://hadoop.apache.org/common/docs/r2.0.0-alpha/hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html
>> 
>> In case of teragen, the job has no locality at all (since it's just
>> generating data from 'random' input-splits) and hence you are getting them
>> stuck on fewer nodes since you have so many containers on each node.
>> 
>> The reduces should be better spread if you are using CapacityScheduler and
>> have https://issues.apache.org/jira/browse/MAPREDUCE-3641 in your build i.e.
>> hadoop-0.23.1 or hadoop-2.0.0-alpha (I'd use the latter).
>> 
>> Also, FYI, currently the CS makes the tradeoff that node-locality is almost
>> same as rack-locality and hence you might see maps not spread out for
>> terasort. I'll fix that one soon.
>> 
>> hth,
>> Arun
>> 
>> On May 29, 2012, at 2:33 PM, Trevor Robinson wrote:
>> 
>> Hello,
>> 
>> I'm trying to tune terasort on a small cluster (4 identical slave
>> nodes w/ 4 disks and 16GB RAM each), but I'm having problems with very
>> uneven load.
>> 
>> For teragen, I specify 24 mappers, but for some reason, only 2 nodes
>> out of 4 run them all, even though the web UI (for both YARN and HDFS)
>> shows all 4 nodes available. Similarly, I specify 16 reducers for
>> terasort, but the reducers seem to run on 3 nodes out of 4. Do I have
>> something configured wrong, or does the scheduler not attempt to
>> spread out the load? In addition to performing sub-optimally, this
>> also causes me to run out of disk space for large jobs, since the data
>> is not being spread out evenly.
>> 
>> Currently, I'm using these settings (not shown as XML for brevity):
>> 
>> yarn-site.xml:
>> yarn.nodemanager.resource.memory-mb=13824
>> 
>> mapred-site.xml:
>> mapreduce.map.memory.mb=768
>> mapreduce.map.java.opts=-Xmx512M
>> mapreduce.reduce.memory.mb=2304
>> mapreduce.reduce.java.opts=-Xmx2048M
>> mapreduce.task.io.sort.mb=512
>> 
>> In case it's significant, I've scripted the cluster setup and terasort
>> jobs, so everything runs back-to-back instantly, except that I poll to
>> ensure that HDFS is up and has active data nodes before running
>> teragen. I've also tried adding delays, but they didn't seem to have
>> any effect, so I don't *think* it's a start-up race issue.
>> 
>> Thanks for any advice,
>> Trevor
>> 
>> 
>> --
>> Arun C. Murthy
>> Hortonworks Inc.
>> http://hortonworks.com/
>> 
>> 

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/



Mime
View raw message