hadoop-user mailing list archives

From gmail <juer.kai...@gmail.com>
Subject Yarn doesn't start mappers fast enough
Date Tue, 06 Oct 2015 10:13:39 GMT
Hello everyone,

I have a problem with my YARN setup and hope you can help me. I already searched for this
issue but didn't find anything.

My problem is that YARN doesn't start new mappers fast enough, which results in poor cluster
utilization.

Setup:
 - 8 nodes, each with 64 cores and 128 GB of memory
 - Hadoop version: 2.6.0
 - Standard TeraSort of 100 GB; input data generated by TeraGen with two mappers

What I see: At most ~40 mappers run at the same time. At that point, the rate at which new
mappers start appears to be about the same as the rate at which they finish. The average
processing time of each mapper is about 34-40 s. If I start a second TeraSort at the same
time, it also only runs up to ~40 mappers. So it seems that 1) YARN correctly detects that it
can run more mappers, but 2) doesn't start new ones fast enough (one at a time?).
What I expect: better utilization of all nodes, since there are 300+ map tasks.
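
A rough sanity check on the numbers above, assuming the cluster is in a steady state (start
rate roughly equal to finish rate, as observed): Little's law says that the number of running
mappers equals the start rate times the average runtime. The sketch below only rearranges
that formula with the figures from this message; the function name is made up for
illustration, and the result is an estimate, not a measurement.

```python
# Back-of-the-envelope estimate via Little's law: L = lambda * W,
# so lambda = L / W. Assumes a steady state where mappers start and
# finish at about the same rate.

def implied_start_rate(concurrent_mappers: float, avg_runtime_s: float) -> float:
    """Return the mapper start rate (per second) implied by Little's law."""
    if avg_runtime_s <= 0:
        raise ValueError("avg_runtime_s must be positive")
    return concurrent_mappers / avg_runtime_s


# Figures reported above: ~40 concurrent mappers, 34-40 s per mapper.
slow = implied_start_rate(40, 40)
fast = implied_start_rate(40, 34)
print(f"implied start rate: {slow:.2f} to {fast:.2f} mappers per second")
```

Under that assumption the figures work out to roughly one new mapper per second, which
would be consistent with the "one at a time" impression.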

Are there parameters to change this behavior? How can I tell YARN to start more mappers
at the same time?

For completeness:
    - The behavior doesn't change if I use more mappers during TeraGen.
    - The behavior doesn't change if I modify the number of nodes.
    - I recompiled Hadoop for 64-bit according to https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/NativeLibraries.html
    - I use GPFS as the backend with the IBM gpfs-connector.

Thanks in advance,
Jürgen
