hadoop-common-issues mailing list archives

From "Andy Isaacson (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-7154) Should set MALLOC_ARENA_MAX in hadoop-config.sh
Date Tue, 24 Jul 2012 18:49:37 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-7154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13421636#comment-13421636 ]

Andy Isaacson commented on HADOOP-7154:

I was very confused by this discussion and dug into it a bit more; here's what I learned.
 The takeaway is that {{MALLOC_ARENA_MAX=4}} is a win for Java apps.
# Java doesn't use {{malloc()}} for object allocations; instead it uses its own directly {{mmap()}}ed memory.
# However, a few things such as direct {{ByteBuffer}}s do end up calling {{malloc()}} on arbitrary
threads.  There's not much thread locality in the use of such buffers.

As a result, the glibc arena allocator is using a lot of VSS to optimize a codepath that's
not very hot.  So decreasing the number of arenas is a win, overall, even though it will increase
contention (the malloc arena locks are pretty cold so this doesn't matter much) and potentially
increase cache churn.  But fewer arenas should decrease total cache footprint by increasing

> Should set MALLOC_ARENA_MAX in hadoop-config.sh
> -----------------------------------------------
>                 Key: HADOOP-7154
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7154
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: scripts
>    Affects Versions: 0.22.0
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>            Priority: Minor
>             Fix For: 1.1.0, 0.22.0
>         Attachments: hadoop-7154.txt
> New versions of glibc present in RHEL6 include a new arena allocator design. In several
clusters we've seen this new allocator cause huge amounts of virtual memory to be used, since
when multiple threads perform allocations, they each get their own memory arena. On a 64-bit
system, these arenas are 64M mappings, and the maximum number of arenas is 8 times the number
of cores. We've observed a DN process using 14GB of vmem for only 300M of resident set. This
causes all kinds of nasty issues for obvious reasons.
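The arithmetic behind those figures can be sketched as follows, assuming the numbers quoted above (64M per arena, at most 8 arenas per core). The function name {{arena_vss_bound_mb}} and the 16-core node are illustrative only.

```shell
#!/bin/sh
# Hypothetical helper: upper bound, in MB, on virtual address space reserved
# by glibc malloc arenas. Assumes 64 MB per arena; the arena limit defaults
# to 8 * cores unless an explicit cap (e.g. MALLOC_ARENA_MAX) is given.
arena_vss_bound_mb() {
  cores="$1"
  max_arenas="${2:-$(( $1 * 8 ))}"
  echo $(( max_arenas * 64 ))
}

arena_vss_bound_mb 16      # default limit on a 16-core node: prints 8192
arena_vss_bound_mb 16 4    # capped at 4 arenas: prints 256
```

This is only a bound on arena reservations; the virtual size actually observed depends on how many threads end up allocating.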
> Setting MALLOC_ARENA_MAX to a low number will restrict the number of memory arenas and
bound the virtual memory, with no noticeable downside in performance - we've been recommending
MALLOC_ARENA_MAX=4. We should set this in hadoop-env.sh to avoid this issue as RHEL6 becomes
more and more common.
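A minimal sketch of what such a setting could look like in an environment script. This is an assumption about the form of the change, not a copy of the attached patch; it keeps any value the operator has already exported.

```shell
#!/bin/sh
# Sketch for hadoop-env.sh or hadoop-config.sh (placement is an assumption):
# cap the number of glibc malloc arenas at 4 unless MALLOC_ARENA_MAX is
# already set in the environment.
export MALLOC_ARENA_MAX="${MALLOC_ARENA_MAX:-4}"
```

Because the variable is exported, it is inherited by the JVM processes launched from the script, which is where glibc reads it.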
