hadoop-mapreduce-dev mailing list archives

From Luke Lu <...@apache.org>
Subject Re: About reducer's Shuffle JVM Heap Size
Date Fri, 02 Nov 2012 22:45:32 GMT
The task code was optimized for the 32-bit JVM (yes, people use a 64-bit
JVM for servers and a 32-bit JVM for tasks in production), because it's
more memory efficient at the same Xmx, which by default is smaller
than 2GiB. Feel free to open a jira for the improvement. I'd preserve the
optimization for the 32-bit JVM by checking the os.arch property.
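A minimal sketch of the kind of os.arch check described above (the class, method, and the arch heuristic are my assumptions for illustration, not code from the Hadoop source):

```java
public class ShuffleLimitSketch {
    // Hypothetical helper: keep the Integer.MAX_VALUE cap only on 32-bit
    // JVMs, where a buffer above 2 GiB cannot be addressed anyway; on a
    // 64-bit JVM the full heap is usable.
    static long maxShuffleBytes(String osArch, long jvmMaxMemory) {
        // Crude heuristic: "i386"/"x86" are 32-bit, "amd64"/"x86_64" are not.
        boolean is32Bit = osArch.contains("86") && !osArch.contains("64");
        long cap = is32Bit ? Integer.MAX_VALUE : Long.MAX_VALUE;
        return Math.min(jvmMaxMemory, cap);
    }

    public static void main(String[] args) {
        long heap = 4000L * 1024 * 1024;  // a 4000 MiB -Xmx
        System.out.println(maxShuffleBytes("i386", heap));   // 2147483647
        System.out.println(maxShuffleBytes("amd64", heap));  // 4194304000
    }
}
```

In a real patch the value would come from System.getProperty("os.arch"); it is passed in here so both branches are easy to exercise.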

On Sat, Oct 27, 2012 at 7:35 AM, Lijie Xu <csxulijie@gmail.com> wrote:
> Hi, all.
> I'm debugging Hadoop's source code and found an incomprehensible setting.
> In hadoop-0.20.2 and hadoop-1.0.3, the reducer's shuffle buffer size cannot
> exceed 2048MB (i.e., Integer.MAX_VALUE bytes). So although the
> reducer's JVM heap can be set above 2048MB (e.g.,
> mapred.child.java.opts=-Xmx4000m), the heap available for the shuffle buffer
> is at most "2048MB * maxInMemCopyUse (default 0.7)", not "4000MB *
> maxInMemCopyUse".
> I think it's not a reasonable setting for large memory machines.
>
> The following code taken from "org.apache.hadoop.mapred.ReduceTask" shows
> the concrete algorithm.
> I'm wondering why maxSize is declared as "long" but assigned with an "(int)"
> cast. I've marked the corresponding code with "-->".
>
> Thanks for any suggestion.
> ---------------------------------------------------------------------------------------------------------------------------
>       private final long maxSize;
>       private final long maxSingleShuffleLimit;
>
>       private long size = 0;
>
>       private Object dataAvailable = new Object();
>       private long fullSize = 0;
>       private int numPendingRequests = 0;
>       private int numRequiredMapOutputs = 0;
>       private int numClosed = 0;
>       private boolean closed = false;
>
>       public ShuffleRamManager(Configuration conf) throws IOException {
>         final float maxInMemCopyUse =
> conf.getFloat("mapred.job.shuffle.input.buffer.percent", 0.70f);
>         if (maxInMemCopyUse > 1.0 || maxInMemCopyUse < 0.0) {
>           throw new IOException("mapred.job.shuffle.input.buffer.percent" +
>                                 maxInMemCopyUse);
>         }
>         // Allow unit tests to fix Runtime memory
> -->     maxSize = (int)(conf.getInt("mapred.job.reduce.total.mem.bytes",
> -->         (int)Math.min(Runtime.getRuntime().maxMemory(), Integer.MAX_VALUE))
> -->         * maxInMemCopyUse);
>         maxSingleShuffleLimit = (long)(maxSize *
> MAX_SINGLE_SHUFFLE_SEGMENT_FRACTION);
>         LOG.info("ShuffleRamManager: MemoryLimit=" + maxSize +
>                  ", MaxSingleShuffleLimit=" + maxSingleShuffleLimit);
>       }
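The effect of the flagged lines can be reproduced in isolation. In this sketch (class and method names are mine, not Hadoop's), the first method mirrors the cast chain marked with "-->"; the second is a hypothetical long-based variant that would not hit the 2 GiB ceiling:

```java
public class MaxSizeDemo {
    // Mirrors the flagged line: maxMemory is clamped to Integer.MAX_VALUE,
    // multiplied by the float fraction, and the result narrowed back to int,
    // so the limit can never exceed ~0.7 * 2 GiB regardless of -Xmx.
    static long clampedMaxSize(long jvmMaxMemory, float maxInMemCopyUse) {
        return (int) ((int) Math.min(jvmMaxMemory, Integer.MAX_VALUE)
                      * maxInMemCopyUse);
    }

    // Hypothetical long-based variant that scales with the actual heap.
    static long longMaxSize(long jvmMaxMemory, float maxInMemCopyUse) {
        return (long) (jvmMaxMemory * (double) maxInMemCopyUse);
    }

    public static void main(String[] args) {
        long heap = 4000L * 1024 * 1024;  // -Xmx4000m
        System.out.println(clampedMaxSize(heap, 0.70f));  // ~0.7 * 2048MB
        System.out.println(longMaxSize(heap, 0.70f));     // ~0.7 * 4000MB
    }
}
```

Growing the heap past 2 GiB leaves clampedMaxSize unchanged, which is exactly the behavior described in the question.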
