hadoop-mapreduce-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Sandy Ryza (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-5785) Derive task attempt JVM max heap size and io.sort.mb automatically from mapreduce.*.memory.mb
Date Fri, 21 Nov 2014 08:02:34 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-5785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14220641#comment-14220641

Sandy Ryza commented on MAPREDUCE-5785:

Hi Karthik.  Took a look at the patch.  Had some comments - mostly stylistic.

-    return adminClasspath + " " + userClasspath;
+      return jobConf.getTaskJavaOpts(isMapTask ? TaskType.MAP : TaskType.REDUCE);
Wrong indentation?

+  public String getTaskJavaOpts(TaskType taskType) {
+    String javaOpts = getConfiguredTaskJavaOpts(taskType);
Unnecessary blank line.

+      LOG.info("Task java.opts does not specify max heap size, setting using"
+          + " mapreduce.*.memory.mb * "
+          + MRJobConfig.HEAP_MEMORY_MB_RATIO);
Can we condense this and the log further down into a single message?

+        if (LOG.isWarnEnabled()) {
Why use isWarnEnabled when we don't use isInfoEnabled?

+      final int taskContainerMb = getMemoryRequired(taskType);
Any reason this should be final? Convention is usually not to declare local variables final
unless they need to be (like referenced by an anonymous class).

+      int taskHeapSize =(int)Math.ceil(taskContainerMb * heapRatio);
Should have a space after the "=".

+  public static int parseMaximumHeapSizeMB(String javaOpts) {
Can this (and others) be marked Private?

+    int memory = 1024;
It looks like this value will be overwritten in all cases.

+          (heapSize = parseMaximumHeapSizeMB(
+              getConfiguredTaskJavaOpts(taskType))) > 0) {
This is a little weird.  Can we assign heapSize outside of the condition?

+        memory =
+            getInt(MRJobConfig.REDUCE_MEMORY_MB,
+                MRJobConfig.DEFAULT_REDUCE_MEMORY_MB);
This can be on 2 lines.

+  If -Xmx is not set, it is inferred from mapreduc.{map|reduce}.memory.mb and
Missing an "e" at the end of mapreduc.

+  <description>The ratio container between heap-size and container-size
+    If no -Xmx specified, it's calculated from the container memory
+    requirement: mapreduce.*.memory.mb * mapreduce.heap.memory-mb.ratio.
+    If -Xmx is specified but not mapreduce.*.memory.mb, it's calculated as
+    heapSize / mapreduce.heap.memory-mb.ratio.
Need a period after container size.  "*" meaning both multiplication and either map or reduce
is a little confusing here. It might be better to spell out {map|reduce} inside the config
properties, which would also be consistent with how they're references above.  Also, other
descriptions tend to use "it is" instead of "it's".

   <description>The amount of memory to request from the scheduler for each
-  reduce task.
+    reduce task. If this is not specified, it is inferred from
Indentation here is inconsistent with other places.

Any reason to have getTaskJavaOpts in JobConf instead of MapReduceChildJVM?

> Derive task attempt JVM max heap size and io.sort.mb automatically from mapreduce.*.memory.mb
> ---------------------------------------------------------------------------------------------
>                 Key: MAPREDUCE-5785
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5785
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>          Components: mr-am, task
>            Reporter: Gera Shegalov
>            Assignee: Gera Shegalov
>         Attachments: MAPREDUCE-5785.v01.patch, MAPREDUCE-5785.v02.patch, MAPREDUCE-5785.v03.patch,
> Currently users have to set 2 memory-related configs per Job / per task type.  One first
chooses some container size map reduce.\*.memory.mb and then a corresponding maximum Java
heap size Xmx < map reduce.\*.memory.mb. This makes sure that the JVM's C-heap (native
memory + Java heap) does not exceed this mapreduce.*.memory.mb. If one forgets to tune Xmx,
MR-AM might be 
> - allocating big containers whereas the JVM will only use the default -Xmx200m.
> - allocating small containers that will OOM because Xmx is too high.
> With this JIRA, we propose to set Xmx automatically based on an empirical ratio that
can be adjusted. Xmx is not changed automatically if provided by the user.

This message was sent by Atlassian JIRA

View raw message