hadoop-mapreduce-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Sandy Ryza (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-5785) Derive task attempt JVM max heap size and io.sort.mb automatically from mapreduce.*.memory.mb
Date Fri, 21 Nov 2014 08:02:34 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-5785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14220641#comment-14220641
] 

Sandy Ryza commented on MAPREDUCE-5785:
---------------------------------------

Hi Karthik.  Took a look at the patch.  Had some comments - mostly stylistic.

{code}
-    return adminClasspath + " " + userClasspath;
+      return jobConf.getTaskJavaOpts(isMapTask ? TaskType.MAP : TaskType.REDUCE);
{code}
Wrong indentation?

{code}
+  public String getTaskJavaOpts(TaskType taskType) {
+
+    String javaOpts = getConfiguredTaskJavaOpts(taskType);
{code}
Unnecessary blank line.

{code}
+      LOG.info("Task java.opts does not specify max heap size, setting using"
+          + " mapreduce.*.memory.mb * "
+          + MRJobConfig.HEAP_MEMORY_MB_RATIO);
{code}
Can we condense this and the log further down into a single message?

{code}
+        if (LOG.isWarnEnabled()) {
{code}
Why use isWarnEnabled when we don't use isInfoEnabled?

{code}
+      final int taskContainerMb = getMemoryRequired(taskType);
{code}
Any reason this should be final? Convention is usually not to declare local variables final
unless they need to be (like referenced by an anonymous class).

{code}
+      int taskHeapSize =(int)Math.ceil(taskContainerMb * heapRatio);
{code}
Should have a space after the "=".

{code}
+  public static int parseMaximumHeapSizeMB(String javaOpts) {
{code}
Can this (and others) be marked Private?

{code}
+    int memory = 1024;
{code}
It looks like this value will be overwritten in all cases.

{code}
+          (heapSize = parseMaximumHeapSizeMB(
+              getConfiguredTaskJavaOpts(taskType))) > 0) {
{code}
This is a little weird.  Can we assign heapSize outside of the condition?

{code}
+        memory =
+            getInt(MRJobConfig.REDUCE_MEMORY_MB,
+                MRJobConfig.DEFAULT_REDUCE_MEMORY_MB);
{code}
This can be on 2 lines.

{code}
+  If -Xmx is not set, it is inferred from mapreduc.{map|reduce}.memory.mb and
{code}
Missing an "e" at the end of mapreduc.

{code}
+  <description>The ratio container between heap-size and container-size
+    If no -Xmx specified, it's calculated from the container memory
+    requirement: mapreduce.*.memory.mb * mapreduce.heap.memory-mb.ratio.
+    If -Xmx is specified but not mapreduce.*.memory.mb, it's calculated as
+    heapSize / mapreduce.heap.memory-mb.ratio.
{code}
Need a period after container size.  "*" meaning both multiplication and either map or reduce
is a little confusing here. It might be better to spell out {map|reduce} inside the config
properties, which would also be consistent with how they're references above.  Also, other
descriptions tend to use "it is" instead of "it's".

{code}
   <description>The amount of memory to request from the scheduler for each
-  reduce task.
+    reduce task. If this is not specified, it is inferred from
{code}
Indentation here is inconsistent with other places.

Any reason to have getTaskJavaOpts in JobConf instead of MapReduceChildJVM?


> Derive task attempt JVM max heap size and io.sort.mb automatically from mapreduce.*.memory.mb
> ---------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-5785
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5785
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>          Components: mr-am, task
>            Reporter: Gera Shegalov
>            Assignee: Gera Shegalov
>         Attachments: MAPREDUCE-5785.v01.patch, MAPREDUCE-5785.v02.patch, MAPREDUCE-5785.v03.patch,
mr-5785-4.patch
>
>
> Currently users have to set 2 memory-related configs per Job / per task type.  One first
chooses some container size map reduce.\*.memory.mb and then a corresponding maximum Java
heap size Xmx < map reduce.\*.memory.mb. This makes sure that the JVM's C-heap (native
memory + Java heap) does not exceed this mapreduce.*.memory.mb. If one forgets to tune Xmx,
MR-AM might be 
> - allocating big containers whereas the JVM will only use the default -Xmx200m.
> - allocating small containers that will OOM because Xmx is too high.
> With this JIRA, we propose to set Xmx automatically based on an empirical ratio that
can be adjusted. Xmx is not changed automatically if provided by the user.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message