hadoop-mapreduce-issues mailing list archives

From "Vinod K V (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-1018) Document changes to the memory management and scheduling model
Date Thu, 10 Dec 2009 05:07:18 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-1018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12788508#action_12788508 ]

Vinod K V commented on MAPREDUCE-1018:
--------------------------------------

I looked at the latest patch and have some more comments. Some of the following issues may not have been
introduced by your patch, though.
h4. cluster_setup.html:
 - Statement I: _Users can, optionally, specify the MEM task-limit per job. If no such limit
is provided, a default limit is used. A node-limit can be set per node._
We don't have default limits anymore. So the statement _"If no such limit is provided, a default
limit is used."_ can be removed.
 - Node-limit cannot be set directly anymore. So we should define the node-limit here by
saying _"The node-limit of total memory usage for tasks is given by Node-limit = mapreduce.tasktracker.map.tasks.maximum
* mapreduce.cluster.mapmemory.mb + mapreduce.tasktracker.reduce.tasks.maximum * mapreduce.cluster.reducememory.mb"_.
 - _To enable monitoring for a TT, the following parameters all need to be set:_
This is not true for the job configuration parameters, so this table should contain only the TT configuration,
and the job configuration should move to a separate table. The statement above should also move further down,
so that the whole section would look like:
{quote}
To enable monitoring for a TT, the following parameters all need to be set:
TABLE:I with TT parameters
Node-limit of total memory usage for tasks is given by Node-limit = mapreduce.tasktracker.map.tasks.maximum
* mapreduce.cluster.mapmemory.mb + mapreduce.tasktracker.reduce.tasks.maximum * mapreduce.cluster.reducememory.mb
Users can, optionally, specify the MEM task-limit per job.
TABLE II with Job parameters
{quote}
 - _2. Periodically, the TT checks the following:_
Should be "If memory monitoring is enabled, the TT does the following periodically:"
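To make the corrected node-limit formula concrete, here is a minimal plain-Java sketch. It is not the TaskTracker's code. The helper `nodeLimitMb` is hypothetical: its four arguments stand for mapreduce.tasktracker.map.tasks.maximum, mapreduce.cluster.mapmemory.mb, mapreduce.tasktracker.reduce.tasks.maximum and mapreduce.cluster.reducememory.mb. Treating a slot size of -1 as "monitoring disabled" is an assumption based on the capacity-scheduler comments in this review.

```java
public class NodeLimit {
    /**
     * Hypothetical helper:
     * Node-limit = mapSlots * mapSlotMb + reduceSlots * reduceSlotMb.
     * Returns -1 if either slot size is -1 (assumed here to mean that
     * memory monitoring is disabled for the TT).
     */
    static long nodeLimitMb(long mapSlots, long mapSlotMb,
                            long reduceSlots, long reduceSlotMb) {
        if (mapSlotMb == -1 || reduceSlotMb == -1) {
            return -1;
        }
        return mapSlots * mapSlotMb + reduceSlots * reduceSlotMb;
    }

    public static void main(String[] args) {
        // Example values only: 4 map slots of 1024 MB and 2 reduce slots of 2048 MB.
        System.out.println(nodeLimitMb(4, 1024, 2, 2048)); // prints 8192
    }
}
```

The reduce term uses the reduce slot size. Using the map slot size for both terms, as the earlier wording did, would under- or over-state the limit whenever the two slot sizes differ.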

h4. mapred_tutorial.html:
 - The memory management section defines the job parameters again here, but "mapreduce.map.memory.mb"
is listed twice. One of them should be "mapreduce.reduce.memory.mb".
 - _Users can choose to override default limits of memory enforced by the task tracker, if
memory management is enabled. Users can set the following parameter per job:_
Please modify "memory management" to be "memory monitoring" and link it to the monitoring
section in cluster_setup.html

h4. capacity_scheduler.html:
 - Please rename the section name to "Memory-based task-scheduling"
 - _The Capacity Scheduler supports scheduling of tasks on a TaskTracker(TT) based on a job's
memory requirements and the availability of RAM and Virtual Memory (VMEM) on the TT node._
As my previous review comments mentioned, support for RAM availability no longer exists.
So it should read _"....and the availability of Virtual Memory (VMEM) on the TT node. There
is no need to distinguish VMEM from physical memory with respect to tasks. A site can set the
configuration according to its requirements, depending on whether tasks should be allowed to go
beyond physical memory or not."_
 - _"See the MapReduce Tutorial for details on how the TT monitors memory usage."_
This should actually point to cluster_setup.html; my previous review comment missed this.
 - _"Currently the memory based scheduling is only supported in Linux platform."_
   This isn't quite right. It should be _"Memory-based scheduling primarily exists to avoid
memory pressure from tasks on a TT, and thus depends on TT memory monitoring, which currently
is only supported on the Linux platform."_
 - _"1. The absence of any one or more of four config parameters or -1 being set as value
of any of the parameters, mapreduce.cluster.mapmemory.mb, mapreduce.cluster.reducememory.mb,
or mapreduce.jobtracker.maxmapmemory.mb, mapreduce.jobtracker.maxreducememory.mb  disables
memory-based scheduling, just as it disables memory monitoring for a TT. These config parameters
are described in the MapReduce Tutorial. "_
    This can be greatly simplified to _"The configuration properties mapreduce.cluster.mapmemory.mb,
mapreduce.cluster.reducememory.mb, mapreduce.jobtracker.maxmapmemory.mb and mapreduce.jobtracker.maxreducememory.mb
are used to enable/disable memory-based scheduling. If any one of these properties is absent or set to -1,
memory-based scheduling is disabled, just as monitoring is disabled for a TT. These parameters are described
in the Cluster-setup <a href="cluster_setup.html#memory_monitoring">memory-monitoring
section</a>."_
 - The second statement, which describes scheduling, can be greatly simplified by writing it
as a list of points, as my previous review described. Also, we haven't introduced reservations
anywhere else, so that part needs to be explained clearly as well. Roughly:
{quote}
    * Point 2 in Working of scheduling
      Total = numSlots * PerSlotMemSize.
      Used = Sigma(numSlotsPerTask * PerSlotMemSize)
      if (can fit), schedule. Otherwise reserve.  Reserve why? what? How many?
{quote}
That was just a rough cut, but it should give you an idea.
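The rough cut above, together with the enable/disable rule proposed for capacity_scheduler.html, can be sketched in plain Java as follows. This is an illustration only, not the Capacity Scheduler's implementation. The class, method and enum names are hypothetical. Computing a task's slot count as ceil(taskMemMb / perSlotMemMb) is an assumption. "Reserve" is only a decision label here; what a reservation holds and for how long is exactly what the documentation still needs to explain.

```java
import java.util.List;
import java.util.Map;

public class MemoryScheduling {
    // The four properties named in the proposed text: if any is absent
    // or set to -1, memory-based scheduling is disabled.
    static final String[] REQUIRED = {
        "mapreduce.cluster.mapmemory.mb",
        "mapreduce.cluster.reducememory.mb",
        "mapreduce.jobtracker.maxmapmemory.mb",
        "mapreduce.jobtracker.maxreducememory.mb"
    };

    static boolean memorySchedulingEnabled(Map<String, Long> conf) {
        for (String key : REQUIRED) {
            Long value = conf.get(key);
            if (value == null || value == -1L) {
                return false;
            }
        }
        return true;
    }

    // Assumption: a task occupies ceil(taskMemMb / perSlotMemMb) slots.
    static int slotsPerTask(long taskMemMb, long perSlotMemMb) {
        return (int) ((taskMemMb + perSlotMemMb - 1) / perSlotMemMb);
    }

    enum Decision { SCHEDULE, RESERVE }

    /**
     * Total = numSlots * perSlotMemMb
     * Used  = sum over running tasks of slotsPerTask * perSlotMemMb
     * Schedule if the new task fits into Total - Used, otherwise reserve.
     */
    static Decision decide(int numSlots, long perSlotMemMb,
                           List<Long> runningTaskMemMb, long newTaskMemMb) {
        long total = numSlots * perSlotMemMb;
        long used = 0;
        for (long mem : runningTaskMemMb) {
            used += slotsPerTask(mem, perSlotMemMb) * perSlotMemMb;
        }
        long needed = slotsPerTask(newTaskMemMb, perSlotMemMb) * perSlotMemMb;
        return (total - used >= needed) ? Decision.SCHEDULE : Decision.RESERVE;
    }

    public static void main(String[] args) {
        // Hypothetical TT: 4 map slots of 1024 MB each, so Total = 4096 MB.
        // Running: a 1024 MB task (1 slot) and a 1500 MB task (2 slots), so Used = 3072 MB.
        List<Long> running = List.of(1024L, 1500L);
        System.out.println(decide(4, 1024, running, 1000)); // SCHEDULE (needs 1 slot)
        System.out.println(decide(4, 1024, running, 2048)); // RESERVE (needs 2 slots)
    }
}
```

Writing the rule as "Total minus Used versus slots needed" keeps the explanation in terms of slots, which is what cluster administrators configure.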

> Document changes to the memory management and scheduling model
> --------------------------------------------------------------
>
>                 Key: MAPREDUCE-1018
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1018
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: documentation
>    Affects Versions: 0.21.0
>            Reporter: Hemanth Yamijala
>            Assignee: rahul k singh
>            Priority: Blocker
>             Fix For: 0.21.0
>
>         Attachments: MAPRED-1018-1.patch, MAPRED-1018-2.patch, MAPRED-1018-3.patch, MAPRED-1018-commons.patch
>
>
> There were changes done to the configuration, monitoring and scheduling of high-RAM
> jobs. These must be documented in mapred-default.xml and also in the Forrest documentation.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

