hadoop-mapreduce-issues mailing list archives

From "Vinod K V (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-1018) Document changes to the memory management and scheduling model
Date Mon, 16 Nov 2009 07:00:40 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-1018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12778231#action_12778231 ]

Vinod K V commented on MAPREDUCE-1018:

Many other MapReduce issues have already modified only cluster_setup.xml, the copy in the mapreduce
project. Rahul mentioned offline that Forrest documentation is not being generated in the mapreduce
sub-project. Assuming we'll address that in a separate issue, I propose we have only one patch:
the mapred one.

 - The mapred patch has git prefixes that need to be removed.

 - Monitoring and scheduling based on RAM have been completely removed, so remove the references to
them too. Just add a note saying (quoting from HADOOP-5881) that there isn't any need to distinguish
vmem from physical memory with respect to configuration. Depending on a site's requirements, the
configuration items can reflect whether or not tasks are allowed to go beyond physical memory.

 - All config names should be changed to the new names. Of course, this means a slightly different
patch for 0.20, which we will come to after the patch for trunk is done.
 - mapred.{map|reduce}.child.ulimit also need to be renamed.
 - What happens when monitoring is enabled but a job's memory setting is -1?
 - Memory monitoring is no longer defined in terms of a per-task limit and a per-node limit. It
is now driven by the per-slot memory size and the number of slots. We should use these new terms throughout.
 - "Before getting into details, consider the following additional memory-related parameters
than can be configured to enable better scheduling:"
 This line is no longer needed.
 - The feature for monitoring RAM no longer exists. Remove all references to it.
 - How scheduling works:
   -- Point 1: There are 4 parameters, not three; they are described in cluster_setup. vmem.reserved
is no longer used.
   -- Point 2: This has changed completely; there are no more offsets.
   Total = numSlots * PerSlotMemSize
   Used = Sum over running tasks of (numSlotsPerTask * PerSlotMemSize)
   -- Point 3: The JT now rejects the jobs, not the scheduler.
 - "See the MapReduce Tutorial for details on how the TT monitors memory usage."
 Should this say "See cluster_setup" instead?

 - mapred_tutorial.html's memory management section needs to be updated. It also needs to be
referenced from both cluster_setup.html and capacity_scheduler.html.

 - Another point I've already mentioned on the JIRA:
 "Along with everything else, we should document that job setup and job cleanup tasks of all
jobs, whether or not those jobs require high memory for their maps and reduces, still run
on a single slot."
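On the git-prefix point: assuming the patch is meant to apply from the project root with `patch -p0`, the `a/` and `b/` prefixes on the `---`/`+++` file headers have to go. `git diff --no-prefix` avoids them when generating the patch; for an existing patch, a minimal sketch follows (the function name is hypothetical).

```python
import re

# Matches a unified-diff file header that carries a git "a/" or "b/" prefix.
# Assumption: no hunk body line itself begins with "--- a/" or "+++ b/".
_GIT_HEADER = re.compile(r"^(---|\+\+\+) (a|b)/")


def strip_git_prefixes(patch_text: str) -> str:
    """Drop the a/ and b/ path prefixes from ---/+++ header lines only."""
    out = []
    for line in patch_text.splitlines(keepends=True):
        # Keep the "---" or "+++" marker and the space, drop the prefix.
        out.append(_GIT_HEADER.sub(r"\1 ", line, count=1))
    return "".join(out)
```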
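On the HADOOP-5881 note: since configuration no longer distinguishes vmem from physical memory, whether tasks may go beyond physical memory comes down to how a site sizes its slots. A minimal sketch of that sizing check, assuming separate map and reduce slot sizes and ignoring memory used by the OS and the Hadoop daemons; the function name is hypothetical.

```python
def may_exceed_physical_memory(map_slots: int, map_slot_mb: int,
                               reduce_slots: int, reduce_slot_mb: int,
                               physical_ram_mb: int) -> bool:
    """True if fully occupied slots could ask for more memory than the node's RAM."""
    # Memory the node could hand out if every map and reduce slot were in use.
    configured_mb = map_slots * map_slot_mb + reduce_slots * reduce_slot_mb
    return configured_mb > physical_ram_mb
```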
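The slot-based accounting from Point 2, together with the single-slot rule for job setup and cleanup tasks, can be sketched as below. Assumptions: a task occupies its memory requirement rounded up to whole slots, and all helper names are hypothetical.

```python
from typing import Sequence


def slots_for_task(task_kind: str, task_memory_mb: int, slot_size_mb: int) -> int:
    """Number of slots a task occupies (hypothetical helper)."""
    if slot_size_mb <= 0:
        raise ValueError("slot size must be positive")
    # Job setup and cleanup tasks always run on a single slot,
    # even for jobs whose maps or reduces need high memory.
    if task_kind in ("setup", "cleanup"):
        return 1
    if task_memory_mb <= 0:
        raise ValueError("task memory must be positive")
    # Assumption: round the task's requirement up to whole slots.
    return (task_memory_mb + slot_size_mb - 1) // slot_size_mb


def total_memory_mb(num_slots: int, slot_size_mb: int) -> int:
    # Total = numSlots * PerSlotMemSize
    return num_slots * slot_size_mb


def used_memory_mb(slots_per_running_task: Sequence[int], slot_size_mb: int) -> int:
    # Used = sum over running tasks of (numSlotsPerTask * PerSlotMemSize)
    return sum(n * slot_size_mb for n in slots_per_running_task)


def free_memory_mb(num_slots: int, slots_per_running_task: Sequence[int],
                   slot_size_mb: int) -> int:
    return total_memory_mb(num_slots, slot_size_mb) - used_memory_mb(
        slots_per_running_task, slot_size_mb)


if __name__ == "__main__":
    slot_mb = 1024
    running = [slots_for_task("map", 1024, slot_mb),  # 1 slot
               slots_for_task("map", 2560, slot_mb)]  # 3 slots
    print(total_memory_mb(4, slot_mb), used_memory_mb(running, slot_mb),
          free_memory_mb(4, running, slot_mb))  # prints: 4096 4096 0
```

With 4 slots of 1024 MB, a 1024 MB map and a 2560 MB map together fill the node, so no further task fits.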
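For Point 3, the JobTracker-side rejection might look roughly like this, assuming the cluster exposes a maximum per-task memory for maps and for reduces (the parameter and function names here are hypothetical). Since the review leaves open what a job setting of -1 should mean when monitoring is enabled, the sketch refuses to guess.

```python
from typing import List


def rejection_reasons(map_memory_mb: int, reduce_memory_mb: int,
                      max_map_memory_mb: int, max_reduce_memory_mb: int) -> List[str]:
    """Hypothetical JobTracker admission check; an empty list means accept."""
    # Behaviour for -1 is an open question in the review, so don't model it.
    if map_memory_mb < 0 or reduce_memory_mb < 0:
        raise ValueError("negative job memory settings (e.g. -1) are not modeled here")
    reasons = []
    if map_memory_mb > max_map_memory_mb:
        reasons.append(f"map task requests {map_memory_mb} MB, "
                       f"cluster maximum is {max_map_memory_mb} MB")
    if reduce_memory_mb > max_reduce_memory_mb:
        reasons.append(f"reduce task requests {reduce_memory_mb} MB, "
                       f"cluster maximum is {max_reduce_memory_mb} MB")
    return reasons
```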

> Document changes to the memory management and scheduling model
> --------------------------------------------------------------
>                 Key: MAPREDUCE-1018
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1018
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: documentation
>    Affects Versions: 0.21.0
>            Reporter: Hemanth Yamijala
>            Priority: Blocker
>             Fix For: 0.21.0
>         Attachments: MAPRED-1018-1.patch, MAPRED-1018-2.patch, MAPRED-1018-commons.patch
> There were changes made to the configuration, monitoring and scheduling of high-RAM
> jobs. These must be documented in mapred-default.xml and also in the Forrest documentation.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
