hadoop-mapreduce-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Robert Joseph Evans (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-3921) MR AM should act on the nodes liveliness information when nodes go up/down/unhealthy
Date Fri, 06 Apr 2012 15:13:24 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-3921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13248411#comment-13248411

Robert Joseph Evans commented on MAPREDUCE-3921:

A few minor comments about the patch, and some questions on the manual testing that was done
on it.  Overall the patch looks very good and once the javac warnings are addressed and I
know manual testing was performed I am a +1 on it

  # Have you tested this with AM Recovery?  Specifically I would like to see the AM recover
when a map task finished successfully and then was killed because the node went bad.
  # Have you tested this with reduces?  The code will reschedule the map task, but I don't
really see where/if it informs the reducer that it is rescheduling the map task until that
new task finishes successfully.  I believe that the reducer would just ignore an update for
a task it has already fetched successfully, but I just want to be sure it was tested.
  # NodeState.isUnhealthy()  (Very minor) I think it would be cleaner, to have it be {code}
return this == UNHEALTHY ||
       this == DECOMMISSIONED ||
       this == LOST;
  # KilledAfterSuccessTransition.transition() There is some commented out code {code}
// why set a wrong finish time ???
//set the finish time
{code} Is this needed? If not please remove it.
  # KilledAfterSuccessTransition.transition() I am a bit confused by the log statement {code}
if (taskAttempt.getLaunchTime() != 0) {
}else {
  LOG.debug("Not generating HistoryFinish event since start event not generated for taskAttempt:
      + taskAttempt.getID());
{code} Is this really needed (looks like it was a copy and paste from the KilledTransition)?
 When would we even get a successful job that did not have a launch time?  I would rather
have it be an ERROR or WARN rather then a debug if we did see this in this transition.
  # TaskAttemptCompletedEventTransition.transition() {code}
// TODO assert nodeId is not null
{code} please either add in the assert or remove the TODO.

> MR AM should act on the nodes liveliness information when nodes go up/down/unhealthy
> ------------------------------------------------------------------------------------
>                 Key: MAPREDUCE-3921
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3921
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mr-am, mrv2
>    Affects Versions: 0.23.0
>            Reporter: Vinod Kumar Vavilapalli
>            Assignee: Bikas Saha
>             Fix For: 0.23.2
>         Attachments: MAPREDUCE-3921-branch-0.23.patch, MAPREDUCE-3921-branch-0.23.patch,
MAPREDUCE-3921-branch-0.23.patch, MAPREDUCE-3921.patch

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira


View raw message