hadoop-mapreduce-issues mailing list archives

From "Xuan Gong (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-4835) AM job metrics can double-count a job if it errors after entering a completion state
Date Tue, 04 Dec 2012 00:47:58 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-4835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13509385#comment-13509385 ]

Xuan Gong commented on MAPREDUCE-4835:
--------------------------------------

The proposed approach, "Have JobImpl.finished ignore incrementing any metrics if the job is
already in a terminal state (SUCCEEDED/FAILED/KILLED) to avoid double-counting a job," may not
work. By the time finished() is called, the current state has already changed, so it is very
difficult to check whether the previous state was a terminal state.
For example, suppose InternalErrorTransition runs and changes the state from SUCCEEDED to
ERROR. The code in InternalErrorTransition is:
    public void transition(JobImpl job, JobEvent event) {
      //TODO Is this JH event required.
      job.setFinishTime();
      JobUnsuccessfulCompletionEvent failedEvent =
          new JobUnsuccessfulCompletionEvent(job.oldJobId,
              job.finishTime, 0, 0,
              JobStateInternal.ERROR.toString());
      // <-- the next line is what actually changes the state
      job.eventHandler.handle(new JobHistoryEvent(job.jobId, failedEvent));
      // <-- the next line increments the failure count, which is the duplicate
      job.finished(JobStateInternal.ERROR);
    }
So what we can do is add JobStateInternal previousState = job.getInternalState() before
job.eventHandler.handle(new JobHistoryEvent(job.jobId, failedEvent)), and then check
previousState to decide whether we need to increment the count.
For example, if we do not want to increment the count when a terminal state changes to the
ERROR state, we can do the following in InternalErrorTransition:
    public void transition(JobImpl job, JobEvent event) {
      //TODO Is this JH event required.
      job.setFinishTime();
      JobUnsuccessfulCompletionEvent failedEvent =
          new JobUnsuccessfulCompletionEvent(job.oldJobId,
              job.finishTime, 0, 0,
              JobStateInternal.ERROR.toString());
      JobStateInternal previousState = job.getInternalState();
      job.eventHandler.handle(new JobHistoryEvent(job.jobId, failedEvent));
      // Only count the job if the previous state was neither a terminal state nor ERROR.
      // If it was, the count has already been incremented and we do not want to do it again.
      if (previousState != JobStateInternal.SUCCEEDED && previousState != JobStateInternal.KILLED
          && previousState != JobStateInternal.FAILED && previousState != JobStateInternal.ERROR)
      {
        job.finished(JobStateInternal.ERROR);
      }
    }
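
The previous-state check can also be written as a set-membership test, which may be easier to read than a chain of comparisons. The following is only a minimal, standalone sketch: TerminalStateGuard, State, ALREADY_COUNTED, shouldCount, orChain and timesCounted are hypothetical names for illustration and are not part of JobImpl or any Hadoop API. It also shows that joining the != comparisons with || would be true for every state, because no state can equal all four values at once.

```java
import java.util.EnumSet;
import java.util.Set;

public class TerminalStateGuard {
    // Hypothetical stand-in for JobStateInternal, reduced to the states discussed here.
    enum State { RUNNING, SUCCEEDED, FAILED, KILLED, ERROR }

    // States in which the job is assumed to have been counted in the metrics already.
    static final Set<State> ALREADY_COUNTED =
        EnumSet.of(State.SUCCEEDED, State.FAILED, State.KILLED, State.ERROR);

    // True only if the job has not been counted yet.
    static boolean shouldCount(State previous) {
        return !ALREADY_COUNTED.contains(previous);
    }

    // The same comparisons joined with ||: true for every state, so it guards nothing.
    static boolean orChain(State previous) {
        return previous != State.SUCCEEDED || previous != State.KILLED
            || previous != State.FAILED || previous != State.ERROR;
    }

    // Simulates a job that reaches `first` and then hits an internal error,
    // returning how many times it ends up counted when the guard is applied.
    static int timesCounted(State first) {
        int counted = ALREADY_COUNTED.contains(first) ? 1 : 0; // counted on completion
        if (shouldCount(first)) {
            counted++; // guarded increment in the error transition
        }
        return counted;
    }

    public static void main(String[] args) {
        for (State s : State.values()) {
            System.out.println(s + ": orChain=" + orChain(s)
                + " shouldCount=" + shouldCount(s)
                + " timesCounted=" + timesCounted(s));
        }
    }
}
```

With the guard in place, a job in this sketch is counted once whether the internal error arrives before or after it reaches a completion state.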
                
> AM job metrics can double-count a job if it errors after entering a completion state
> ------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-4835
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4835
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mr-am
>    Affects Versions: 2.0.3-alpha, 0.23.6
>            Reporter: Jason Lowe
>            Priority: Minor
>
> If JobImpl enters the SUCCEEDED, FAILED, or KILLED state but then encounters an invalid
> state transition, it could double-count the job since jobs that encounter an error are considered
> failed jobs.  Therefore the job could be counted initially as a successful, failed, or killed
> job, respectively, then counted again as a failed job due to the internal error afterwards.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
