hadoop-mapreduce-issues mailing list archives

From "Sandy Ryza (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-4366) mapred metrics shows negative count of waiting maps and reduces
Date Thu, 16 May 2013 17:07:16 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-4366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13659714#comment-13659714 ]

Sandy Ryza commented on MAPREDUCE-4366:
---------------------------------------

Thanks for delving into this with me, Arun.  First, please excuse in advance any errors I'm
about to make here.  I'm trying to be careful, but the counting code is subtle and has been
hard to think about.

bq. An option is to just call decWaiting(Maps|Reduces) in JIP.garbageCollect with JIP.num(Maps|Reduces)...
currently if you follow the opposite side i.e addWaiting(Maps|Reduces), they are just static
and are done at JIP.initTasks with num(Maps|Reduces). That would solve the immediate problem
at hand?

Waiting maps and reduces are updated in the job tracker metrics every time a task is
launched or fails/completes, so this would not work unless I am missing something.

bq. The definition of speculative(Map|Reduce)Tasks, at least in my head, has been the number
of task-attempts have an alternate...

This definition can lead us to think there are fewer pending tasks than there actually are.
Consider the following situation:
My job has two maps.  Attempts are run for both of them.  One map gets a speculative attempt
because it's running slowly.  The other map's attempt fails.  The speculative attempt completes.
At that point, pendingMaps = initialMaps (2) + speculativeMaps (0) - runningMaps (1)
- finishedMaps (1) - failedMaps (0) = 0.  So pendingMaps is now 0 even though we have a
pending map task.  The reason this has not caused jobs to starve is that the running
speculative map will fail later on and bring pendingMaps back up to 1.
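
To make the arithmetic in this scenario easy to check, here is a minimal sketch.  It assumes
pendingMaps is derived exactly as in the sum above (initialMaps + speculativeMaps - runningMaps
- finishedMaps - failedMaps).  The class and method names are hypothetical and are not taken
from JobInProgress or the JobTracker metrics code.

```java
// Hypothetical sketch of the counting described above; not the actual JobInProgress code.
class PendingMapsSketch {

    // Assumed formula, copied from the sum in the comment above.
    static int pendingMaps(int initialMaps, int speculativeMaps,
                           int runningMaps, int finishedMaps, int failedMaps) {
        return initialMaps + speculativeMaps - runningMaps - finishedMaps - failedMaps;
    }

    public static void main(String[] args) {
        // Right after the speculative attempt completes: one attempt is still
        // counted as running, one map is finished, and the other map's only
        // attempt has failed, so that map really is waiting to be rescheduled.
        int now = pendingMaps(2, 0, 1, 1, 0);
        System.out.println("pendingMaps now   = " + now);   // prints 0

        // Once the leftover running attempt is no longer counted as running,
        // the same formula recovers the pending map.
        int later = pendingMaps(2, 0, 0, 1, 0);
        System.out.println("pendingMaps later = " + later); // prints 1
    }
}
```

The first call reproduces the undercount (0 pending while one map has no live attempt); the
second shows why jobs have not starved in practice.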

I wanted to make sure it is clear that the current behavior is objectively wrong.
If your stance is still that the code has been working so far and messing with it is just
a bad idea, I trust your experience.  In that case, we could keep speculativeMapTasks as
it is and have a separate variable, nonCriticalRunningTasks, that is used for updating the
metrics?
                
> mapred metrics shows negative count of waiting maps and reduces
> ---------------------------------------------------------------
>
>                 Key: MAPREDUCE-4366
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4366
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobtracker
>    Affects Versions: 1.0.2
>            Reporter: Thomas Graves
>            Assignee: Sandy Ryza
>         Attachments: MAPREDUCE-4366-branch-1-1.patch, MAPREDUCE-4366-branch-1.patch
>
>
> Negative waiting_maps and waiting_reduces counts are observed in the mapred metrics.  MAPREDUCE-1238
partially fixed this, but it appears there are still issues: we are still seeing it, though not
as badly.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
