hadoop-mapreduce-issues mailing list archives

From "Chris Douglas (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-2124) Add job counters for measuring time spent in three different phases in reducers
Date Fri, 29 Oct 2010 20:36:24 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-2124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12926444#action_12926444 ]

Chris Douglas commented on MAPREDUCE-2124:
------------------------------------------

Either way is OK with me, but I'm not wholly clear on its intended audience. The SLOTS_MILLIS_*
counters are useful to operators and developers because they provide information about the
efficiency of the scheduler: they help with bottleneck analysis of repeated sets of jobs, tuning
of the aggregate cluster, and comparing different runs of concurrent pipelines. By accumulating
only the time that was actually spent doing work, the proposed counters could measure the efficiency
of the job and be useful to the user for tuning parameters like slowstart (a long shuffle
time for small amounts of intermediate data might indicate that the job is scheduling reduces
too early).
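
To make the slowstart example concrete, here is a minimal Java sketch of how a user might read per-phase times. It is not the patch attached to this issue and does not use Hadoop's counter API: the class ReducePhaseTimes, its methods, and both thresholds are hypothetical names and values chosen for illustration. It only assumes that a job exposes the milliseconds actually spent in the copy, sort, and reduce phases.

```java
import java.util.EnumMap;
import java.util.Map;

/** Hypothetical sketch of per-phase reduce timing; not Hadoop's actual API. */
public class ReducePhaseTimes {
    /** The three reduce phases named in the issue description. */
    public enum Phase { COPY, SORT, REDUCE }

    private final Map<Phase, Long> millis = new EnumMap<>(Phase.class);

    /** Accumulate time actually spent working in a phase (idle slot time excluded). */
    public void add(Phase phase, long elapsedMillis) {
        if (elapsedMillis < 0) {
            throw new IllegalArgumentException("elapsedMillis must be >= 0");
        }
        millis.merge(phase, elapsedMillis, Long::sum);
    }

    /** Milliseconds recorded for one phase, or 0 if none were recorded. */
    public long get(Phase phase) {
        return millis.getOrDefault(phase, 0L);
    }

    /** Milliseconds recorded across all phases. */
    public long total() {
        long sum = 0;
        for (long v : millis.values()) {
            sum += v;
        }
        return sum;
    }

    /** Fraction of recorded time spent in the phase; 0.0 when nothing is recorded. */
    public double fraction(Phase phase) {
        long t = total();
        return t == 0 ? 0.0 : (double) get(phase) / t;
    }

    /**
     * Heuristic from the comment above: a long copy (shuffle) time for a small
     * amount of intermediate data may mean reduces are scheduled too early.
     * Both thresholds are assumptions supplied by the caller.
     */
    public boolean reducesMayStartTooEarly(long shuffledBytes,
                                           long smallShuffleBytes,
                                           double maxCopyFraction) {
        return shuffledBytes < smallShuffleBytes
                && fraction(Phase.COPY) > maxCopyFraction;
    }
}
```

If the check fires, one possible reaction is to raise the slowstart threshold (mapred.reduce.slowstart.completed.maps) so that reduces are launched later. Whether that is the right response depends on the job, which is the question raised below.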

Most of the framework counters (FileSystem, framework bytes and records) provide feedback
to the user, to help determine if their job is written correctly and tuned efficiently. This
is slightly different, because it's not a property of a particular MapReduce job (e.g. a job
where every reduce fails once could look "efficient" by this metric). I guess my question
would be: if this information is presented in every user job, then how should (s)he react
to it? If it's not user-centric and only another presentation of data the operator already
has, then it seems less motivated to me. All that said, the cost is low, so if you feel it's
useful then I've no objection to it.

> Add job counters for measuring time spent in three different phases in reducers
> -------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-2124
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2124
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: jobtracker
>    Affects Versions: 0.22.0
>            Reporter: Scott Chen
>            Assignee: Scott Chen
>            Priority: Minor
>             Fix For: 0.22.0
>
>         Attachments: MAPREDUCE-2124-v2.txt, MAPREDUCE-2124.txt
>
>
> We currently have SLOTS_MILLIS_REDUCES, which measures the total slot time of reducers.
> It would be useful to have
> {code}
> SLOTS_MILLIS_REDUCES_COPY
> SLOTS_MILLIS_REDUCES_SORT
> SLOTS_MILLIS_REDUCES_REDUCE
> {code}
> which measure the three different phases of a reducer.
> This will help us identify the bottleneck of the reducers.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

