hadoop-mapreduce-issues mailing list archives

From "Luke Lu (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-2125) Put map-reduce framework counters to JobTrackerMetricsInst
Date Tue, 02 Nov 2010 17:56:26 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-2125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12927511#action_12927511 ]

Luke Lu commented on MAPREDUCE-2125:

bq. If the use case is for many small jobs, each getCounter() call will be cheap, so in this
case it will still be OK. I think the key here is that this change only adds one more pass
over all task counters, so from the throughput point of view the extra cost is not that large.

The problem is not small jobs but short tasks in jobs with a large number of tasks. We happen
to have a system that generates jobs with 50k to 100k tasks per job, with only a few MB per
split. If you have multiple such jobs in different queues (or any shared scheduler that's not
strictly FIFO), you can see a high completion rate for these large jobs after a while.
Arguably, these jobs could be optimized to use a proper input format with fewer splits (hence
fewer tasks), but I'd like to point out that such workloads exist.
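To make the scaling concrete, here is a hypothetical Java sketch of rolling every task's framework counters into cluster-wide totals when a job completes. The class and method names are illustrative only; they are not the MAPREDUCE-2125 patch or Hadoop's counters API. The point it shows is that the work per completed job grows as (number of tasks) x (counters per task), so one 50k-task job costs 50k times as much as a single-task job. The two counters per task in `main` are an assumption chosen for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch: fold each task's counters into cluster-wide totals
 * at job completion. Not the actual patch or the Hadoop Counters API.
 */
class JobCounterRollup {

    /** Cluster-wide totals, keyed by counter name. */
    private final Map<String, Long> totals = new HashMap<>();

    /** Counter entries visited so far; grows as tasks x counters per task. */
    private long entriesVisited = 0;

    /** Called once per completed job with one counter map per task. */
    void onJobComplete(List<Map<String, Long>> taskCounters) {
        for (Map<String, Long> task : taskCounters) {
            for (Map.Entry<String, Long> e : task.entrySet()) {
                totals.merge(e.getKey(), e.getValue(), Long::sum);
                entriesVisited++;
            }
        }
    }

    long total(String name) {
        return totals.getOrDefault(name, 0L);
    }

    long entriesVisited() {
        return entriesVisited;
    }

    public static void main(String[] args) {
        JobCounterRollup rollup = new JobCounterRollup();
        // Assumed workload: one job with 50,000 short tasks, two counters each.
        List<Map<String, Long>> tasks = new ArrayList<>();
        for (int i = 0; i < 50_000; i++) {
            Map<String, Long> c = new HashMap<>();
            c.put("SPILLED_RECORDS", 10L);
            c.put("HDFS_BYTES_READ", 4L * 1024 * 1024);
            tasks.add(c);
        }
        rollup.onJobComplete(tasks);
        System.out.println(rollup.entriesVisited());         // 100000
        System.out.println(rollup.total("SPILLED_RECORDS")); // 500000
    }
}
```

With several such jobs finishing close together, this extra pass runs once per job on the jobtracker, which is why a high completion rate of large jobs matters more than the raw jobs-per-minute figure.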

bq. Our job completion rate is about 20 jobs/minute in average.

OK, you guys have well behaved jobs ;)

Another issue with the patch: the metric names are regenerated on every update, which is
wasteful. For these system counters you can use a simple cache that generates the metric names
only once and produces no additional garbage on updates.
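A minimal sketch of such a cache, assuming metric names are formed as "group.counter" (the naming scheme, the class name, and the example group/counter strings are hypothetical, not taken from the patch). A nested map keyed first by group and then by counter avoids building a composite key String on lookup, and the plain `get` fast path returns the cached name without allocating a new String; `computeIfAbsent` only runs the first time a pair is seen.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/**
 * Hypothetical metric-name cache: each (group, counter) name is built once
 * and the same String instance is reused on every later update.
 */
final class MetricNameCache {

    private final ConcurrentMap<String, ConcurrentMap<String, String>> names =
            new ConcurrentHashMap<>();

    /** Returns the metric name for a counter, building it only on first use. */
    String nameFor(String group, String counter) {
        ConcurrentMap<String, String> byCounter = names.get(group);
        if (byCounter != null) {
            String cached = byCounter.get(counter);
            if (cached != null) {
                return cached; // fast path: reuse the cached String
            }
        }
        // Slow path, taken once per (group, counter) pair.
        return names.computeIfAbsent(group, g -> new ConcurrentHashMap<>())
                    .computeIfAbsent(counter, c -> group + "." + c);
    }

    public static void main(String[] args) {
        MetricNameCache cache = new MetricNameCache();
        String first = cache.nameFor("FileSystemCounters", "HDFS_BYTES_READ");
        String second = cache.nameFor("FileSystemCounters", "HDFS_BYTES_READ");
        System.out.println(first);           // FileSystemCounters.HDFS_BYTES_READ
        System.out.println(first == second); // true: same instance reused
    }
}
```

Since the set of framework counters is small and fixed, the cache stays bounded and never needs eviction.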

> Put map-reduce framework counters to JobTrackerMetricsInst
> ----------------------------------------------------------
>                 Key: MAPREDUCE-2125
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2125
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: jobtracker
>    Affects Versions: 0.22.0
>            Reporter: Scott Chen
>            Assignee: Scott Chen
>             Fix For: 0.22.0
>         Attachments: MAPREDUCE-2125.txt
> We have lots of useful information in the framework counters, including #spills and filesystem
> reads and writes.
> It would be nice to put them all in the jobtracker metrics to get a global view of all
> these numbers.

This message is automatically generated by JIRA.
