hadoop-mapreduce-issues mailing list archives

From "Scott Chen (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-2125) Put map-reduce framework counters to JobTrackerMetricsInst
Date Wed, 03 Nov 2010 17:50:32 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-2125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12927918#action_12927918 ]

Scott Chen commented on MAPREDUCE-2125:
---------------------------------------

Luke: I added intern() for the filesystem counter names. JobCounter and TaskCounter use enums,
so they are OK.
Thanks for the suggestion.
{code}
      countersToMetrics.incrCounter(Task.FILESYSTEM_COUNTER_GROUP,
          counter.getName().intern(), counter.getValue());
{code}
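
For readers unfamiliar with intern(), here is a minimal standalone sketch of its effect. It is not Hadoop code: the class InternSketch, its totals map, and the freshName() helper are hypothetical, and the counter name is only an example value. The assumption is that counter names arrive as separately allocated String objects with equal contents, and that interning lets every lookup share one canonical instance.

```java
import java.util.HashMap;
import java.util.Map;

public class InternSketch {
    // Hypothetical stand-in for a metrics record keyed by counter name.
    private final Map<String, Long> totals = new HashMap<>();

    // Adds delta to the running total stored under the given name.
    void incrCounter(String name, long delta) {
        totals.merge(name, delta, Long::sum);
    }

    long get(String name) {
        return totals.getOrDefault(name, 0L);
    }

    // Allocates a new String object on every call, mimicking a counter
    // name that was rebuilt from a task report (an assumption).
    static String freshName(String base) {
        return new String(base);
    }

    public static void main(String[] args) {
        String a = freshName("HDFS_BYTES_READ");
        String b = freshName("HDFS_BYTES_READ");

        System.out.println(a == b);                   // false: two distinct objects
        System.out.println(a.intern() == b.intern()); // true: one canonical instance

        InternSketch metrics = new InternSketch();
        metrics.incrCounter(a.intern(), 5);
        metrics.incrCounter(b.intern(), 7);
        System.out.println(metrics.get("HDFS_BYTES_READ")); // 12
    }
}
```

Enum constants are already unique per class loader, which is consistent with the remark that JobCounter and TaskCounter need no such treatment.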

I think the case you are talking about (quoted below) is bad for everything. Both obtainNew*Task()
and initTasks() will be very expensive because they can be O(n) in the worst case with respect
to the number of tasks. In fact, these operations will be much more expensive than doing one
getJobCounter(). Like you said, this should be fixed by using a combined input format.
{quote}
The problem is not small jobs but short tasks in jobs with large amount of tasks. We happened
to have certain system that generates jobs with 50k to 100k tasks per job, that only have
a few MB per split, if you have multiple such jobs in different queues (or any shared scheduler
that's not strictly FIFO), you can have high job completion rate for these large jobs after
a while. Arguably, these jobs can be optimized to use proper input format to use less splits
(hence less tasks) but I'd like to point out that such work load exists.
{quote}
I think one getJobCounter() call over the entire lifecycle of a job should be allowed. If
even that is not acceptable, why do we need the getJobCounter() method at all?
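
To make the cost argument concrete, here is a rough sketch under stated assumptions. It is not the JobTracker implementation: the Job class, aggregateCounters() (standing in for the getJobCounter() call discussed above), and jobCompleted() are all hypothetical names. It only illustrates that a walk over every task is linear in the number of tasks, and that running it once at job completion keeps the count of such walks at one per job regardless of how many tasks the job has.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class OncePerJobSketch {
    /** Hypothetical job: each task contributes a map of counter name to value. */
    static class Job {
        final List<Map<String, Long>> taskCounters = new ArrayList<>();
        int aggregations = 0; // number of full walks over the task list

        void addTask(String counter, long value) {
            Map<String, Long> counters = new HashMap<>();
            counters.put(counter, value);
            taskCounters.add(counters);
        }

        /** Sums counters across all tasks; cost is linear in the task count. */
        Map<String, Long> aggregateCounters() {
            aggregations++;
            Map<String, Long> sum = new HashMap<>();
            for (Map<String, Long> task : taskCounters) {
                task.forEach((name, value) -> sum.merge(name, value, Long::sum));
            }
            return sum;
        }
    }

    /** Folds a finished job's counters into cluster-wide totals, exactly once. */
    static void jobCompleted(Job job, Map<String, Long> clusterTotals) {
        job.aggregateCounters()
           .forEach((name, value) -> clusterTotals.merge(name, value, Long::sum));
    }

    public static void main(String[] args) {
        Job job = new Job();
        for (int i = 0; i < 50_000; i++) {
            job.addTask("SPILLED_RECORDS", 2L);
        }
        Map<String, Long> clusterTotals = new HashMap<>();
        jobCompleted(job, clusterTotals);

        System.out.println(clusterTotals.get("SPILLED_RECORDS")); // 100000
        System.out.println(job.aggregations);                     // 1
    }
}
```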


> Put map-reduce framework counters to JobTrackerMetricsInst
> ----------------------------------------------------------
>
>                 Key: MAPREDUCE-2125
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2125
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: jobtracker
>    Affects Versions: 0.22.0
>            Reporter: Scott Chen
>            Assignee: Scott Chen
>             Fix For: 0.22.0
>
>         Attachments: MAPREDUCE-2125-v2.txt, MAPREDUCE-2125.txt
>
>
> We have lots of useful information in the framework counters including #spills, filesystem read and write.
> It will be nice to put them all in the jobtracker metrics to get a global view of all these numbers.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

