hadoop-common-dev mailing list archives

From "Doug Cutting (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-492) Global counters
Date Thu, 15 Feb 2007 18:09:05 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12473451 ]

Doug Cutting commented on HADOOP-492:
-------------------------------------

I talked to Owen about this last week.  My concerns are:

1. We should only instrument code once, for counters and for monitoring metrics.

2. Users should be able to easily add new counters and metrics to their code that are visible
in the JobTracker web UI and/or in a separate metrics monitoring system.

3. Counters should be accessible programmatically through JobClient.

One way to implement this would be to implement counters through the metrics API, as I've
promoted above.  Another approach would be to add a new counter-only API (a subset of metrics
features) that routes values to the JobTracker and can also be configured to talk to the
metrics system.  User code could then decide whether to use the metrics API directly (for
non-counter metrics) or to use the counter-only API and get the benefit of the JobTracker-based
aggregation that is built into the MapReduce runtime.  I don't have a strong preference about
which implementation strategy is pursued.
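
To make the second approach concrete, here is a rough sketch of what a counter-only API
could look like.  Every name in it (CounterSink, InMemoryJobTrackerSink, RecordingMetricsSink,
TaskCounters) is hypothetical and invented for illustration; none of this is existing Hadoop
API, and the two sinks are in-memory stand-ins for the JobTracker path and the metrics system.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical: anything that can receive counter increments.
interface CounterSink {
    void increment(String name, long delta);
}

// Stand-in for the route to the JobTracker: accumulates totals in memory.
class InMemoryJobTrackerSink implements CounterSink {
    private final Map<String, Long> totals = new TreeMap<>();

    @Override
    public void increment(String name, long delta) {
        totals.merge(name, delta, Long::sum);
    }

    long get(String name) {
        return totals.getOrDefault(name, 0L);
    }
}

// Stand-in for the metrics system: just records what it was told.
class RecordingMetricsSink implements CounterSink {
    final List<String> events = new ArrayList<>();

    @Override
    public void increment(String name, long delta) {
        events.add(name + "+=" + delta);
    }
}

// The counter-only API that user code would call.  The code is instrumented
// once; configuration decides which sinks each increment is routed to.
class TaskCounters {
    private final List<CounterSink> sinks;

    TaskCounters(CounterSink... sinks) {
        this.sinks = new ArrayList<>(Arrays.asList(sinks));
    }

    void increment(String name, long delta) {
        for (CounterSink sink : sinks) {
            sink.increment(name, delta);
        }
    }
}
```

With this shape, user code calls counters.increment("records", 1) in one place, and whether
the value also reaches the metrics system depends only on which sinks were configured.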

> Global counters
> ---------------
>
>                 Key: HADOOP-492
>                 URL: https://issues.apache.org/jira/browse/HADOOP-492
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: mapred
>            Reporter: arkady borkovsky
>         Assigned To: David Bowen
>
> It would be nice to have a map/reduce job keep aggregated counts for arbitrary events
> occurring in its tasks -- the number of records processed, the number of exceptions of a
> specific type, the number of sentences in passive voice, whatever the job finds useful.
> This can be implemented by tasks periodically sending <name, value> pairs to the
> jobtracker (in some implementations such messages are piggy-backed on the heartbeats), so
> that the jobtracker stores all the latest values from each task and aggregates them on
> request.  It should also make the aggregated values available at the job end.  The value for
> a task would be flushed when the task fails.
> #491 and #490 may be related to this one.
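
The aggregation scheme in the description above could be sketched roughly as follows.  This
is an illustration only: CounterAggregator and its methods are hypothetical names, each
report is assumed to carry a task's cumulative values so far, and "flushed" is read here as
"discarded", which is an interpretation rather than something the issue states.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical jobtracker-side store of the latest counter values per task.
class CounterAggregator {
    // task id -> (counter name -> latest cumulative value reported by that task)
    private final Map<String, Map<String, Long>> latestByTask = new HashMap<>();

    // Called when a report arrives (for example, piggy-backed on a heartbeat).
    // The new snapshot replaces whatever the task reported before.
    void report(String taskId, Map<String, Long> counters) {
        latestByTask.put(taskId, new HashMap<>(counters));
    }

    // Assumption: a failed task's values are dropped so they do not
    // contribute to the job-wide totals.
    void taskFailed(String taskId) {
        latestByTask.remove(taskId);
    }

    // Sums the latest snapshot of every known task, on request.
    Map<String, Long> aggregate() {
        Map<String, Long> totals = new TreeMap<>();
        for (Map<String, Long> snapshot : latestByTask.values()) {
            for (Map.Entry<String, Long> entry : snapshot.entrySet()) {
                totals.merge(entry.getKey(), entry.getValue(), Long::sum);
            }
        }
        return totals;
    }
}
```

Storing only the latest snapshot per task keeps repeated reports idempotent: a re-sent or
late heartbeat overwrites the task's entry instead of being counted twice.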

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

