hadoop-mapreduce-issues mailing list archives

From "Luke Lu (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-901) Move Framework Counters into a TaskMetric structure
Date Mon, 04 Oct 2010 23:45:41 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12917821#action_12917821 ]

Luke Lu commented on MAPREDUCE-901:
-----------------------------------

The latest patch already handles the JobCounter and TaskCounter optimization transparently
(via the generic FrameworkCounterGroup), but it doesn't yet address file system counter
optimization. Using concrete fs enums (hdfs, s3, etc.) as in the previous patches is too
brittle: the whole mapreduce package would need to be recompiled and released for every new
distributed filesystem implementation, which defeats the purpose of having a filesystem
interface that we can already query for (fs scheme, stats) tuples. HADOOP-4188 tried to
address the issue, but the treatment is incomplete: the Task#getFileSystemCounters helper
method is package-private and awkward to use, since it requires explicit array indexing,
e.g. getFileSystemCounters(scheme)[0] to get <SCHEME>_BYTES_READ (e.g. HDFS_BYTES_READ) for
use with the generic counter interface. This also makes decoupled localization of file system
counter display names impossible.

I propose that we add a file system counter API to the Counters framework. Something like:
{code}
Counter getFileSystemCounter(String scheme, FileSystemCounter key);
{code}

where FileSystemCounter is an enum class:
{code}
public enum FileSystemCounter {
  BYTES_READ,
  BYTES_WRITTEN
  // etc.
}
{code}
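
To make the shape of the proposal concrete, here is a minimal sketch of how such a lookup might behave. SimpleCounter and FileSystemCounterGroup are hypothetical stand-ins invented for illustration (they are not existing Hadoop classes), and the <SCHEME>_<KEY> naming simply mirrors the HDFS_BYTES_READ convention mentioned above:

```java
import java.util.EnumMap;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Counter kinds shared by every file system, as in the proposed enum.
enum FileSystemCounter { BYTES_READ, BYTES_WRITTEN }

// Hypothetical minimal counter; a real Counter type would expose more.
final class SimpleCounter {
  private final String name;
  private long value;
  SimpleCounter(String name) { this.name = name; }
  String getName() { return name; }
  long getValue() { return value; }
  void increment(long delta) { value += delta; }
}

// Hypothetical group keyed by (scheme, FileSystemCounter). Schemes are plain
// strings, so supporting a new file system needs no change to this class.
final class FileSystemCounterGroup {
  private final Map<String, Map<FileSystemCounter, SimpleCounter>> counters =
      new TreeMap<>();

  SimpleCounter getFileSystemCounter(String scheme, FileSystemCounter key) {
    // Assumption: schemes are case-insensitive and displayed in upper case.
    String normalized = scheme.toUpperCase(Locale.ROOT);
    return counters
        .computeIfAbsent(normalized, s -> new EnumMap<>(FileSystemCounter.class))
        .computeIfAbsent(key, k -> new SimpleCounter(normalized + "_" + k.name()));
  }
}
```

With this sketch, a caller would write group.getFileSystemCounter("hdfs", FileSystemCounter.BYTES_READ).increment(n) instead of indexing into an array by position.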

We can take advantage of this interface to create a file system counter group that is stored
in memory and serialized more efficiently (say, as (<scheme>, vint(BYTES_READ),
vint(BYTES_WRITTEN), ...) tuples).
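
As a rough illustration of the tuple layout, the sketch below writes each scheme followed by its counter values as variable-length integers. FsCounterCodec is a hypothetical name, and the varint here is a generic 7-bits-per-byte encoding chosen to keep the example self-contained; it is an assumption and not necessarily the byte format Hadoop's own vint helpers produce:

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical codec for (scheme, BYTES_READ, BYTES_WRITTEN) tuples.
final class FsCounterCodec {
  static final int FIELDS = 2; // BYTES_READ, BYTES_WRITTEN

  // Writes v using 7 payload bits per byte; the high bit marks "more bytes follow".
  static void writeVarLong(ByteArrayOutputStream out, long v) {
    while ((v & ~0x7FL) != 0) {
      out.write((int) ((v & 0x7F) | 0x80));
      v >>>= 7;
    }
    out.write((int) v);
  }

  static long readVarLong(ByteBuffer in) {
    long result = 0;
    for (int shift = 0; ; shift += 7) {
      byte b = in.get();
      result |= (long) (b & 0x7F) << shift;
      if ((b & 0x80) == 0) return result;
    }
  }

  // Layout: count, then per scheme: name length, UTF-8 name, one varint per field.
  static byte[] encode(Map<String, long[]> byScheme) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    writeVarLong(out, byScheme.size());
    for (Map.Entry<String, long[]> e : byScheme.entrySet()) {
      byte[] scheme = e.getKey().getBytes(StandardCharsets.UTF_8);
      writeVarLong(out, scheme.length);
      out.write(scheme, 0, scheme.length);
      for (int i = 0; i < FIELDS; i++) writeVarLong(out, e.getValue()[i]);
    }
    return out.toByteArray();
  }

  static Map<String, long[]> decode(byte[] data) {
    ByteBuffer in = ByteBuffer.wrap(data);
    int count = (int) readVarLong(in);
    Map<String, long[]> result = new TreeMap<>();
    for (int i = 0; i < count; i++) {
      byte[] scheme = new byte[(int) readVarLong(in)];
      in.get(scheme);
      long[] values = new long[FIELDS];
      for (int j = 0; j < FIELDS; j++) values[j] = readVarLong(in);
      result.put(new String(scheme, StandardCharsets.UTF_8), values);
    }
    return result;
  }
}
```

Because the field order is fixed by the enum, no per-counter name needs to be serialized; only the scheme string is carried once per tuple.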

Thoughts?

> Move Framework Counters into a TaskMetric structure
> ---------------------------------------------------
>
>                 Key: MAPREDUCE-901
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-901
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: task
>    Affects Versions: 0.21.0
>            Reporter: Owen O'Malley
>            Assignee: Luke Lu
>         Attachments: 901_1.patch, 901_1.patch, FrameworkCounterGroup.java, MAPREDUCE-901.patch, MAPREDUCE-901.patch, mr-901-trunk-v1.patch
>
>
> I think we should move all of the Counters that the framework updates into a single class
> called TaskMetrics. TaskMetrics would have specific fields for each of the metrics like input
> records, input bytes, output records, etc.
> It would both reduce the serialized size of the heartbeats (by shrinking the Counters
> down to just the user's counters) and decrease the latency for updates to the JobTracker (since
> Counters are sent at most 1/minute instead of 1/heartbeat).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.