hadoop-common-dev mailing list archives

From "Chris Douglas (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HADOOP-3062) Need to capture the metrics for the network I/Os generated by dfs reads/writes and map/reduce shuffling and break them down by racks
Date Thu, 07 Aug 2008 02:45:44 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-3062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Douglas updated HADOOP-3062:
----------------------------------

    Attachment: 3062-0.patch

First draft.

Format:
{noformat}
<log4j schema including timestamp, etc.> src: <src IP>, dest: <dst IP>, bytes: <bytes>, op: <op enum>, id: <DFSClient id|taskid>[, blockid: <block id>]
{noformat}
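
As a rough illustration (not part of the patch), a line in this format can be pulled apart with a regex along the lines below. Only the field names come from the schema above; the log4j prefix, the op value, the sample id/block values, and the class itself are made-up placeholders.
{noformat}
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch of parsing one entry in the format above; not part of the patch.
// Only the field names (src, dest, bytes, op, id, blockid) come from the
// schema; the regex spacing, log4j prefix, and sample values are assumptions.
public class IoLogLineParser {

  // Matches the payload after the log4j prefix; blockid is optional.
  private static final Pattern ENTRY = Pattern.compile(
      "src: (\\S+), dest: (\\S+), bytes: (\\d+), op: (\\S+), id: (\\S+?)"
      + "(?:, blockid: (\\S+))?\\s*$");

  public static void main(String[] args) {
    String line = "2008-08-06 19:45:44,123 INFO datanode.DataNode: "
        + "src: 10.0.0.1, dest: 10.0.0.2, bytes: 67108864, op: READ_BLOCK, "
        + "id: DFSClient_task_200808061942_0001_m_000003_0, blockid: blk_12345";
    Matcher m = ENTRY.matcher(line);
    if (m.find()) {
      System.out.println("src=" + m.group(1) + " dest=" + m.group(2)
          + " bytes=" + Long.parseLong(m.group(3)) + " op=" + m.group(4)
          + " id=" + m.group(5) + " blockid=" + m.group(6));
    }
  }
}
{noformat}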

The patch adds the DFSClient clientName to OP_READ_BLOCK and changes the String in OP_WRITE_BLOCK
from the path (which is unused) to the clientName. If this is set to DFSClient_<taskid>
in map and reduce tasks, tracing the output of a job should be straightforward after some
processing of each entry (a sketch of such processing follows below). Writes for replication
(where the clientName is "") are logged as they have been; the logging in PacketResponder has
been reformatted to fit the preceding schema. A few known issues:

* The logging assumes the IP address is sufficient to distinguish a source, particularly for
writes and in the shuffle
* This logs to the DataNode and ReduceTask appenders; these entries should be directed elsewhere
and disabled by default
* In testing this, some of the read entries exhibited a strange property: the source and
destination match, but neither matches the DataNode on which it is logged. I'm clearly missing
something.

I tried tracing a few blocks and map outputs through the logs, and all of those made sense. That
said, as mentioned in the last bullet, not all of the entries did.
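
To make the "processing of each entry" above concrete, here is a minimal sketch of one way to roll the parsed entries up by rack. The LogEntry holder, the IP-to-rack map, and the sample numbers are assumptions for illustration; a real pass would take the entries from the parsed log and the rack mapping from the cluster's rack-awareness (topology) configuration.
{noformat}
import java.util.HashMap;
import java.util.Map;

// Sketch of post-processing the parsed entries; not part of the patch.
public class RackTrafficRollup {

  // Minimal holder for one parsed log entry (hypothetical).
  static class LogEntry {
    final String srcIp, dstIp, id;
    final long bytes;
    LogEntry(String srcIp, String dstIp, long bytes, String id) {
      this.srcIp = srcIp; this.dstIp = dstIp; this.bytes = bytes; this.id = id;
    }
  }

  public static void main(String[] args) {
    // Hypothetical IP-to-rack mapping; a real run would use the cluster topology.
    Map<String, String> ipToRack = new HashMap<String, String>();
    ipToRack.put("10.0.0.1", "/rack1");
    ipToRack.put("10.0.0.2", "/rack1");
    ipToRack.put("10.0.1.7", "/rack2");

    // Two made-up entries: an intra-rack read and a cross-rack transfer.
    LogEntry[] entries = {
      new LogEntry("10.0.0.1", "10.0.0.2", 67108864L, "DFSClient_task_0001_m_000003_0"),
      new LogEntry("10.0.0.1", "10.0.1.7", 16777216L, "DFSClient_task_0001_r_000000_0"),
    };

    // Aggregate bytes per (source rack -> destination rack) pair.
    Map<String, Long> byRackPair = new HashMap<String, Long>();
    for (LogEntry e : entries) {
      String key = ipToRack.get(e.srcIp) + " -> " + ipToRack.get(e.dstIp);
      Long sum = byRackPair.get(key);
      byRackPair.put(key, (sum == null ? 0L : sum) + e.bytes);
    }
    for (Map.Entry<String, Long> kv : byRackPair.entrySet()) {
      System.out.println(kv.getKey() + ": " + kv.getValue() + " bytes");
    }
  }
}
{noformat}
Summing on the id field instead of the rack pair would follow the bytes moved by a single task or job.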

> Need to capture the metrics for the network I/Os generated by dfs reads/writes and map/reduce
shuffling and break them down by racks
> ------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-3062
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3062
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: metrics
>            Reporter: Runping Qi
>         Attachments: 3062-0.patch
>
>
> In order to better understand the relationship between Hadoop performance and network
bandwidth, we need to know
> what the aggregate traffic in a cluster is and how it breaks down by racks. With these
data, we can determine whether the network
> bandwidth is the bottleneck when certain jobs are running on a cluster.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

