hadoop-common-dev mailing list archives

From "George Porter (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HADOOP-4049) Cross-system causal tracing within Hadoop
Date Sun, 14 Sep 2008 23:25:44 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

George Porter updated HADOOP-4049:
----------------------------------

    Attachment: HADOOP-4049.patch

This patch is an implementation of the instrumentation API for the RPC layer.

It includes an abstract instrumentation class for the RPC layer, as well as two concrete implementations:
a "null" implementation that does nothing, and a "test" implementation that is used for unit
testing.  An X-Trace implementation of the API will be attached shortly.

Each successful RPC call activates the following four instrumentation points in order:
  1) clientStartCall()
  2) serverReceiveCall()
  3) serverSendResponse()
  4) clientReceiveResponse()

An instrumentation point can set "path state" using setPathState().  This state follows the
RPC call and is available to the remaining instrumentation points via getPathState().
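As an illustration, the lifecycle hooks and path state might be sketched roughly as follows. This is a hypothetical sketch based only on the description above: the class name `RpcInstrumentation` and the field-based path state are assumptions, and in the real patch the path state would be serialized along with the RPC call rather than held in a plain field.

```java
import java.util.Arrays;

// Hypothetical sketch of the RPC-layer instrumentation API described above.
// A simple field stands in for the per-call path state; the real patch would
// carry this state with the serialized RPC call itself.
public abstract class RpcInstrumentation {
  private byte[] pathState;

  // The four instrumentation points, activated in order for each
  // successful RPC call.
  public abstract void clientStartCall();
  public abstract void serverReceiveCall();
  public abstract void serverSendResponse();
  public abstract void clientReceiveResponse();

  // Path state set by one instrumentation point is available to the
  // remaining points on the same call.  Defensive copies keep callers
  // from mutating the stored state.
  public void setPathState(byte[] state) {
    this.pathState = (state == null) ? null : Arrays.copyOf(state, state.length);
  }

  public byte[] getPathState() {
    return (pathState == null) ? null : Arrays.copyOf(pathState, pathState.length);
  }
}
```

A concrete implementation would override the four abstract hooks; the "null" implementation mentioned above would simply leave every hook body empty.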

There are also two instrumentation points for error conditions:  remoteException() is
activated if the code running on the server throws an exception, and ipcFailure() is called
if an underlying network failure causes the RPC itself to fail.
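A recording implementation in the spirit of the "test" implementation mentioned above could look like the sketch below. The class and hook names here are illustrative, chosen to match the description in this message rather than taken from the patch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, self-contained sketch of a unit-test style instrumentation
// that records which hooks fired and in what order.
public class RecordingRpcInstrumentation {
  public final List<String> events = new ArrayList<String>();

  // Lifecycle hooks for a successful RPC call, in order.
  public void clientStartCall()       { events.add("clientStartCall"); }
  public void serverReceiveCall()     { events.add("serverReceiveCall"); }
  public void serverSendResponse()    { events.add("serverSendResponse"); }
  public void clientReceiveResponse() { events.add("clientReceiveResponse"); }

  // Error hooks: one for exceptions thrown by server-side code, one for
  // network-level failures that prevent the call from completing.
  public void remoteException(Throwable cause) { events.add("remoteException"); }
  public void ipcFailure(Throwable cause)      { events.add("ipcFailure"); }
}
```

A unit test would drive an RPC call and then assert that exactly the expected hooks appear in the recorded list, in order.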

> Cross-system causal tracing within Hadoop
> -----------------------------------------
>
>                 Key: HADOOP-4049
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4049
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs, ipc, mapred
>            Reporter: George Porter
>         Attachments: HADOOP-4049.patch, multiblockread.png, multiblockwrite.png
>
>
> Much of Hadoop's behavior is client-driven, with clients responsible for contacting individual
datanodes to read and write data, as well as dividing up work for map and reduce tasks.  In
a large deployment with many concurrent users, identifying the effects of individual clients
on the infrastructure is a challenge.  The use of data pipelining in HDFS and Map/Reduce makes
it hard to follow the effects of a given client request through the system.
> This proposal is to instrument the HDFS, IPC, and Map/Reduce layers of Hadoop with X-Trace.
 X-Trace is an open-source framework for capturing causality of events in a distributed system.
 It can correlate operations making up a single user request, even if those operations span
multiple machines.  As an example, you could use X-Trace to follow an HDFS write operation
as it is pipelined through intermediate nodes.  Additionally, you could trace a single Map/Reduce
job and see how it is decomposed into lower-layer HDFS operations.
> Matei Zaharia and Andy Konwinski initially integrated X-Trace with a local copy of the
0.14 release, and I've brought that code up to release 0.17.  Performing the integration involves
modifying the IPC protocol, inter-datanode protocol, and some data structures in the map/reduce
layer to include 20 bytes of tracing metadata.  With release 0.18, the generated traces could
be collected with Chukwa.
> I've attached to this JIRA issue some example traces of the HDFS and IPC layers, generated
with the 0.17 patch.
> More information about X-Trace is available from http://www.x-trace.net/ as well as in
a paper that appeared at NSDI 2007, available online at http://www.usenix.org/events/nsdi07/tech/fonseca.html
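The protocol change described in the quoted proposal, carrying 20 bytes of tracing metadata alongside each call, can be pictured as a fixed-size field round-tripped through the call's byte stream. The sketch below is illustrative only: the class name, method names, and framing are assumptions, not the actual wire format used by the patch.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Illustrative sketch: read and write a fixed 20-byte tracing-metadata field,
// as the proposal says the IPC and inter-datanode protocols were extended to
// carry.  The real wire format in the patch may differ.
public class TraceMetadata {
  public static final int LENGTH = 20;

  public static void write(DataOutputStream out, byte[] metadata) throws IOException {
    if (metadata.length != LENGTH) {
      throw new IllegalArgumentException("expected " + LENGTH + " bytes, got " + metadata.length);
    }
    out.write(metadata);
  }

  public static byte[] read(DataInputStream in) throws IOException {
    byte[] metadata = new byte[LENGTH];
    in.readFully(metadata);  // fails fast if the stream is truncated
    return metadata;
  }
}
```

Because the field is fixed-length, readers that understand the extension can strip it unconditionally, which keeps the framing of the surrounding protocol unchanged.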

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

