hadoop-common-user mailing list archives

From Edward Capriolo <edlinuxg...@gmail.com>
Subject Re: 3D Cluster Performance Visualization
Date Fri, 25 Sep 2009 13:54:37 GMT
On Fri, Sep 25, 2009 at 9:25 AM, Brian Bockelman <bbockelm@cse.unl.edu> wrote:
> Hey Paul,
> Here's another visualization one can do with HDFS:
> http://www.youtube.com/watch?v=qoBoEzOkeDQ
> Each time data is moved from one host to another, it is plotted as a drop of
> water from the square representing the source host to the square representing
> the destination.  The color of a node's square depends on the number of
> transfers per second.  Data transferred into or out of the cluster is
> represented by drops going in/out of the ceiling.
> Hard to describe, easy to understand when you see it.  Absolutely
> mesmerizing for tour groups when you put it on a big-screen.
> Brian
> On Sep 25, 2009, at 2:04 AM, Paul Smith wrote:
>> Hi,
>> I'm still relatively new to Hadoop here, so bear with me.  We have a few
>> ex-SGI staff with us, and one of the tools we now use at Aconex is
>> Performance Co-Pilot (PCP), which is an open-source Performance Monitoring
>> suite out of Silicon Graphics (see [1]).  SGI are a bit fond of large-scale
>> problems and this toolset was built to support their own monster computers
>> (see [2] for one of their clients, yep, that's one large single computer),
>> and PCP was used to monitor and tune that, so I'm pretty confident it has
>> the credentials to help with Hadoop.
>> Aconex has built a Java bridge to PCP and has open-sourced that as Parfait
>> (see [3]).  We rely on this for real-time and post-problem retrospective
>> analysis.  We would be dead in the water without it.  By being able to
>> combine hardware and software metrics across multiple machines into a single
>> warehouse of data we can correlate many interesting things and solve
>> problems very quickly.
>> Now I want to unleash this on Hadoop.  I have written a MetricContext
>> extension that uses the bridge, and I can export counters and values to PCP
>> for the namenode, datanode, jobtracker and tasktracker.  We are building
>> some small tool extensions to allow 3D visualization.  First fledgling view
>> of what it looks like is here:
>> http://people.apache.org/~psmith/clustervis.png
>> Yes, a pretty trivial cluster at the moment, but the toolset allows pretty
>> simple configurations to create the cluster by passing it the masters/slaves
>> file.  Once the PCP tools connect to each node through my implementation of PCP
>> Metric Context they can find out whether it's a namenode, a jobtracker, etc.,
>> and display it differently.  We hope to improve on the tools to utilise the
>> DNSToSwitchMapping style to then visualize all the nodes within the cluster
>> as they would appear in the rack.  PCP already has support for Cisco
>> switches so we can also integrate those into the picture and display
>> inter-rack networking volumes.  The real payoff here is the retrospective
>> analysis: all this PCP data is collected into Archives, so this view can be
>> replayed at any time, and at any pace you want.  Very interesting problems
>> are found when you have that sort of tool.
>> I guess my question is whether anyone else thinks this is going to be of
>> value to the wider Hadoop community?  Obviously we do, but we're not exactly
>> stretching Hadoop just yet, nor do we fully understand some of the tricky
>> performance problems large Hadoop cluster admins face.  We'd love to
>> add this to hadoop-contrib, though, in the hope that others might
>> find it useful.
>> So if anyone is interested in asking questions or suggesting crucial
>> feature sets we'd appreciate it.
>> cheers (and thanks for getting this far in the email.. :) )
>> Paul Smith
>> psmith at aconex.com
>> psmith at apache.org
>> [1] Performance Co-Pilot (PCP)
>> http://oss.sgi.com/projects/pcp/index.html
>> [2] NASA's 'Columbia' computer
>> http://www.nas.nasa.gov/News/Images/images.html
>> [3] Parfait
>> http://code.google.com/p/parfait/

Open up a Jira. Let's get Hadoop viz on the namenode web interface for
real time :)
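
For anyone following the thread, here is a rough sketch of the "build the cluster from the masters/slaves file" idea Paul describes. It is not code from PCP, Parfait, or Hadoop: the function names are made up for illustration, and `probe_role` is a hypothetical stand-in for whatever per-node query the PCP tools would perform to learn a node's role. It only assumes the usual one-hostname-per-line layout with `#` comments.

```python
from typing import Callable, Dict, Iterable, List


def parse_host_file(lines: Iterable[str]) -> List[str]:
    """Return unique hostnames in file order, ignoring blanks and '#' comments."""
    hosts: List[str] = []
    for line in lines:
        host = line.split("#", 1)[0].strip()  # drop trailing comment, trim
        if host and host not in hosts:
            hosts.append(host)
    return hosts


def build_layout(
    masters: Iterable[str],
    slaves: Iterable[str],
    probe_role: Callable[[str], str],  # hypothetical: asks a node what it is
) -> Dict[str, List[str]]:
    """Group every host named in the masters/slaves files by its reported role."""
    layout: Dict[str, List[str]] = {}
    seen = set()
    for host in parse_host_file(masters) + parse_host_file(slaves):
        if host in seen:  # a host may be listed in both files
            continue
        seen.add(host)
        layout.setdefault(probe_role(host), []).append(host)
    return layout


if __name__ == "__main__":
    # Toy data; a real tool would read the files from the Hadoop conf directory.
    masters = ["nn1  # namenode + jobtracker"]
    slaves = ["dn1", "dn2", "# spare rack", "dn1"]
    roles = {"nn1": "namenode"}
    print(build_layout(masters, slaves, lambda h: roles.get(h, "datanode")))
```

A visualizer could then draw each role group differently, and a rack-aware version would presumably add a second grouping key taken from the topology mapping.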
