hadoop-common-user mailing list archives

From Paul Smith <psm...@aconex.com>
Subject 3D Cluster Performance Visualization
Date Fri, 25 Sep 2009 07:04:38 GMT

I'm still relatively new to Hadoop here, so bear with me.  We have a  
few ex-SGI staff with us, and one of the tools we now use at Aconex is  
Performance Co-Pilot (PCP), which is an open-source Performance  
Monitoring suite out of Silicon Graphics (see [1]).  SGI is rather
fond of large-scale problems, and this toolset was built to support
their own monster computers (see [2] for one of their clients; yep,
that's one large single computer).  PCP was used to monitor and
tune that, so I'm pretty confident it has the credentials to help with

Aconex has built a Java bridge to PCP and has open-sourced that as  
Parfait (see [3]).  We rely on this for real-time and post-problem  
retrospective analysis.  We would be dead in the water without it.  By  
being able to combine hardware and software metrics across multiple  
machines into a single warehouse of data we can correlate many  
interesting things and solve problems very quickly.

Now I want to unleash this on Hadoop.  I have written a MetricContext  
extension that uses the bridge, and I can export counters and values  
to PCP for the namenode, datanode, jobtracker and tasktracker.  We are  
building some small tool extensions to allow 3D visualization.  A first
fledgling view of what it looks like is here:


Yes, a pretty trivial cluster at the moment, but the toolset allows  
pretty simple configurations to create the cluster by passing it the  
masters/slaves file.  Once the PCP tools connect to each node through my
implementation of PCP Metric Context, they can find out whether it's a
namenode, a jobtracker, etc. and display it differently.  We hope to
improve on the tools to utilise the DNSToSwitchMapping style to then  
visualize all the nodes within the cluster as they would appear in the  
rack.  PCP already has support for Cisco switches so we can also  
integrate those into the picture and display inter-rack networking  
volumes.  The real payoff here is the retrospective analysis: all this
PCP data is collected into archives, so this view can be replayed at
any time, and at any pace you want.  Very interesting problems are  
found when you have that sort of tool.
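
To make the masters/slaves and rack idea above a bit more concrete,
here is a minimal sketch in Python.  It is not the actual tool code:
the function names, the host-to-rack dictionary (standing in for
whatever a DNSToSwitchMapping implementation would return) and the
"/default-rack" fallback are all assumptions made for illustration.

```python
from collections import defaultdict


def read_hosts(text):
    """Return host names from a masters/slaves style listing.

    Assumes one host per line, with '#' starting a comment.
    """
    hosts = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if line:
            hosts.append(line)
    return hosts


def group_by_rack(hosts, rack_of, default_rack="/default-rack"):
    """Group hosts by rack path so a view can lay them out rack by rack.

    rack_of is a hypothetical host -> rack mapping; hosts it does not
    know about fall back to default_rack.
    """
    racks = defaultdict(list)
    for host in hosts:
        racks[rack_of.get(host, default_rack)].append(host)
    return dict(racks)


if __name__ == "__main__":
    slaves = "node1\nnode2\n# spare box\nnode3\n"
    mapping = {"node1": "/rack1", "node2": "/rack1"}
    for rack, members in group_by_rack(read_hosts(slaves), mapping).items():
        print(rack, members)
```

The grouped result is the shape a viewer would need to draw each rack
as a column of nodes; role detection (namenode, jobtracker, etc.) would
still come from the metrics each node exports.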

I guess my question is whether anyone else thinks this is going to be  
of value to the wider Hadoop community?  Obviously we do, but we're  
not exactly stretching Hadoop just yet, nor do we fully understand  
some of the tricky performance problems large Hadoop cluster admins  
face.  We'd love to add this to hadoop-contrib, in the hope that
others might find it useful.

So if anyone is interested in asking questions or suggesting crucial  
feature sets we'd appreciate it.

cheers (and thanks for getting this far in the email.. :) )

Paul Smith
psmith at aconex.com
psmith at apache.org

[1] Performance Co-Pilot (PCP)

[2] NASA's 'Columbia' computer

[3] Parfait
