hadoop-common-user mailing list archives

From Edward Capriolo <edlinuxg...@gmail.com>
Subject Re: 3D Cluster Performance Visualization
Date Fri, 25 Sep 2009 15:09:05 GMT
On Fri, Sep 25, 2009 at 10:06 AM, Brian Bockelman <bbockelm@cse.unl.edu> wrote:
> ;) Unfortunately, I'm going to go out on a limb and guess that we don't want
> to add OpenGL to the dependency list for the namenode...  The viz
> application actually doesn't depend on the namenode, it uses the datanodes.
> Here's the source:
> svn://t2.unl.edu/brian/HadoopViz/trunk
> The server portion is a bit hardcoded to our site (simply a python server);
> the client application is pretty cross-platform.  I actually compile and
> display the application on my Mac.
> Here's how it works:
> 1) Client issues read() request
> 2) Datanode services it.  Logs it with log4j
> 3) One of the log4j appenders is syslog pointing at a separate server
> 4) Separate log server receives UDP packets; one packet per read()
> 5) Log server parses packets and decides whether they are within the cluster
> or going to the internet
>  - Currently a Pentium 4 throw-away machine; handles up to 4-5k packets per
> second before it starts dropping
> 6) Each client opens a TCP stream to the server and receives the transfer
> type, source, and dest, then renders appropriately
> It's pretty danged close to real-time; the time the client issues the read()
> request to seeing something plotted is on the order of 1 second.
> I'd really like to see this on a big (Yahoo, Facebook, any takers?) cluster.
> Brian
> On Sep 25, 2009, at 8:54 AM, Edward Capriolo wrote:
>> On Fri, Sep 25, 2009 at 9:25 AM, Brian Bockelman <bbockelm@cse.unl.edu>
>> wrote:
>>> Hey Paul,
>>> Here's another visualization one can do with HDFS:
>>> http://www.youtube.com/watch?v=qoBoEzOkeDQ
>>> Each time data is moved from one host to another, it is plotted as a drop
>>> of water from one square representing the host to one square representing
>>> the destination.  The color of the node's square depends on the number of
>>> transfers per second.  Data transferred out of the cluster is represented
>>> by drops going in/out of the ceiling.
>>> Hard to describe, easy to understand when you see it.  Absolutely
>>> mesmerizing for tour groups when you put it on a big-screen.
>>> Brian
>>> On Sep 25, 2009, at 2:04 AM, Paul Smith wrote:
>>>> Hi,
>>>> I'm still relatively new to Hadoop here, so bear with me.  We have a few
>>>> ex-SGI staff with us, and one of the tools we now use at Aconex is
>>>> Performance Co-Pilot (PCP), which is an open-source Performance Monitoring
>>>> suite out of Silicon Graphics (see [1]).  SGI are a bit fond of large-scale
>>>> problems and this toolset was built to support their own monster computers
>>>> (see [2] for one of their clients, yep, that's one large single computer),
>>>> and PCP was used to monitor and tune that, so I'm pretty confident it has
>>>> the credentials to help with Hadoop.
>>>> Aconex has built a Java bridge to PCP and has open-sourced that as Parfait
>>>> (see [3]).  We rely on this for real-time and post-problem retrospective
>>>> analysis.  We would be dead in the water without it.  By being able to
>>>> combine hardware and software metrics across multiple machines into a
>>>> single warehouse of data we can correlate many interesting things and
>>>> solve problems very quickly.
>>>> Now I want to unleash this on Hadoop.  I have written a MetricContext
>>>> extension that uses the bridge, and I can export counters and values to
>>>> PCP for the namenode, datanode, jobtracker and tasktracker.  We are
>>>> building some small tool extensions to allow 3D visualization.  First
>>>> fledgling view of what it looks like is here:
>>>> http://people.apache.org/~psmith/clustervis.png
>>>> Yes, a pretty trivial cluster at the moment, but the toolset allows
>>>> pretty simple configurations to create the cluster by passing it the
>>>> masters/slaves file.  Once the PCP tools connect to each node through my
>>>> implementation of PCP Metric Context, they can find out whether it's a
>>>> namenode, a jobtracker, etc. and display it differently.  We hope to
>>>> improve on the tools to utilise the DNSToSwitchMapping style to then
>>>> visualize all the nodes within the cluster as they would appear in the
>>>> rack.  PCP already has support for Cisco switches so we can also
>>>> integrate those into the picture and display inter-rack networking
>>>> volumes.  The real payoff here is the retrospective analysis: all this
>>>> PCP data is collected into Archives so this view can be replayed at any
>>>> time, and at any pace you want.  Very interesting problems are found when
>>>> you have that sort of tool.
>>>> I guess my question is whether anyone else thinks this is going to be of
>>>> value to the wider Hadoop community?  Obviously we do, but we're not
>>>> exactly stretching Hadoop just yet, nor do we fully understand some of
>>>> the tricky performance problems large Hadoop cluster admins face.  We'd
>>>> love to add this to hadoop-contrib, in the hope that others might find it
>>>> useful.
>>>> So if anyone is interested in asking questions or suggesting crucial
>>>> feature sets we'd appreciate it.
>>>> cheers (and thanks for getting this far in the email.. :) )
>>>> Paul Smith
>>>> psmith at aconex.com
>>>> psmith at apache.org
>>>> [1] Performance Co-Pilot (PCP)
>>>> http://oss.sgi.com/projects/pcp/index.html
>>>> [2] NASA's 'Columbia' computer
>>>> http://www.nas.nasa.gov/News/Images/images.html
>>>> [3] Parfait
>>>> http://code.google.com/p/parfait/
>> Open up a Jira. Let's get hadoop viz on the name node web interface for
>> real time :)


I was half kidding, but if you can do it with OpenGL you can probably
do it with an applet. Of course, as you mentioned, it would need access
to the logging source. Maybe it could run on each DataNode web
interface instead. Also, a while back people were talking about those
map/reduce job status graphs that show the map and reduce progress over
the course of a job. That is something I think we could do right from
the JobTracker interface. There is a lot of info there; we should be
able to jazz it up a bit :)
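
For anyone wanting to experiment, steps 4 and 5 of Brian's description
above (the log server receiving one UDP packet per read() and deciding
whether the transfer stays inside the cluster) could be sketched roughly
as below. This is not the HadoopViz code. The cluster address range, the
port number, the "src: /ip:port, dest: /ip:port" line shape and the names
classify_packet and serve are all assumptions made for illustration.

```python
import ipaddress
import re
import socket

# Assumed cluster address range; purely illustrative.
CLUSTER_NET = ipaddress.ip_network("10.0.0.0/16")

# Assumed shape of the logged read line; the real datanode log line may differ.
TRANSFER_RE = re.compile(
    r"src: /(?P<src>[\d.]+):\d+, dest: /(?P<dest>[\d.]+):\d+"
)


def classify_packet(packet, cluster_net=CLUSTER_NET):
    """Return (kind, src, dest) for one syslog payload, or None if unparseable."""
    match = TRANSFER_RE.search(packet.decode("utf-8", errors="replace"))
    if match is None:
        return None
    src, dest = match.group("src"), match.group("dest")
    try:
        internal = all(
            ipaddress.ip_address(addr) in cluster_net for addr in (src, dest)
        )
    except ValueError:
        # The text looked like an address but was not a valid one.
        return None
    return ("internal" if internal else "external", src, dest)


def serve(host="0.0.0.0", port=5514):
    """Receive one UDP packet per read() and print each classified transfer."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    while True:
        packet, _sender = sock.recvfrom(4096)
        result = classify_packet(packet)
        if result is not None:
            print(*result)
```

Calling serve() starts the receive loop. Step 6, streaming each
(kind, src, dest) tuple to the rendering clients over TCP, is left out to
keep the sketch short.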

