hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Hadoop Wiki] Update of "BristolHadoopWorkshop" by SteveLoughran
Date Wed, 12 Aug 2009 15:49:47 GMT
The following page has been changed by SteveLoughran:

The comment on the change is:
creating a page containing writeup of the event, adding entries as time permits

New page:
= Bristol Hadoop Workshop =

This was a little local workshop put together by Simon Metson of Bristol University, and Steve
Loughran of HP, to get some of the local Hadoop users in a room and talk about our ongoing

These presentations were intended to start discussion and thought:
  * [http://www.slideshare.net/steve_l/hadoop-futures Hadoop Futures] (Tom White, Cloudera)
  * [http://www.slideshare.net/steve_l/hadoop-hep Hadoop and High-Energy Physics] (Simon Metson,
Bristol University)
  * [http://www.slideshare.net/steve_l/hdfs HDFS] (Johan Oskarsson, Last.fm)
  * [http://www.slideshare.net/steve_l/graphs-1848617 Graphs] (Paolo Castagna, HP)
  * [http://www.slideshare.net/steve_l/long-haul-hadoop Long Haul Hadoop] (Steve Loughran, HP)
  * [http://www.slideshare.net/steve_l/benchmarking-1840029 Benchmarking Hadoop] (Steve Loughran
& Julio Guijarro, HP)

== Benchmarking ==

[:Terasort: Terasort], while a good way of regression testing performance across Hadoop versions,
isn't ideal for assessing which hardware is best for algorithms other than sort. Work that is
more iterative and CPU/memory hungry may not behave as expected on a cluster which has good IO
but not enough RAM for the algorithm.

In the discussion, though, it became clear that one common need isn't well addressed right
now: QA-ing a new cluster. Terasort is the best that people have for this to date.

Here you have new hardware -any failure of which means an immediate replacement call to the vendor-
on a new network -which may not be configured right- and with a new set of configuration parameters
-all of which may be wrong or at least suboptimal. You need something to run on the cluster
which tests every node, makes sure it can see every other node's services, and reports problems
in meaningful summaries. The work should test CPU, FPU and RAM too, just to make sure they
are all working, and at the end of the run generate some test numbers that can be compared
to a spreadsheet-calculated estimate of performance and throughput.
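
As an illustration of the "spreadsheet-calculated estimate" idea, here is a minimal sketch of a
disk-bound estimate for a sort run and a check of a measured time against it. Everything in it
is an assumption for illustration only: the function names, the model (every byte read or written
a fixed number of times, spread evenly over all disks), and the slack factor. A real estimate
would also need to account for the network, replication and CPU.

```python
def estimated_sort_seconds(data_gb, nodes, disks_per_node, disk_mb_s, passes=3):
    """Crude disk-bound estimate (hypothetical model, not a Hadoop formula).

    Assumes every byte of input is read or written `passes` times and that
    the load spreads evenly over every disk in the cluster.
    """
    aggregate_mb_s = nodes * disks_per_node * disk_mb_s
    return data_gb * 1024 * passes / aggregate_mb_s


def within_expectation(measured_s, estimated_s, slack=2.0):
    """True if the measured run took no more than `slack` times the estimate."""
    return measured_s <= estimated_s * slack


if __name__ == "__main__":
    # Illustrative numbers only: 100 GB on 10 nodes, 4 disks each, 64 MB/s per disk.
    estimate = estimated_sort_seconds(100, nodes=10, disks_per_node=4, disk_mb_s=64)
    print("estimated seconds:", estimate)
    print("a 500 s run is within expectation:", within_expectation(500, estimate))
```

A run that falls far outside the estimate is a hint that something in the hardware, network
or configuration deserves a closer look, not a diagnosis in itself.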

When you bring up a cluster, even if every service has been asked whether it is healthy,
the services still have the problem of talking to everything else. The best check today: push
work through the system, wait for things to fail, then try and guess the problem. Having work
to push through that is designed to stress the system's interconnections -and whose failures
can be diagnosed with ease- would be nice.
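
To make the "every node can see every other node's services" check concrete, here is a small
sketch, not an existing Hadoop tool. It assumes that each node runs `probe_peers` against the
rest of the cluster (for example from inside a job) and that the results are collected in one
place and passed to `summarize`. The function names are hypothetical, and the ports used in the
usage example are only illustrative; use whatever your configuration actually binds.

```python
import socket
from collections import defaultdict


def tcp_connect_ok(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def probe_peers(source, peers, ports, can_connect=tcp_connect_ok):
    """Run on one node: list (source, target, port) for each unreachable service."""
    failures = []
    for target in peers:
        if target == source:
            continue  # a node does not probe itself
        for port in ports:
            if not can_connect(target, port):
                failures.append((source, target, port))
    return failures


def summarize(failures, node_count):
    """Group failures by target service so one broken service gives one line."""
    by_service = defaultdict(set)
    for source, target, port in failures:
        by_service[(target, port)].add(source)
    lines = []
    for (target, port), sources in sorted(by_service.items()):
        if len(sources) == node_count - 1:
            lines.append(f"{target}:{port} unreachable from every other node "
                         "(service down or firewalled?)")
        else:
            lines.append(f"{target}:{port} unreachable from {', '.join(sorted(sources))}")
    return lines
```

Usage, with a fake connection check standing in for the network:

```python
nodes = ["n1", "n2", "n3"]
down = {("n3", 50010)}
fake = lambda host, port: (host, port) not in down
failures = []
for node in nodes:
    failures += probe_peers(node, nodes, [50010, 50060], can_connect=fake)
for line in summarize(failures, len(nodes)):
    print(line)
```

Grouping by target rather than by source is a design choice: a single dead service then shows
up as one summary line instead of one line per node that tried to reach it.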

That is, for all those people asking for a !HappyHadoop JSP page: it isn't enough. A cluster
may cope with some of the workers going down, but it is not actually functional unless every
node that is up can talk to every other node that is up, nothing has come up listening
on IPv6, the TaskTracker hasn't decided to only run on localhost, and so on.
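
The two bind-address problems named above can be checked mechanically once you know which
address each service is listening on. The sketch below assumes you have already gathered that
as a mapping of service name to bind address (how you collect it is up to you); the function
names and the sample addresses are hypothetical.

```python
import ipaddress


def bind_problem(address):
    """Return a short description of what is wrong with a bind address, or None."""
    addr = ipaddress.ip_address(address)
    if addr.is_loopback:
        return "loopback only"
    if addr.version == 6:
        return "listening on IPv6"
    return None


def audit(listeners):
    """Map each service with a suspicious bind address to its problem."""
    problems = {}
    for service, address in listeners.items():
        problem = bind_problem(address)
        if problem is not None:
            problems[service] = problem
    return problems


if __name__ == "__main__":
    # Sample data only; real values would come from the nodes themselves.
    sample = {"TaskTracker": "127.0.0.1", "DataNode": "::", "NameNode": "10.0.0.5"}
    for service, problem in audit(sample).items():
        print(service, "->", problem)
```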
