hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Hadoop Wiki] Update of "TestingNov2009" by SteveLoughran
Date Sun, 22 Nov 2009 12:24:22 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The "TestingNov2009" page has been changed by SteveLoughran.
The comment on this change is: Added stuff on OS QAing, some ideas at dealing with the problems.


  There are currently no tests that work with Hadoop via the web pages, no job submission
and monitoring. It is in fact possible to bring up a Hadoop cluster in which JSP doesn't work,
but the basic tests all appear well -even including TeraSort, provided you use the low-level
- Options
+ ''Proposals:''
   * Create a set of JUnit/HtmlUnit tests that test the GUI; design these to run against any
host. Either check out the source tree and run them against a remote cluster, or package the
tests in a JAR and make this a project distributable. 
  * We may need separate test JARs for HDFS and mapreduce.
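A minimal sketch of the "run against any host" idea, written in Python with plain HTTP checks rather than the JUnit/HtmlUnit tests proposed above. The ports and JSP paths in `PAGES` are assumed defaults and would need adjusting for the cluster under test; `check_web_ui` and `http_fetch` are hypothetical names introduced here for illustration.

```python
import urllib.error
import urllib.request

# Assumed default web UI locations; change these to match the target cluster.
PAGES = {
    "jobtracker": (50030, "/jobtracker.jsp"),
    "namenode": (50070, "/dfshealth.jsp"),
}


def http_fetch(url, timeout=10):
    """Return (status, body). HTTP error responses are reported, not raised.

    Connection-level failures (urllib.error.URLError) still propagate.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as e:
        # A broken JSP typically surfaces here as a 5xx response.
        return e.code, ""


def check_web_ui(host, fetch=http_fetch, pages=PAGES):
    """Return (name, url, status) for every page that is not a non-empty 200."""
    failures = []
    for name, (port, path) in sorted(pages.items()):
        url = f"http://{host}:{port}{path}"
        status, body = fetch(url)
        if status != 200 or not body.strip():
            failures.append((name, url, status))
    return failures
```

The `fetch` parameter is injectable so the check itself can be exercised without a live cluster; a real GUI test would also inspect page content, which this sketch does not attempt.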
@@ -63, +64 @@

   * For testing local Hadoop builds on IaaS platforms, the build process needs to scp over
and install the Hadoop binaries and the configuration files. This can be done by creating
a new disk image that is then used to bootstrap every node, or you start with a base clean
image and copy in Hadoop on demand. The latter is much more agile and cost effective during
iterative development, but doesn't scale to very-large clusters (1000s of machines), unless
you delegate the task of copy/install to the first few tens of allocated machines. For EC2,
one tactic is to upload the binaries to S3, and have scripts on the nodes to copy down and
install the files.
+ See: [[http://www.netkernel.org/blogxter/entry?publicid=12CE2B62F71239349F3E9903EAE9D1F0
| A Cloud Tools Manifesto]]
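One way to picture delegating the copy/install step to the first few tens of allocated machines is a simple fan-out plan. This is only a sketch under assumptions: `plan_fanout` is a hypothetical helper, the default of 20 seeders is illustrative, and the actual copying (scp, S3 download, or otherwise) is left out.

```python
def plan_fanout(hosts, seeders=20):
    """Map each seeder host to the list of hosts it should copy Hadoop to.

    The first `seeders` hosts receive the binaries directly from the build
    machine; the remaining hosts are spread round-robin across them.
    """
    if seeders < 1:
        raise ValueError("need at least one seeder")
    first, rest = hosts[:seeders], hosts[seeders:]
    plan = {seeder: [] for seeder in first}
    for i, host in enumerate(rest):
        plan[first[i % len(first)]].append(host)
    return plan
```

With 1000 hosts and 20 seeders, the build machine pushes 20 copies and each seeder pushes 49, instead of the build machine pushing all 1000.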
+ == Qualifying Hadoop on different platforms ==
+ Currently Hadoop is only used at scale on RHEL + Sun JVM, because that is what Yahoo! run
their clusters on, and nobody else is running different platforms in their production clusters
-or if they are, they aren't discussing it in public.
+  * It would be interesting to start collecting experiences with running Hadoop on other
platforms -different Unix flavours in particular, even if this is not a formal pre-release
+  * Windows and OS/X support Hadoop, reluctantly, with Windows being the most reluctant.
Nobody admits to using Windows in production, and it may not get tested at any serious scale
before a release is made. 
+ What would it take to test Hadoop releases on different operating systems? We'd need clusters
of real or virtual machines, run any cluster qualification tests on them, and publish
the results. This would not be a performance game; throughput isn't important, it's more "does
this work on a specific OS at 100+ machines"?
  == Exploring the Hadoop Configuration Space ==
  There are a lot of Hadoop configuration options, even ignoring those of the underlying machines
and network. For example, what impact do block size and replication factor have on your workload?
What different network card configuration parameters give the best performance? Which combinations
of options break things?
@@ -73, +86 @@

   * There is existing work on automated configuration testing, notably the work done by Adam
Porter and colleagues on [[http://www.youtube.com/watch?v=r0nn40O3mCY | Distributed Continuous
Quality Assurance]]
   * (Steve says) at HP we've used a Pseudo-RNG to drive transforms to the infrastructure
and deployed applications; this explores some of the space and is somewhat replicable.
+ ''Proposal:'' Make this a research topic, pull in the experts in testing, and give encouragement
to work on this problem. Offering cluster time may help.
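The pseudo-RNG approach mentioned above can be sketched as seeded sampling from an option space, so that any configuration that breaks things can be regenerated from its seed. This is an illustration only, not the HP tooling: the property names and candidate values in `OPTION_SPACE` are assumptions chosen for the example, and `sample_config` and `sample_run` are hypothetical helpers.

```python
import random

# Illustrative option space; names and values are assumptions for this sketch.
OPTION_SPACE = {
    "dfs.block.size": [64 * 1024 * 1024, 128 * 1024 * 1024, 256 * 1024 * 1024],
    "dfs.replication": [1, 2, 3],
    "io.sort.mb": [50, 100, 200],
}


def sample_config(seed, option_space=OPTION_SPACE):
    """Pick one value per option, determined entirely by the seed."""
    rng = random.Random(seed)
    # Sorted key order keeps the draw sequence independent of dict ordering.
    return {name: rng.choice(option_space[name]) for name in sorted(option_space)}


def sample_run(base_seed, count):
    """Return (seed, config) pairs for a batch of test deployments."""
    return [(base_seed + i, sample_config(base_seed + i)) for i in range(count)]
```

Recording the seed alongside each test result is what makes a failing combination replayable later.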
  == Testing applications that run on Hadoop ==
@@ -97, +112 @@

   * Network failures can be simulated on some IaaS platforms just by breaking a virtual link
   * Forcibly killing processes is a more realistic approach which works on most platforms,
though it is hard to choreograph
+ ''Proposal:'' Any Infrastructure API ought to offer the opportunity to simulate failures,
either by turning off nodes without warning, or (better) breaking the network connections
between live nodes.
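As a rough illustration of what such an API could look like from the test side, the sketch below uses an in-memory stand-in. Nothing here corresponds to a real provider: `FakeInfrastructure`, `kill_node`, `break_link` and `inject_failures` are all hypothetical names, and the seeded generator is there only so that a failure schedule can be replayed.

```python
import random


class FakeInfrastructure:
    """In-memory stand-in for an infrastructure API (hypothetical)."""

    def __init__(self, nodes):
        self.live = set(nodes)
        self.broken_links = set()

    def kill_node(self, node):
        # Turn a node off without warning.
        self.live.discard(node)

    def break_link(self, a, b):
        # Sever the connection between two nodes that stay up.
        self.broken_links.add(frozenset((a, b)))

    def can_talk(self, a, b):
        return (a in self.live and b in self.live
                and frozenset((a, b)) not in self.broken_links)


def inject_failures(infra, seed, kills=1, partitions=1):
    """Apply a reproducible set of failures and return the events applied."""
    rng = random.Random(seed)
    events = []
    for _ in range(kills):
        if not infra.live:
            break
        victim = rng.choice(sorted(infra.live))
        infra.kill_node(victim)
        events.append(("kill", victim))
    for _ in range(partitions):
        live = sorted(infra.live)
        if len(live) < 2:
            break
        a, b = rng.sample(live, 2)
        infra.break_link(a, b)
        events.append(("break_link", a, b))
    return events
```

Running `inject_failures` with the same seed against the same set of nodes yields the same events, which goes some way towards the choreography problem noted for process killing.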
