hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Hadoop Wiki] Trivial Update of "LargeClusterTips" by SteveLoughran
Date Wed, 24 Jun 2009 10:18:16 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The following page has been changed by SteveLoughran:
http://wiki.apache.org/hadoop/LargeClusterTips

The comment on the change is:
formatting

------------------------------------------------------------------------------
  
  Things will go wrong. There is always a SPOF. Test your failure-handling processes before
you go live. 
  
- * Simulate a corrupted edit log by killing the namenode process, truncating the (binary)
edit log, and bringing it up. See how the team handles it. 
+  * Simulate a corrupted edit log by killing the namenode process, truncating the (binary)
edit log, and bringing it up. See how the team handles it. 
- * Turn off one of the switches, pull out a network cable. See how the cluster handles it,
how it recovers. Then put the switch back on.
+  * Turn off one of the switches, pull out a network cable. See how the cluster handles it,
how it recovers. Then put the switch back on.
- * Turn an entire rack off without warning. See what happens when they go offline.
+  * Turn an entire rack off without warning. See what happens when they go offline.
- * Turn off DNS. 
+  * Turn off DNS. Or just the rDNS side of things.
- * Turn off the entire datacenter, switch it back on. Are there any race conditions?
+  * Turn off the entire datacenter, switch it back on. Are there any race conditions?
- * Write a job that tries to generate too much data and fills up the cluster. How is it handled?
+  * Write a job that tries to generate too much data and fills up the cluster. How is it handled?
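
The first drill in the list above (truncating the binary edit log while the namenode is down) could be scripted roughly as follows. This is only a sketch under stated assumptions: the helper name truncate_with_backup and the keep_fraction parameter are hypothetical, the real location of the edit log depends on your namenode configuration and is not given here, and the demo runs against a throwaway temporary file rather than a real log. Only try it on a test cluster, with the namenode process already stopped.

```python
import os
import shutil
import tempfile


def truncate_with_backup(path, keep_fraction=0.5):
    """Copy `path` to `path + '.bak'`, then truncate `path` in place.

    Hypothetical helper for a failure drill: it keeps only the first
    `keep_fraction` of the file's bytes, which mimics a write that was
    cut short. Returns (backup_path, original_size, new_size).
    """
    if not 0.0 <= keep_fraction < 1.0:
        raise ValueError("keep_fraction must be in [0, 1)")

    backup_path = path + ".bak"
    # Keep an untouched copy so the drill can be undone afterwards.
    shutil.copy2(path, backup_path)

    original_size = os.path.getsize(path)
    new_size = int(original_size * keep_fraction)

    # Open for in-place binary update and cut the file at new_size bytes.
    with open(path, "r+b") as handle:
        handle.truncate(new_size)

    return backup_path, original_size, new_size


if __name__ == "__main__":
    # Demo on a stand-in file; the name "edits" is only a placeholder.
    workdir = tempfile.mkdtemp()
    fake_log = os.path.join(workdir, "edits")
    with open(fake_log, "wb") as handle:
        handle.write(os.urandom(4096))

    backup, before, after = truncate_with_backup(fake_log, keep_fraction=0.5)
    print("backup:", backup)
    print("size before:", before, "size after:", after)
```

Keeping the backup copy next to the truncated file means the team can compare its own recovery with a simple restore once the exercise is over.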
  
