From: "Khalil Honsali"
To: core-user@hadoop.apache.org
Date: Fri, 2 May 2008 21:28:23 +0900
Subject: Re: Hadoop Cluster Administration Tools?

Useful information indeed, though a bit complicated for my level, I must say.

I think it would be more than useful to post this online, maybe on Hadoop's
wiki or as an article on cluster resource sites. How about it? I can
volunteer for this if you wish: a central information place on the Hadoop
wiki for pre-install cluster admin?

- OS image install
- ssh setup
- dsh ant tools setup
- rpm automation
- this.next( ? )

2008/5/2 Steve Loughran :

> Allen Wittenauer wrote:
>
> > On 5/1/08 5:00 PM, "Bradford Stephens" wrote:
> >
> > > *Very* cool information. As someone who's leading the transition to
> > > open-source and cluster-orientation at a company of about 50 people,
> > > finding good tools for the IT staff to use is essential. Thanks so
> > > much for the continued feedback.
> >
> > Hmm. I should upload my slides.
>
> That would be excellent!
> I was trying not to scare people with things like PXE preboot or the
> challenge of bringing up a farm of 500+ servers when the building has just
> suffered a power outage. I will let your slides do that.
>
> The key things people have to remember are:
>
> - You can't do stuff by hand once you have more than one box; you need to
>   have some story for scaling things up. It could be hand-creating a
>   machine image that is cloned, or it could be using CM tools. If you find
>   yourself trying to ssh in to boxes to configure them by hand, you are in
>   trouble.
>
> - Once you have enough racks in your cluster, you can abandon any notion of
>   100% availability. You have to be prepared to deal with failures as an
>   everyday event. The worst failures are not the machines that drop off the
>   net; it's the ones that start misbehaving, with memory corruption or a
>   network card that starts flooding the network.