From: Andrew Purtell
To: hbase-user@hadoop.apache.org
Date: Fri, 12 Mar 2010 12:26:33 -0800 (PST)
Subject: on Hadoop reliability wrt. EC2 (was: Re: [databasepro-48] HUG9)

During the Q&A period after my presentation at HUG9, some in the audience indicated they are running production Hadoop and/or HBase clusters on EC2. I found this a little surprising, and I want to follow up on some comments I made there.

Currently the HDFS NameNode is a single point of failure which can bring the whole service down. That the NameNode is a SPOF is not quite so large a concern if you have the ability to engineer the particular server hosting the NameNode to be especially reliable. However, when architecting services on EC2, you must be mindful of its guarantees, or lack thereof. On EC2 the reliability of any given instance is not guaranteed, only that of the service in the aggregate. Running Hadoop on top of EC2 in production is thus not advised until there is a good hot fail over solution for the NameNode.

AWS offers a form of hosted Hadoop called Elastic MapReduce: http://aws.amazon.com/elasticmapreduce/. Note this service treats the Hadoop/HDFS cluster as a transient unreliable construction. So should you.

Regarding a hot fail over solution for the NameNode, there is some really interesting work ongoing at the moment -- "AvatarNode", possibly with inclusion of "BookKeeper" in the architecture.

http://hadoopblog.blogspot.com/2010/02/hadoop-namenode-high-availability.html
http://issues.apache.org/jira/browse/HDFS-976
http://issues.apache.org/jira/browse/HDFS-234
http://issues.apache.org/jira/secure/attachment/12399656/create.png
https://issues.apache.org/jira/browse/ZOOKEEPER-276

Once something like the above is vetted and tested, of course my above advice changes and it would become possible to architect reliable Hadoop/HBase clusters on top of EC2 and similar IaaS clouds.

In the meantime, EC2 and similar IaaS clouds are a great resource for prototyping, research and development, and hosting ephemeral clusters for QA or end to end system tests. The HBase EC2 scripts are a useful tool for doing such things with relative ease.

Best regards,

- Andy

----- Original Message ----
From: Jonathan Gray
To: hbase-user@hadoop.apache.org
Sent: Thu, March 11, 2010 3:01:22 PM
Subject: RE: [databasepro-48] HUG9

Pardon the link vomit, hopefully this comes across okay...

HBase Project Update by Jonathan Gray
http://wiki.apache.org/hadoop/HBase/HBasePresentations?action=AttachFile&do=get&target=HUG9_HBaseUpdate_JonathanGray.pdf

HBase and HDFS by Todd Lipcon of Cloudera
http://wiki.apache.org/hadoop/HBase/HBasePresentations?action=AttachFile&do=get&target=HUG9_HBaseAndHDFS_ToddLipcon_Cloudera.pdf

HBase on EC2 by Andrew Purtell of Trend Micro
http://hbase.s3.amazonaws.com/hbase/HBase-EC2-HUG9.pdf