Date: Fri, 17 Jul 2009 08:21:59 -0700
From: Fernando Padilla <fern@alum.mit.edu>
To: hbase-user@hadoop.apache.org
Subject: Re: hbase/zookeeper

OK, if you don't mind me stretching this simple conversation a bit more..

Say I use the medium EC2 instance.. that's about 7.5 GB of RAM, so I have about 6.5 GB usable. On any one node I would have:

DataNode
TaskTracker
ZooKeeper
RegionServer
+ Map/Reduce Tasks?

What would your gut be for distributing the memory? Can I run my M/R tasks all sharing one JVM to share the same memory, or does each map or reduce have its own JVM/memory requirements?

I'm thinking between 5 and 10 nodes. I know that this seems stingy compared to what you guys are used to.. but this is my worst-case or minimum allocation.. if need be I can plan to get more nodes and spread the load around (bursting on heavy days, etc.).. but I don't want to plan/budget for a large number of nodes until we see good ROI, etc.

On 7/14/09 11:54 PM, Nitay wrote:
> Yes, Ryan's right. While we recommend running ZooKeeper on separate hosts,
> it is really only if you can afford to do so. Otherwise, choose some of your
> region server machines and run ZooKeeper alongside those.
>
> On Tue, Jul 14, 2009 at 10:34 PM, Ryan Rawson wrote:
>
>> You can probably host it all on one set of machines. You'll need the
>> large sized.
>>
>> Let us know how EC2 works, performance might be off due to the
>> virtualization.
>>
>> On Tue, Jul 14, 2009 at 10:32 PM, Fernando Padilla wrote:
>>
>>> The reason I ask is that I'm planning on setting up a small HBase
>>> cluster in EC2..
>>>
>>> Having 3 to 5 instances just for ZooKeeper, while having only 3 to 5
>>> instances for HBase.. it sounds lop-sided. :)
>>>
>>> Does anyone here have any experience with HBase in EC2?
>>>
>>> Ryan Rawson wrote:
>>>> I run my ZK quorum on my regionservers, but I also have 16 GB of RAM per
>>>> regionserver. I used to run 1 GB and never had problems. Now with
>>>> HBase managing the quorum I have 5 GB of RAM, and it's probably overkill,
>>>> but better safe than sorry.
>>>>
>>>> On Tue, Jul 14, 2009 at 6:07 PM, Nitay wrote:
>>>>> Hi Fernando,
>>>>>
>>>>> It is recommended that you run ZooKeeper separate from the Region
>>>>> Servers. On the memory side, our use of ZooKeeper in terms of data
>>>>> stored is currently minimal. However, you definitely don't want it to
>>>>> swap, and you want to be able to handle a large number of connections.
>>>>> A safe value would be something like 1GB.
>>>>>
>>>>> -n
>>>>>
>>>>> On Tue, Jul 14, 2009 at 2:58 PM, Fernando Padilla wrote:
>>>>>
>>>>>> So.. what's the recommendation for zookeeper?
>>>>>>
>>>>>> should I run zookeeper nodes on the same region servers?
>>>>>> should I run zookeeper nodes external to the region servers?
>>>>>> how much memory should I give zookeeper, if it's just used for hbase?
>>>>>>
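For what it's worth, here's a sketch of how the per-task JVM question above maps to Hadoop 0.20-era configuration. Each map/reduce task runs in its own child JVM sized by `mapred.child.java.opts`; `mapred.job.reuse.jvm.num.tasks` only reuses a JVM for successive tasks of the same job, it does not let concurrent tasks share one heap. The slot counts and heap sizes below are illustrative assumptions for a memory-tight ~6.5 GB node, not tested recommendations:

```xml
<!-- mapred-site.xml: illustrative values for a memory-tight node -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>2</value>  <!-- cap concurrent map slots per TaskTracker -->
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>1</value>  <!-- cap concurrent reduce slots -->
</property>
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx512m</value>  <!-- heap for each task's child JVM -->
</property>
<property>
  <name>mapred.job.reuse.jvm.num.tasks</name>
  <value>-1</value>  <!-- reuse child JVMs across a job's tasks (sequentially) -->
</property>
```

With 2 map + 1 reduce slots at 512 MB each, tasks peak around 1.5 GB, leaving roughly 5 GB for the DataNode, TaskTracker, RegionServer, and the ~1 GB ZooKeeper heap Nitay suggests.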