Date: Fri, 17 Jul 2009 08:21:59 -0700
From: Fernando Padilla <fern@alum.mit.edu>
To: hbase-user@hadoop.apache.org
Subject: Re: hbase/zookeeper

OK, if you don't mind me stretching this simple conversation a bit more..

Say I use the medium EC2 instance.. that's about 7.5 GB of RAM, so I have about 6.5 GB usable. On any one node I would have:

DataNode
TaskTracker
ZooKeeper
RegionServer
+ Map/Reduce Tasks?

What would your gut be for distributing the memory? Can I run my M/R tasks all sharing one JVM to share the same memory, or does each map or reduce have its own JVM/memory requirements?

I'm thinking between 5 and 10 nodes. I know that this seems stingy compared to what you guys are used to.. but this is my worst-case or minimum allocation.. if need be I can plan to get more nodes and spread the load around (bursting on heavy days, etc.).. but I don't want to plan/budget for a large number of nodes until we see good ROI, etc.

On 7/14/09 11:54 PM, Nitay wrote:
> Yes, Ryan's right. While we recommend running ZooKeeper on separate hosts,
> it is really only if you can afford to do so. Otherwise, choose some of your
> region server machines and run ZooKeeper alongside those.
>
> On Tue, Jul 14, 2009 at 10:34 PM, Ryan Rawson wrote:
>
>> You can probably host it all on one set of machines. You'll need the
>> large sized.
>>
>> Let us know how EC2 works, performance might be off due to the
>> virtualization.
>>
>> On Tue, Jul 14, 2009 at 10:32 PM, Fernando Padilla wrote:
>>
>>> The reason I ask is that I'm planning on setting up a small HBase
>>> cluster in EC2..
>>>
>>> Having 3 to 5 instances just for ZooKeeper, while having only 3 to 5
>>> instances for HBase.. it sounds lop-sided. :)
>>>
>>> Does anyone here have any experience with HBase in EC2?
>>>
>>> Ryan Rawson wrote:
>>>> I run my ZK quorum on my regionservers, but I also have 16 GB of RAM per
>>>> regionserver. I used to run 1 GB and never had problems. Now with
>>>> HBase managing the quorum I have 5 GB of RAM, and it's probably overkill,
>>>> but better safe than sorry.
>>>>
>>>> On Tue, Jul 14, 2009 at 6:07 PM, Nitay wrote:
>>>>> Hi Fernando,
>>>>>
>>>>> It is recommended that you run ZooKeeper separate from the Region
>>>>> Servers. On the memory side, our use of ZooKeeper in terms of data
>>>>> stored is currently minimal. However, you definitely don't want it to
>>>>> swap, and you want to be able to handle a large number of connections.
>>>>> A safe value would be something like 1GB.
>>>>>
>>>>> -n
>>>>>
>>>>> On Tue, Jul 14, 2009 at 2:58 PM, Fernando Padilla wrote:
>>>>>
>>>>>> So.. what's the recommendation for zookeeper?
>>>>>>
>>>>>> should I run zookeeper nodes on the same region servers?
>>>>>> should I run zookeeper nodes external to the region servers?
>>>>>> how much memory should I give zookeeper, if it's just used for hbase?
>>>>>>
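For what it's worth, here's a sketch of how the per-task JVM question above maps to Hadoop 0.20-era configuration. Each map/reduce task runs in its own child JVM sized by `mapred.child.java.opts`; `mapred.job.reuse.jvm.num.tasks` only reuses a JVM for successive tasks of the same job, it does not let concurrent tasks share one heap. The slot counts and heap sizes below are illustrative assumptions for a memory-tight ~6.5 GB node, not tested recommendations:

```xml
<!-- mapred-site.xml: illustrative values for a memory-tight node -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>2</value>  <!-- cap concurrent map slots per TaskTracker -->
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>1</value>  <!-- cap concurrent reduce slots -->
</property>
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx512m</value>  <!-- heap for each task's child JVM -->
</property>
<property>
  <name>mapred.job.reuse.jvm.num.tasks</name>
  <value>-1</value>  <!-- reuse child JVMs across a job's tasks (sequentially) -->
</property>
```

With 2 map + 1 reduce slots at 512 MB each, tasks peak around 1.5 GB, leaving roughly 5 GB for the DataNode, TaskTracker, RegionServer, and the ~1 GB ZooKeeper heap Nitay suggests.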