hbase-user mailing list archives

From Jonathan Gray <jl...@streamy.com>
Subject Re: hbase/zookeeper
Date Fri, 17 Jul 2009 15:50:39 GMT
IMO, you can fit those things into 6.5G without a problem.  Of course, 
the more you give it the better your performance.

However, medium instances have only 2 cores... That's going to be a 
problem.  Under heavy load (especially in an upload/import situation) 
you will starve threads in at least one of these processes... At a 
minimum, you really want a core each for DN, ZK, RS and then your 
requirements for your MR tasks would depend on the nature of them.  If 
they are at all CPU intensive, then you need to be sure to dedicate 
sufficient resources to them.

In general, we recommend XL instances because they are quad core. 
Otherwise you will likely run into issues with this many processes on 
two cores.
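
As a rough illustration of the sizing argument above, the sketch below totals a hypothetical per-node heap budget against the 6.5G figure and counts core demand as one core each for DN, ZK and RS plus one per concurrent task slot. The heap sizes, the slot count and the one-core-per-slot rule are assumptions made up for illustration, not recommendations from this thread; only the 1 GB ZooKeeper figure comes from Nitay's suggestion quoted below.

```python
# Back-of-the-envelope node budget. All numbers are illustrative assumptions
# unless noted otherwise; tune them for your own workload.

AVAILABLE_MB = 6500  # ~6.5G usable, per Fernando's estimate

# Hypothetical heap sizes per daemon, in MB.
DAEMON_HEAP_MB = {
    "DataNode": 1000,
    "TaskTracker": 500,
    "ZooKeeper": 1000,   # "something like 1GB" (Nitay, quoted below)
    "RegionServer": 2500,
}

TASK_SLOTS = 2       # assumed concurrent map/reduce tasks per node
TASK_HEAP_MB = 500   # assumed heap per task


def memory_budget_mb(daemons, slots, task_heap_mb):
    """Total heap requested by the daemons plus all task slots."""
    return sum(daemons.values()) + slots * task_heap_mb


def core_demand(slots):
    """One core each for DN, ZK and RS, plus (by assumption) one per task slot."""
    return 3 + slots


if __name__ == "__main__":
    total = memory_budget_mb(DAEMON_HEAP_MB, TASK_SLOTS, TASK_HEAP_MB)
    print(f"memory: {total} MB requested of {AVAILABLE_MB} MB available")
    for cores in (2, 4):
        wanted = core_demand(TASK_SLOTS)
        status = "ok" if wanted <= cores else "oversubscribed"
        print(f"{cores} cores: {wanted} wanted -> {status}")
```

With these made-up numbers the memory fits, but the three daemons alone already want more cores than a two-core instance has, which is the point of the advice above.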


Fernando Padilla wrote:
> OK, if you don't mind me stretching this simple conversation a bit more..
> Say I use the medium ec2 instance.. that's about 7.5G of ram, so I have 
> about 6.5 total.
> On any one node I would have:
> DataNode
> TaskTracker
> Zookeeper
> RegionServer
> +Map/Reduce Tasks?
> What would your gut be for distributing the memory?
> Can I run my M/R Tasks all sharing one JVM to share the same memory, or 
> does each Map or Reduce have its own JVM/Memory requirements?
> I'm thinking between 5 to 10 nodes.  I know that this seems stingy for 
> what you guys are used to.. but this is my worst case or minimum 
> allocation.. if need be I can plan to get more nodes and spread around 
> the load (bursting on heavy days, etc).. but I don't want to plan/budget 
> for a large number of nodes until we see good ROI, etc etc etc..
> On 7/14/09 11:54 PM, Nitay wrote:
>> Yes, Ryan's right. While we recommend running ZooKeeper on separate hosts,
>> it is really only if you can afford to do so. Otherwise, choose some of your
>> region server machines and run ZooKeeper alongside those.
>> On Tue, Jul 14, 2009 at 10:34 PM, Ryan Rawson<ryanobjc@gmail.com>  wrote:
>>> You can probably host it all on one set of machines.  You'll need the
>>> large sized.
>>> Let us know how EC2 works, performance might be off due to the
>>> virtualization.
>>> On Tue, Jul 14, 2009 at 10:32 PM, Fernando Padilla<fern@alum.mit.edu>
>>> wrote:
>>>> The reason I ask, is that I'm planning on setting up a small HBase cluster
>>>> in ec2..
>>>> having 3 to 5 instances just for zookeeper, while having only 3 to 5
>>>> instances for Hbase.. it sounds lop-sided. :)
>>>> Does anyone here have any experience with HBase in EC2?
>>>> Ryan Rawson wrote:
>>>>> I run my ZK quorum on my regionservers, but I also have 16 GB ram per
>>>>> regionserver.  I used to run 1gb, and never had problems. Now with
>>>>> hbase managing the quorum I have 5gb ram, and it's probably overkill
>>>>> but better safe than sorry.
>>>>> On Tue, Jul 14, 2009 at 6:07 PM, Nitay<nitayj@gmail.com>  wrote:
>>>>>> Hi Fernando,
>>>>>> It is recommended that you run ZooKeeper separate from the Region
>>>>>> Servers.
>>>>>> On the memory side, our use of ZooKeeper in terms of data stored is
>>>>>> minimal currently. However you definitely don't want it to swap and you
>>>>>> want to be able to handle a large number of connections. A safe value
>>>>>> would be something like 1GB.
>>>>>> -n
>>>>>> On Tue, Jul 14, 2009 at 2:58 PM, Fernando Padilla<fern@alum.mit.edu>
>>>>>> wrote:
>>>>>>> So.. what's the recommendation for zookeeper?
>>>>>>> should I run zookeeper nodes on the same region servers?
>>>>>>> should I run zookeeper nodes external to the region servers?
>>>>>>> how much memory should I give zookeeper, if it's just used for
>>>>>>> hbase?
