hbase-user mailing list archives

From Iulia Zidaru <iulia.zid...@1and1.ro>
Subject Re: Hardware configuration
Date Tue, 03 May 2011 09:03:09 GMT
  Thank you all. It really helps to see the points that guide you when 
choosing the hardware.

On 05/02/2011 08:45 PM, Ted Dunning wrote:
> For map-reduce, the balancing is easier because you can configure slots. It
> would be nice to be able to express cores and memory separately, but slots
> are pretty good.
> For HDFS, the situation is much worse because the balancing is based on
> percent fill. That leaves you with much less available space on smaller
> machines. You also wind up with odd segregation by age between different
> kinds of data. That leads to poor I/O performance.
> On Mon, May 2, 2011 at 10:31 AM, Jean-Daniel Cryans <jdcryans@apache.org> wrote:
>> I think the first issue you would encounter (just regarding HBase,
>> not talking about MR) is that if you have wildly different HW, some
>> nodes might be able to handle their share of the load but others
>> might not. At the moment the master doesn't know about the HW on the
>> slave nodes, so it will just balance the regions equally. You would put
>> yourself in a situation where you would need to disable the balancer
>> and then do its job yourself.
>> Problems like that.
>> J-D
>> On Mon, May 2, 2011 at 10:03 AM, Chris Tarnas <cft@email.com> wrote:
>>> What are some of the common pitfalls of having different configurations
>>> for different nodes? Is the problem more a matter of management, making
>>> sure each type of node has its own config (so a 12 core box has 12 mappers
>>> and reducers, an 8 core has 8, drive layouts, etc.), or are there problems
>>> that configuration changes can't deal with?
>>> thanks,
>>> -chris
>>> On May 2, 2011, at 6:57 AM, Michael Segel wrote:
>>>> Hi,
>>>> That's actually a really good question.
>>>> Unfortunately, the answer isn't really simple.
>>>> You're going to need to estimate your growth and you're going to need to
>>>> estimate your configuration.
>>>> Suppose I know that within 2 years the amount of data that I want to
>>>> retain is going to be 1PB. With a 3x replication factor, I'll need at least
>>>> 3PB of disk. Assuming that I can fit 12x2TB drives in a node, I'll need
>>>> 125-150 machines. (There's some overhead for logging and the OS.)
>>>> Now this doesn't mean that I'll need to buy all of the machines today
>>>> and build out the cluster.
>>>> It means that I will need to figure out my machine room (rack space,
>>>> power, etc.) and also the hardware configuration.
>>>> You'll also need to plan out your hardware choices. For example, you
>>>> may want 10GbE on the switch but not at the data node. However, you're going
>>>> to want to be able to expand your data nodes by adding 10GbE cards.
>>>> The idea is that as I build out my cluster, all of the machines have the
>>>> same look and feel. So if you buy quad-core CPUs at 2.2 GHz and, 6
>>>> months from now, you buy 2.6 GHz CPUs, as long as they are 4-core CPUs, your
>>>> cluster will look the same.
>>>> The point is that when you lay out your cluster to start with, you'll
>>>> need to plan ahead and keep things similar. Also you'll need to make sure
>>>> your NameNode has enough memory...
>>>> Having said that... Yahoo! has written a paper detailing MR2 (next
>>>> generation of map/reduce). As the M/R Job scheduler becomes more
>>>> intelligent about the types of jobs and types of hardware, the consistency
>>>> of hardware becomes less important.
>>>> With respect to HBase, I suspect there to be a parallel evolution.
>>>> As to building out and replacing your cluster... if this is a production
>>>> environment, you'll have to think about DR and building out a second
>>>> cluster. So the cost of replacing clusters should also be factored in when
>>>> you budget for hardware.
>>>> Like I said, it's not a simple answer and you have to approach each
>>>> instance separately and fine-tune your cluster plans.
>>>> HTH
>>>> -Mike
>>>> ----------------------------------------
>>>>> Date: Mon, 2 May 2011 09:53:05 +0300
>>>>> From: iulia.zidaru@1and1.ro
>>>>> To: user@hbase.apache.org
>>>>> CC: stack@duboce.net
>>>>> Subject: Re: Hardware configuration
>>>>> Thank you both. How would you estimate really big clusters, with
>>>>> hundreds of nodes? Requirements might change over time, and replacing an
>>>>> entire cluster does not seem like the best solution...
>>>>> On 04/29/2011 07:08 PM, Stack wrote:
>>>>>> I agree with Michael Segel. Distributed computing is hard enough.
>>>>>> There is no need to add extra complexity.
>>>>>> St.Ack
>>>>>> On Fri, Apr 29, 2011 at 4:05 AM, Iulia Zidaru wrote:
>>>>>>> Hi,
>>>>>>> I'm wondering if having a cluster with different machines in terms of
>>>>>>> CPU, RAM and disk space would be a big issue for HBase. For example,
>>>>>>> machines with 12 GB of RAM and machines with 48 GB. Suppose that we use
>>>>>>> them at full capacity. What problems might we encounter with this kind
>>>>>>> of configuration?
>>>>>>> Thank you,
>>>>>>> Iulia
>>>>> --
>>>>> Iulia Zidaru
>>>>> Java Developer
>>>>> 1&1 Internet AG - Bucharest/Romania - Web Components Romania
>>>>> 18 Mircea Eliade St
>>>>> Sect 1, Bucharest
>>>>> RO Bucharest, 012015
>>>>> iulia.zidaru@1and1.ro
>>>>> 0040 31 223 9153
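
The capacity estimate in Michael Segel's message quoted above (1PB retained, 3x replication, 12x2TB drives per node, 125-150 machines) can be sketched in a few lines of Python. This is only an illustration and not something posted in the thread: the function name nodes_needed, the decimal conversion 1PB = 1000TB, and the usable_fraction parameter (a stand-in for the logging and OS overhead he mentions, with 0.85 picked arbitrarily) are all assumptions.

```python
import math


def nodes_needed(retained_tb, replication=3, drives_per_node=12,
                 drive_tb=2.0, usable_fraction=1.0):
    """Rough count of data nodes needed to hold the replicated data.

    usable_fraction is a hypothetical knob for the share of raw disk
    left over after logging and OS overhead; it is not from the thread.
    """
    raw_tb = retained_tb * replication  # 1PB at 3x replication -> 3PB of disk
    usable_per_node_tb = drives_per_node * drive_tb * usable_fraction
    return math.ceil(raw_tb / usable_per_node_tb)


# 1PB retained, assuming 1PB = 1000TB, with no overhead reserved
print(nodes_needed(1000))  # 3000TB / 24TB per node -> 125

# Same data with an assumed 15% of each node's disk reserved for overhead
print(nodes_needed(1000, usable_fraction=0.85))
```

With no overhead the sketch gives the lower bound of 125 machines; reserving some fraction of each node's disk moves the result toward the upper end of the 125-150 range.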
