hbase-user mailing list archives

From lars hofhansl <la...@apache.org>
Subject Re: Regarding Hardware configuration for HBase cluster
Date Sat, 08 Feb 2014 05:48:53 GMT
Let's not refer to our users in the third person. It's not polite :)

Suresh,

I wrote something up about RegionServer sizing here: http://hadoop-hbase.blogspot.com/2013/01/hbase-region-server-memory-sizing.html

For your load I would guess that you'd need about 100 servers.

That would give you:
1. 8TB/server
2. 30m rows/day/server
3. 30GB/day/server

You should not expect a single server to be able to absorb more than 10000 rows/s or 40MB/s,
whichever is less.
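
As a rough sanity check of the figures above, here is a minimal sketch of the arithmetic. It assumes the numbers given elsewhere in this thread (800TB total, 3 billion puts/day, KVs of about 1 KB), decimal units, and a perfectly even spread across servers. The function and constant names are made up for illustration.

```python
# Inputs taken from the thread; all of them are approximate.
TOTAL_DATA_TB = 800            # "data of size around 800TB"
PUTS_PER_DAY = 3_000_000_000   # "around 3 billion puts per day"
KV_SIZE_BYTES = 1_000          # "KV size around 1 KB" (decimal units assumed)
SECONDS_PER_DAY = 86_400

# Per-server ceiling quoted in the mail above.
MAX_ROWS_PER_S = 10_000
MAX_MB_PER_S = 40


def per_server_load(servers=100):
    """Average per-server load, assuming perfectly distributed keys."""
    rows_per_day = PUTS_PER_DAY / servers
    bytes_per_day = rows_per_day * KV_SIZE_BYTES
    return {
        "storage_tb": TOTAL_DATA_TB / servers,
        "rows_per_day": rows_per_day,
        "gb_per_day": bytes_per_day / 1e9,
        "avg_rows_per_s": rows_per_day / SECONDS_PER_DAY,
        "avg_mb_per_s": bytes_per_day / 1e6 / SECONDS_PER_DAY,
    }


def within_ceiling(load):
    """True if the *average* write rate stays under both quoted limits."""
    return (load["avg_rows_per_s"] <= MAX_ROWS_PER_S
            and load["avg_mb_per_s"] <= MAX_MB_PER_S)


if __name__ == "__main__":
    load = per_server_load(100)
    for name, value in load.items():
        print(f"{name}: {value:,.2f}")
    print("within ceiling:", within_ceiling(load))
```

With 100 servers this reproduces the 8TB, 30m rows/day and 30GB/day per server. The averages only describe a steady write stream; spiky writes, reads and skewed keys are not modelled.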

The machines I'd size as follows:
12-16 cores, HT, 1.8GHz-2.4GHz (more is better)
32-96GB ram
6-12 drives (more spindles are better to absorb the write load)
10GbE NICs and top-of-rack switches

Now, this is only a *rough guideline* and obviously you'd have to perform your own tests, and
this would only scale across the machines if your keys are sufficiently distributed.
The details also depend on how compressible your data is and your exact access patterns (read
patterns, spiky write load, etc.).
Start with 10 data nodes and an appropriately scaled-down load and see how it works.
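
One way to read "appropriately scaled down" is to keep the per-node load the same as in the full-size estimate. The sketch below assumes that interpretation, a 100-node target, and the 3 billion puts/day and ~1 KB KV size from the thread. The function name and parameters are hypothetical.

```python
def scaled_down_load(pilot_nodes, target_nodes=100,
                     puts_per_day=3_000_000_000, kv_size_bytes=1_000):
    """Load to drive at a pilot cluster so each node sees the same
    average write rate it would see in the full-size cluster."""
    pilot_puts = puts_per_day * pilot_nodes / target_nodes
    return {
        "puts_per_day": pilot_puts,
        "gb_per_day": pilot_puts * kv_size_bytes / 1e9,
        "avg_puts_per_s": pilot_puts / 86_400,
    }


if __name__ == "__main__":
    for name, value in scaled_down_load(10).items():
        print(f"{name}: {value:,.1f}")
```

For a 10-node pilot this works out to roughly 300m puts/day, or about 300GB/day of raw KV data, before compression and replication.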

Vladimir is right here, you probably want to seek professional help.

-- Lars




________________________________
 From: Vladimir Rodionov <vrodionov@carrieriq.com>
To: "user@hbase.apache.org" <user@hbase.apache.org> 
Sent: Friday, February 7, 2014 10:29 AM
Subject: RE: Regarding Hardware configuration for HBase cluster
 

This guy is building a system at the scale of Yahoo and asking the user group how to size the cluster.
Few people here can give him advice based on their experience, and I am not one of them. I can
only speculate on "how many nodes will we need to consume 3TB/3B records daily".

For a system of this scale it's better to go to Cloudera/IBM/HW than to try to build it yourself,
especially when you ask questions on the user group (not answer them).

Best regards,
Vladimir Rodionov
Principal Platform Engineer
Carrier IQ, www.carrieriq.com
e-mail: vrodionov@carrieriq.com

________________________________________

From: Ted Yu [yuzhihong@gmail.com]
Sent: Friday, February 07, 2014 6:27 AM
To: user@hbase.apache.org
Cc: user@hbase.apache.org
Subject: Re: Regarding Hardware configuration for HBase cluster

Have you read http://www.slideshare.net/larsgeorge/hbase-sizing-notes ?

Cheers

On Feb 6, 2014, at 8:47 PM, suresh babu <bigdatacslt@gmail.com> wrote:

> Hi Stana,
>
> We are trying to find out how many data nodes (including hardware
> configuration details) should be configured or set up for this requirement
>
> -suresh
>
> On Friday, February 7, 2014, stana <stana@is-land.com.tw> wrote:
>
>> HI suresh babu :
>>
>> how many data nodes do you have?
>>
>>
>> 2014-02-07 suresh babu <bigdatacslt@gmail.com>:
>>
>>> refreshing the thread,
>>>
>>> Can you please suggest any inputs for the hardware configuration (for the
>>> use case mentioned below)?
>>>
>>>
>>>
>>>
>>> On Wed, Feb 5, 2014 at 10:31 AM, suresh babu <bigdatacslt@gmail.com>
>>> wrote:
>>>
>>>> Please find the data requirements for our use case below :
>>>>
>>>> Raw data processing
>>>> ----------------------------------
>>>> 1. Data is populated into HDFS; after ETL, around 3 billion puts per day
>>>> into HBase
>>>>
>>>> 2. Oldest data after X days to be deleted from hbase
>>>>
>>>> Aggregates processing
>>>> ----------------------------------
>>>> 3 billion reads per day ... Large scan or reads
>>>>
>>>> KV size around 1 KB
>>>> Daily processing, raw and aggregates, via M/R jobs
>>>> Hive queries in future, but not of immediate focus
>>>>
>>>> On Feb 5, 2014 12:48 AM, "Vladimir Rodionov" <vrodionov@carrieriq.com>
>>>> wrote:
>>>>
>>>>> Yes,
>>>>>
>>>>> 1. What is the expected avg and peak load in writes/updates/deletes/reads?
>>>>> 2. What is the average size of a KV?
>>>>> 3. Reads/small scans/medium/large scan %
>>>>> 4. Do you plan M/R jobs, Hive query?
>>>>>
>>>>>
>>>>> Best regards,
>>>>> Vladimir Rodionov
>>>>> Principal Platform Engineer
>>>>> Carrier IQ, www.carrieriq.com
>>>>> e-mail: vrodionov@carrieriq.com
>>>>>
>>>>> ________________________________________
>>>>> From: Nick Xie [nick.xie.hadoop@gmail.com]
>>>>> Sent: Tuesday, February 04, 2014 10:02 AM
>>>>> To: user@hbase.apache.org
>>>>> Subject: Re: Regarding Hardware configuration for HBase cluster
>>>>>
>>>>> I guess you'd better describe a little bit more about your applications.
>>>>> Does the data increase over the time at all?
>>>>>
>>>>> Nick
>>>>>
>>>>>
>>>>> On Tue, Feb 4, 2014 at 5:22 AM, suresh babu <bigdatacslt@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi folks,
>>>>>>
>>>>>> We are trying to setup HBase cluster for the following requirement:
>>>>>>
>>>>>> We have to maintain data of size around 800TB,
>>>>>>
>>>>>> For the above requirement, please suggest the best hardware configuration
>>>>>> details, like:
>>>>>>
>>>>>> 1) how many disks to consider per machine and the capacity of the disks,
>>>>>> for example, 16/24 disks per node with 1/2TB capacity per disk
>>>>>>
>>>>>> 2) which compression method is suited for a production environment;
>>>>>> space is not a major limitation, but speed is of prime concern for my use case
>>>>>>
>>>>>> 3) how many CPU cores should be configured for each node/machine? Or the
>>>>>> ideal ratio of number of cores to the number of disks, for example
>>>>>> 1 core/1 disk?
>>>>>>
>>>>>> Regards,
>>>>>> Kaushik
>>>>>
>> Best Regards
>>
>> 亦思科技  is-land Systems Inc.
>> Tel:03-5630345 Ext.14
>> Fax:03-5631345
>> e-MAIL:stana@is-land.com.tw
>>
>> 何永安 Yung An He
>>

Confidentiality Notice:  The information contained in this message, including any attachments
hereto, may be confidential and is intended to be read only by the individual or entity to
whom this message is addressed. If the reader of this message is not the intended recipient
or an agent or designee of the intended recipient, please note that any review, use, disclosure
or distribution of this message or its attachments, in any form, is strictly prohibited. 
If you have received this message in error, please immediately notify the sender and/or Notifications@carrieriq.com
and delete or destroy any copy of this message and its attachments.