hbase-user mailing list archives

From lars hofhansl <la...@apache.org>
Subject Re: Regarding Hardware configuration for HBase cluster
Date Sun, 09 Feb 2014 05:35:29 GMT
In a year or two you won't be able to buy 1T or even 2T disks cheaply.
More spindles are good; more cores are good too. This is a fuzzy art.

A hard fact is that HBase cannot (at the moment) handle more than 8-10T per server.
With HBase, the extra disks would just be there for IOPS.
You won't be happy if you expect each server to store 24T.
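
To put rough numbers on that (a back-of-the-envelope sketch only: `servers_needed` is a made-up helper, the 800TB total is the figure from the original question further down, and replication and compression are ignored because the thread does not quantify them):

```python
import math

# Assumption: total data to hold, taken from the original question (800TB).
TOTAL_TB = 800


def servers_needed(total_tb, tb_per_server):
    """Number of servers required if each one holds tb_per_server of data."""
    return math.ceil(total_tb / tb_per_server)


# 8T and 10T are the ends of the per-server range stated above;
# 24T is the hypothetical "24 x 1TB disks, all full" case.
for tb in (8, 10, 24):
    print(f"{tb}T per server: {servers_needed(TOTAL_TB, tb)} servers")
```

Planning around 24T per server would suggest about 34 machines, while the 8-10T ceiling puts the count at 80-100.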

I would go with more and smaller servers. Some people run two RegionServers on a single machine,
but that is not a well-explored option at this point (until recently it needed an HBase patch
to work).

You *definitely* have to do some benchmarking with your use case. You might be able to get
away with fewer servers, but you need to test for that.
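
As a starting point for such a test, here is a small sketch that derives the per-server load and a scaled-down pilot load from the numbers quoted below in this thread (3 billion puts/day, about 1 KB per KV, 100 servers, a 10-node pilot, and a ceiling of 10000 rows/s or 40MB/s per server). The function names are invented for illustration, 1 KB is taken as 1000 bytes, and only daily averages are computed, so peak load would need separate headroom:

```python
SECONDS_PER_DAY = 24 * 60 * 60


def per_server_load(puts_per_day, kv_bytes, servers):
    """Average daily and per-second write load on one server."""
    rows_per_day = puts_per_day / servers
    bytes_per_day = rows_per_day * kv_bytes
    return {
        "rows_per_day": rows_per_day,
        "gb_per_day": bytes_per_day / 1e9,
        "avg_rows_per_sec": rows_per_day / SECONDS_PER_DAY,
        "avg_mb_per_sec": bytes_per_day / 1e6 / SECONDS_PER_DAY,
    }


def fits_write_bound(load, max_rows_per_sec=10_000, max_mb_per_sec=40):
    """True if the average load stays under both per-server ceilings."""
    return (load["avg_rows_per_sec"] <= max_rows_per_sec
            and load["avg_mb_per_sec"] <= max_mb_per_sec)


def pilot_puts_per_day(puts_per_day, full_servers, pilot_servers):
    """Daily puts to replay on a smaller pilot cluster at the same per-server load."""
    return puts_per_day * pilot_servers / full_servers


load = per_server_load(3_000_000_000, 1000, 100)
pilot = pilot_puts_per_day(3_000_000_000, 100, 10)
print(load, fits_write_bound(load), pilot)
```

With these inputs each server sees about 30 million rows and 30GB per day, and a 10-node pilot would replay roughly 300 million puts per day.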

-- Lars




________________________________
 From: Ramu M S <ramu.malur@gmail.com>
To: user@hbase.apache.org 
Sent: Saturday, February 8, 2014 12:10 AM
Subject: Re: Regarding Hardware configuration for HBase cluster
 

Lars,

What about high-density storage servers that have capacity for up to 24
drives? There were also some recommendations in a few blogs about having 1
core per disk.

1TB disks have only a slight price difference compared to 600 GB disks. With
negotiations it'll be as low as $50. Also, the price difference between 8-core
and 12-core processors is very small, $200-300.

Do you think having 20-24 cores and 24 1TB disks will also be an option?

Regards,
Ramu

On Feb 8, 2014 11:19 AM, "lars hofhansl" <larsh@apache.org> wrote:

> Let's not refer to our users in the third person. It's not polite :)
>
> Suresh,
>
> I wrote something up about RegionServer sizing here:
> http://hadoop-hbase.blogspot.com/2013/01/hbase-region-server-memory-sizing.html
>
> For your load I would guess that you'd need about 100 servers.
>
> That would give:
> 1. 8TB/server
> 2. 30M rows/day/server
> 3. 30GB/day/server
>
> You should not expect a single server to be able to absorb more than 10000 rows/s
> or 40MB/s, whichever is less.
>
> The machines I'd size as follows:
> 12-16 cores, HT, 1.8GHz-2.4GHz (more is better)
> 32-96GB RAM
> 6-12 drives (more spindles are better to absorb the write load)
> 10GbE NICs and top-of-rack switches
>
> Now, this is only a *rough guideline* and obviously you'd have to perform
> your own tests, and this would only scale across the machines if your
> keys are sufficiently distributed.
> The details also depend on how compressible your data is and your exact
> access patterns (read patterns, spiky write load, etc.).
> Start with 10 data nodes and appropriately scaled down load and see how it
> works.
>
> Vladimir is right here, you probably want to seek professional help.
>
> -- Lars
>
>
>
>
> ________________________________
>  From: Vladimir Rodionov <vrodionov@carrieriq.com>
> To: "user@hbase.apache.org" <user@hbase.apache.org>
> Sent: Friday, February 7, 2014 10:29 AM
> Subject: RE: Regarding Hardware configuration for HBase cluster
>
>
> This guy is building a system at the scale of Yahoo and asking a user group how
> to size the cluster.
> Few people here can give him advice based on their experience, and I am not
> one of them. I can
> only speculate on "how many nodes will we need to consume 3TB/3B records
> daily".
>
> For a system of this scale it's better to go to Cloudera/IBM/HW and not to
> try to build it yourself,
> especially when you ask questions on the user group (rather than answer them).
>
> Best regards,
> Vladimir Rodionov
> Principal Platform Engineer
> Carrier IQ, www.carrieriq.com
> e-mail: vrodionov@carrieriq.com
>
> ________________________________________
>
> From: Ted Yu [yuzhihong@gmail.com]
> Sent: Friday, February 07, 2014 6:27 AM
> To: user@hbase.apache.org
> Cc: user@hbase.apache.org
> Subject: Re: Regarding Hardware configuration for HBase cluster
>
> Have you read http://www.slideshare.net/larsgeorge/hbase-sizing-notes ?
>
> Cheers
>
> On Feb 6, 2014, at 8:47 PM, suresh babu <bigdatacslt@gmail.com> wrote:
>
> > Hi Stana,
> >
> > We are trying to find out how many data nodes (including hardware
> > configuration details) should be configured or set up for this requirement.
> >
> > -suresh
> >
> > On Friday, February 7, 2014, stana <stana@is-land.com.tw> wrote:
> >
> >> Hi suresh babu,
> >>
> >> How many data nodes do you have?
> >>
> >>
> >> 2014-02-07 suresh babu <bigdatacslt@gmail.com>:
> >>
> >>> Refreshing the thread,
> >>>
> >>> Can you please suggest any inputs for the hardware configuration (for the
> >>> below mentioned use case).
> >>>
> >>>
> >>>
> >>>
> >>> On Wed, Feb 5, 2014 at 10:31 AM, suresh babu <bigdatacslt@gmail.com>
> >>> wrote:
> >>>
> >>>> Please find the data requirements for our use case below :
> >>>>
> >>>> Raw data processing
> >>>> ----------------------------------
> >>>> 1. Data is populated into HDFS; after ETL, around 3 billion puts per day
> >>>> into HBase
> >>>>
> >>>> 2. Oldest data after X days to be deleted from hbase
> >>>>
> >>>> Aggregates processing
> >>>> ----------------------------------
> >>>> 3 billion reads per day ... Large scan or reads
> >>>>
> >>>> KV size around 1 KB
> >>>> Daily processing, raw and aggregates, via M/R jobs
> >>>> Hive queries in future, but not of immediate focus
> >>>>
> >>>> On Feb 5, 2014 12:48 AM, "Vladimir Rodionov" <vrodionov@carrieriq.com>
> >>>> wrote:
> >>>>
> >>>>> Yes,
> >>>>>
> >>>>> 1. What is the expected avg and peak load in writes/updates/deletes/reads?
> >>>>> 2. What is the average size of a KV?
> >>>>> 3. What is the percentage split of reads/small scans/medium scans/large scans?
> >>>>> 4. Do you plan M/R jobs, Hive queries?
> >>>>>
> >>>>>
> >>>>> Best regards,
> >>>>> Vladimir Rodionov
> >>>>> Principal Platform Engineer
> >>>>> Carrier IQ, www.carrieriq.com
> >>>>> e-mail: vrodionov@carrieriq.com
> >>>>>
> >>>>> ________________________________________
> >>>>> From: Nick Xie [nick.xie.hadoop@gmail.com]
> >>>>> Sent: Tuesday, February 04, 2014 10:02 AM
> >>>>> To: user@hbase.apache.org
> >>>>> Subject: Re: Regarding Hardware configuration for HBase cluster
> >>>>>
> >>>>> I guess you'd better describe a little bit more about your applications.
> >>>>> Does the data increase over the time at all?
> >>>>>
> >>>>> Nick
> >>>>>
> >>>>>
> >>>>> On Tue, Feb 4, 2014 at 5:22 AM, suresh babu <bigdatacslt@gmail.com>
> >>>>> wrote:
> >>>>>
> >>>>>> Hi folks,
> >>>>>>
> >>>>>> We are trying to set up an HBase cluster for the following requirement:
> >>>>>>
> >>>>>> We have to maintain data of size around 800TB.
> >>>>>>
> >>>>>> For the above requirement, please suggest the best hardware
> >>>>>> configuration details, like:
> >>>>>>
> >>>>>> 1) How many disks to consider per machine and the capacity of the disks,
> >>>>>> for example, 16/24 disks per node with 1/2TB capacity per disk
> >>>>>>
> >>>>>> 2) Which compression method is suited for a production environment?
> >>>>>> Space is not a major limitation, but speed is of prime concern for my use case
> >>>>>>
> >>>>>> 3) How many CPU cores should be configured for each node/machine? Or the
> >>>>>> ideal ratio of number of cores to number of disks, for example
> >>>>>> 1 core/1 disk?
> >>>>>>
> >>>>>> Regards,
> >>>>>> Kaushik
> >>>>>
> >>>>> Confidentiality Notice:  The information contained in this message,
> >>>>> including any attachments hereto, may be confidential and is intended to be
> >>>>> read only by the individual or entity to whom this message is addressed. If
> >>>>> the reader of this message is not the intended recipient or an agent or
> >>>>> designee of the intended recipient, please note that any review, use,
> >>>>> disclosure or distribution of this message or its attachments, in any form,
> >>>>> is strictly prohibited.  If you have received this message in error, please
> >>>>> immediat--
> >> Best Regards
> >>
> >> 亦思科技  is-land Systems Inc.
> >> Tel:03-5630345 Ext.14
> >> Fax:03-5631345
> >> e-MAIL:stana@is-land.com.tw
> >>
> >> 何永安 Yung An He
> >>
>