hbase-dev mailing list archives

From Andrew Purtell <apurt...@apache.org>
Subject Re: Scan performance on a big table as combination of multiple logic tables
Date Thu, 16 Feb 2012 01:43:27 GMT
> Too many regions kill HBase.

How many regions do you carry per RS? What was the effective limit you encountered? Curious.

The available public information is getting old now, but BigTable deployments at Google limited
the number of tablets per tablet server to ~100. This was no doubt for a number of reasons related
to their specific hardware configuration: considerations such as having enough RAM
to keep in-memory tables in memory, the fact that they had something like 160 or 320 GB of
local storage only, and so on. Presumably it was also to limit the scope of failure of a given
server, and to keep overheads down.

I advise our ops people to set notifications for when the number of regions per HBase RegionServer
gets above 500. The more regions per server, the more must be relocated per server failure,
the longer some regions will be in transition. When we get close to the limit, it's time to
add another RegionServer. (Even if HBase could handle 10,000 regions per RegionServer that
wouldn't be a good idea without a distributed master of some kind.) If you are scaling out
for this reason already, then the region carrying capacity of the cluster is also scaling.
We have many thousands of regions and region housekeeping overhead is not an issue, although
we are certainly not the largest deployment. Currently the META region isn't split; I think
that might impose an effective upper bound at some point, but that can be fixed. There's no
architectural limit that I am aware of.
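
The notification rule above can be sketched as a small check. This is only an illustration: the 500-region threshold comes from this message, while the function names, the sample server names, and the idea of feeding in a plain mapping of region counts are assumptions, not an HBase API.

```python
# Hypothetical sketch: flag RegionServers carrying more regions than the
# alert threshold, and estimate how many servers an even spread would need.
# Region counts are supplied by the caller; nothing here talks to HBase.

REGION_ALERT_THRESHOLD = 500  # figure quoted in this message


def servers_over_threshold(regions_per_server, threshold=REGION_ALERT_THRESHOLD):
    """Return (server, count) pairs whose count exceeds threshold, highest first."""
    over = [(server, count) for server, count in regions_per_server.items()
            if count > threshold]
    return sorted(over, key=lambda pair: pair[1], reverse=True)


def min_servers_needed(total_regions, threshold=REGION_ALERT_THRESHOLD):
    """Smallest server count that keeps an even spread at or below threshold."""
    return -(-total_regions // threshold)  # ceiling division


# Made-up sample data for illustration only.
sample = {"rs1": 480, "rs2": 512, "rs3": 530}

for server, count in servers_over_threshold(sample):
    print(f"{server}: {count} regions, consider adding a RegionServer")
```

With the sample data, the 1,522 regions in total would need at least four servers to stay at or below the threshold, assuming the balancer spreads them evenly.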

Best regards,

    - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)

----- Original Message -----
> From: Vladimir Rodionov <vrodionov@carrieriq.com>
> To: "dev@hbase.apache.org" <dev@hbase.apache.org>
> Cc: 
> Sent: Wednesday, February 15, 2012 4:11 PM
> Subject: RE: Scan performance on a big table as combination of multiple logic tables
> 10 tables are fine. 1000 are not, especially when one does table pre-splitting 
> to increase write perf.
> Too many regions kill HBase.
> Best regards,
> Vladimir Rodionov
> Principal Platform Engineer
> Carrier IQ, www.carrieriq.com
> e-mail: vrodionov@carrieriq.com
> ________________________________________
> From: Jacques [whshub@gmail.com]
> Sent: Wednesday, February 15, 2012 3:45 PM
> To: dev@hbase.apache.org
> Subject: Re: Scan performance on a big table as combination of multiple logic tables
> Out of curiosity,  what do you perceive as the benefit to having only one
> table?  Are there reasons that you think one table would perform better
> than a few?
> If you're splitting data within a table because you'd otherwise have
> millions of tables, I understand that and would concur with Vladimir's
> approach below.  However, if you're really looking at 10 tables versus one
> table, it seems like HBase is built exactly to make that work well (rather
> than having to make all sorts of application level code to do what HBase
> already does).
> thanks,
> Jacques
> On Wed, Feb 15, 2012 at 1:57 PM, Pan, Thomas <thpan@ebay.com> wrote:
>>  Since HBase is tailored to handle one table very well, we are thinking of
>>  putting multiple tables into one big table but on different column family sets.
>>  Our use case is full table scans with single column value filters. As
>>  records from different "logical tables" are in different column families,
>>  could we speed up the scan performance by simply checking the column family
>>  referenced by these single column value filters first before really going
>>  through all the underlying K-V pairs? It would be great if the HBase code
>>  already works that way.
>>  $0.02,
>>  Thomas
