hbase-dev mailing list archives

From "Pan, Thomas" <th...@ebay.com>
Subject Re: Scan performance on a big table as combination of multiple logic tables
Date Fri, 17 Feb 2012 18:55:54 GMT

Vladimir and Jacques, thanks for the information! Unless HBase handles
multiple large tables (each with a relatively high region count) well in
one cluster, it seems to me that one big table is the way to go. Otherwise,
runtime tuning seems to add a fair amount of operational cost. That leads
to another question: do we see large region size as an issue? If so, what
is the tipping point, i.e. the region size beyond which scan performance
starts to degrade exponentially?
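
For a rough sense of why table count matters once pre-splitting is involved,
here is a back-of-the-envelope sketch in Python. The split count per table
and the cluster size are made-up numbers for illustration; they are not
figures from this thread, and they are not HBase limits.

```python
import math

def regions_per_server(tables, presplit_regions_per_table, region_servers):
    """Average number of regions each region server has to host."""
    total_regions = tables * presplit_regions_per_table
    return math.ceil(total_regions / region_servers)

# Hypothetical cluster: 20 region servers, every table pre-split into 50 regions.
for tables in (1, 10, 1000):
    print(tables, "tables ->", regions_per_server(tables, 50, 20), "regions per server")
```

Under these assumed numbers the per-server region count grows linearly with
the number of tables, which is the effect Vladimir describes below.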

On 2/15/12 4:11 PM, "Vladimir Rodionov" <vrodionov@carrieriq.com> wrote:

>10 tables are fine. 1000 are not, especially when one does table
>pre-splitting to increase write perf.
>
>Too many regions kill HBase.
>
>Best regards,
>Vladimir Rodionov
>Principal Platform Engineer
>Carrier IQ, www.carrieriq.com
>e-mail: vrodionov@carrieriq.com
>
>________________________________________
>From: Jacques [whshub@gmail.com]
>Sent: Wednesday, February 15, 2012 3:45 PM
>To: dev@hbase.apache.org
>Subject: Re: Scan performance on a big table as combination of multiple
>logic tables
>
>Out of curiosity,  what do you perceive as the benefit to having only one
>table?  Are there reasons that you think one table would perform better
>than a few?
>
>If you're splitting data within a table because you'd otherwise have
>millions of tables, I understand that and would concur with Vladimir's
>approach below.  However, if you're really looking at 10 tables versus one
>table, it seems like HBase is built exactly to make that work well (rather
>than having to write all sorts of application-level code to do what HBase
>already does).
>
>thanks,
>Jacques
>
>On Wed, Feb 15, 2012 at 1:57 PM, Pan, Thomas <thpan@ebay.com> wrote:
>
>>
>> Since HBase is tailored to handle one table very well, we are thinking of
>> putting multiple tables into one big table, but in different column
>> family sets. Our use case is a full table scan with single column value
>> filters. As records from different "logical tables" are in different
>> column families, could we speed up the scan by first checking the column
>> family referenced by these single column value filters, before actually
>> going through all the underlying K-V pairs? It would be great if the
>> HBase code already works that way.
>>
>>
>> $0.02,
>> Thomas
>>
>>
>
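
The idea in the original question (look at the filter's column family first,
and skip K-V pairs from other "logical tables") can be illustrated with a
plain-Python toy model. This is not HBase code: `KeyValue`, `stores`, and
`scan_with_family_pruning` are hypothetical names made up for this sketch.
It only models the fact that HBase keeps each column family in its own
store, so a scan restricted to one family (for example via `Scan.addFamily`
in the Java client) does not need to read the others.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class KeyValue:
    row: str
    family: str
    qualifier: str
    value: str

def scan_with_family_pruning(stores, family, qualifier, wanted):
    """Toy model: `stores` maps column family -> list of KeyValue.

    Only the store for `family` is read, so KeyValues that belong to other
    "logical tables" (other families) are never examined.
    Returns (matching row keys, number of KeyValues examined).
    """
    examined = 0
    rows = []
    for kv in stores.get(family, []):
        examined += 1
        if kv.qualifier == qualifier and kv.value == wanted:
            rows.append(kv.row)
    return rows, examined

# Hypothetical data: two logical tables stored as families "a" and "b".
stores = {
    "a": [KeyValue("r1", "a", "status", "open"),
          KeyValue("r2", "a", "status", "closed")],
    "b": [KeyValue("r3", "b", "status", "open")],
}
rows, examined = scan_with_family_pruning(stores, "a", "status", "open")
print(rows, examined)
```

In this toy, the K-V pair in family "b" is never counted as examined.
Whether a real scan gets the same benefit depends on how the scan and its
filters are set up, which this sketch does not try to reproduce.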

