hbase-user mailing list archives

From Ryan Rawson <ryano...@gmail.com>
Subject Re: MATA load balance
Date Thu, 04 Feb 2010 02:06:07 GMT
Region information is cached in a structure called "TableServers",
which is shared among multiple HTable instances. It is also best
practice to use one HTable per thread; otherwise you may see performance
issues. The META table is not scanned on startup but on the first RPC
(e.g. a get), which causes an RPC to the META table and then to the
individual regionserver that hosts the data.
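A minimal sketch of the one-HTable-per-thread pattern against the 0.20-era client API (the table name "my_table" and the wrapper class are hypothetical; note this cannot run without a live cluster):

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.util.Bytes;

public class PerThreadTable {
    // One shared configuration: HTables created from it share the cached
    // region locations ("TableServers"), so META is consulted only once.
    private static final HBaseConfiguration CONF = new HBaseConfiguration();

    // One HTable per thread: HTable itself is not safe for concurrent use.
    private static final ThreadLocal<HTable> TABLE = new ThreadLocal<HTable>() {
        @Override
        protected HTable initialValue() {
            try {
                return new HTable(CONF, Bytes.toBytes("my_table")); // hypothetical table
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        }
    };

    public static HTable get() {
        return TABLE.get();
    }
}
```

Each worker thread calls `PerThreadTable.get()` instead of constructing its own HTable, so region lookups are amortized across the process rather than repeated per connection.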

I'd have to recommend against splitting META.  I have seen situations
where that actually increases the META load.

You might want to investigate the "hbase.client.scanner.caching"
setting: the shipped default is fairly low, and you can dramatically
improve performance by raising the batch size to 1,000-3,000 rows
at a time.
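As a sketch against the 0.20-era client API, the setting can be applied globally through the configuration or per scan (the value 1000 is just the range suggested above, not a recommendation for every workload):

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Scan;

// Global: every scanner opened through this configuration will fetch
// batches of rows per RPC instead of the low shipped default.
HBaseConfiguration conf = new HBaseConfiguration();
conf.setInt("hbase.client.scanner.caching", 1000);

// Per-scan: overrides the configured value for this scan only.
Scan scan = new Scan();
scan.setCaching(1000); // ship 1000 rows per round trip to the regionserver
```

The trade-off is memory: each in-flight scanner buffers the full batch on both client and regionserver, so very large caching values can hurt with wide rows.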

2010/2/3  <y_823910@tsmc.com>:
> Hi,
> Our business logic scans 31 HBase tables sequentially and then
> joins the data on the local machine.
> We dispatch this logic with different parameters to our computing cluster,
> so a lot of connections hit HBase almost concurrently.
> It took a hundred seconds to get a few rows. I wonder whether the overhead
> comes from the following code:
> HTable table = new HTable(config, Bytes.toBytes(TableName));
>
> It will scan the META table when we create a new HTable, right?
> (I saw on the web console that the META region server got a lot of requests.)
> Would splitting META into two regions work?
> Because all my clients query it with the same table name at the same time,
> it seems there is no chance to load balance it by splitting META. Is that true?
> If I could query META once and then broadcast the result to my clients,
> they could access the region servers directly and avoid the META
> bottleneck.
> Is that possible? Any suggestions?
> Thanks
>
> Fleming Chiu(邱宏明)
> 707-6128
> y_823910@tsmc.com
> Go meatless on Mondays to save the planet (Meat Free Monday Taiwan)
>
>
>
>
>
> From: Ryan Rawson <ryanobjc@gmail.com>
> Date: 2010/02/03 04:32 PM
> To: hbase-user@hadoop.apache.org
> cc: (bcc: Y_823910/TSMC)
> Subject: Re: MATA load balance
> (Please respond to hbase-user)
>
> A little birdy told me that META performance can potentially degrade
> with a high # of store files, so try to major_compact '.META.' first.
>
> Secondly, yes META can be a bottleneck, but it should serve out of ram
> nearly constantly. Combined with longer lived clients, this should
> mitigate things somewhat.
>
> One option is to use a long lived gateway process, eg: thrift, which
> will amortize the cost of the META lookup over many small client
> connections.  This is what I do with PHP, and it works well.
>
> -ryan
>
> 2010/2/3  <y_823910@tsmc.com>:
>> Hi,
>> Our cluster has 3 ZooKeeper nodes, 10 region servers, and 19 data nodes.
>> Each machine has a 4-core CPU and 12 GB of RAM.
>> There are 1322 regions in our cluster now.
>> We fire up to 3000 HBase clients in parallel to fetch HBase data for
>> distributed computing.
>> Even though each HTable visits the META table only once, there is only
>> one server with the META information, and it seems to be a bottleneck
>> when I fire so many clients at the same time.
>> Any suggestions?
>>
>> Fleming Chiu(邱宏明)
>> 707-6128
>> y_823910@tsmc.com
>> Go meatless on Mondays to save the planet (Meat Free Monday Taiwan)
>>
>>
>>
>> ---------------------------------------------------------------------------
>>                              TSMC PROPERTY
>> This email communication (and any attachments) is proprietary information
>> for the sole use of its intended recipient. Any unauthorized review, use
>> or distribution by anyone other than the intended recipient is strictly
>> prohibited. If you are not the intended recipient, please notify the
>> sender by replying to this email, and then delete this email and any
>> copies of it immediately. Thank you.
>> ---------------------------------------------------------------------------
>>
