hbase-user mailing list archives

From lars hofhansl <lhofha...@yahoo.com>
Subject Re: Row get very slow
Date Thu, 10 Nov 2011 18:44:24 GMT
"BLOCKSIZE => '536870912'"


You set your block size to 512 MB? The default is 64 KB (65536); try setting it to something lower.
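
If it helps, here is a minimal sketch of lowering the block size from the HBase shell. It assumes the 'logs' table and 'body' family from the describe output quoted below, and a version where the table must be disabled before an alter. For scale, 536870912 bytes is 8192 times the 65536 default. As far as I know, existing store files keep their old block size until a compaction rewrites them, which is why the sketch ends with a major compaction.

```
# Assumption: table 'logs' with a single family 'body', as in the describe output.
disable 'logs'
# Set the block size back to the 64 KB default (value in bytes).
alter 'logs', {NAME => 'body', BLOCKSIZE => '65536'}
enable 'logs'
# Rewrite the existing store files so they pick up the new block size.
major_compact 'logs'
# Check that BLOCKSIZE now shows '65536'.
describe 'logs'
```

The major compaction runs in the background, so gets may stay slow until it has finished on every region.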

-- Lars
________________________________

From: Damien Hardy <dhardy@figarocms.fr>
To: "user@hbase.apache.org" <user@hbase.apache.org>
Sent: Thursday, November 10, 2011 3:11 AM
Subject: Row get very slow

Hello there.


When I get a row by row ID, the response is very slow (sometimes as long as 15 seconds).
What is wrong with my HTable?
Here are some examples to illustrate my problem:

hbase(main):030:0> get 'logs', '_f:squid_t:20111110111259_b:squid_s:304-ts3aa/WDoKqrSSqNcToHdA==',
{ COLUMN => 'body:body', VERSIONS => 1 }
COLUMN                                               CELL
body:body                                           timestamp=1320919979701,
value=Nov 10 11:05:41 haproxy[15469]: ... [haproxy logs] ...

1 row(s) in 6.0310 seconds

hbase(main):031:0> scan 'logs', { STARTROW =>'_f:squid_t:20111110111259_b:squid_s:304-ts3aa/WDoKqrSSqNcToHdA==',
LIMIT => 1 }
ROW                                                  COLUMN+CELL
_f:squid_t:20111110111259_b:squid_s:304-ts3aa/WDoKqrSSqNcToHdA== column=body:body, timestamp=1320919979701,
value=Nov 10 11:05:41 haproxy[15469]: ... [haproxy logs] ...

1 row(s) in 2.7160 seconds

hbase(main):032:0> get 'logs', '_f:squid_t:20111110111259_b:squid_s:304-ts3aa/WDoKqrSSqNcToHdA=='
COLUMN                                               CELL
body:body                                           timestamp=1320919979701,
value=Nov 10 11:05:41 haproxy[15469]: ... [haproxy logs] ...
1 row(s) in 5.0640 seconds

hbase(main):033:0> describe 'logs'
DESCRIPTION                                                                              ENABLED
{NAME => 'logs', FAMILIES => [{NAME => 'body', BLOOMFILTER => 'NONE', REPLICATION_SCOPE
=> '0', COMPRESSION => 'SNAPPY', VERSIONS => '1', TTL => '2147483647', BLOCKSIZE =>
'536870912', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}                               true
1 row(s) in 0.0660 seconds

hbase(main):025:0> get 'logs', '_f:squid_t:20111110111259_b:squid_s:304-ts3aa/WDoKqrSSqNcToHdA==',
{ COLUMN => 'body:body', TIMERANGE => [1320919900000,1320920000000] }
COLUMN                                               CELL
body:body                                           timestamp=1320919979701,
value=Nov 10 11:05:41 haproxy[15469]: ... [haproxy logs] ...

1 row(s) in 0.0630 seconds


Scan is always faster than get, which I find strange.

I get a normal response time when I specify the timestamp.

The table has about 200 regions distributed over 2 nodes (each running the full stack: HDFS / HBase
master + regionserver / ZooKeeper).
Region size is now 2GB.

I recently increased the region size from the default (128MB if I remember correctly) to 2GB to get
fewer regions (I had 3500 regions).

I changed hbase.hregion.max.filesize to 2147483648, restarted my whole cluster, created a new
table, and copied the data from the old table to the new one via Pig => fewer regions => I'm happy \o/
But on my old table, gets were very fast, like the get with a specified timestamp on the new
table.

Does region size affect HBase response time that much?

Gets on another table that was not rebuilt after the config change (regions not merged) are still fast.

Thank you,

-- Damien
