hbase-user mailing list archives

From Jean-Daniel Cryans <jdcry...@apache.org>
Subject Re: Hbase fails at moderate load.
Date Mon, 01 Feb 2010 18:11:14 GMT
From what you pasted:

2010-02-01 14:05:49,445 INFO
org.apache.hadoop.hbase.regionserver.CompactSplitThread: region split,
META updated, and report to master all successful. Old region=REGION
=> {NAME => 'oldWebSingleRowCacheStore,,1265029544146', STARTKEY =>
'', ENDKEY => 'filmMenuEditions-not_selected\xC2\xAC1405', ENCODED =>
1899385768, OFFLINE => true, SPLIT => true, TABLE => {{NAME =>
'oldWebSingleRowCacheStore', MAX_FILESIZE => '64', FAMILIES => [{NAME
=> 'content', COMPRESSION => 'NONE', VERSIONS => '3', TTL =>
'2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE
=> 'true'}, {NAME => 'description', COMPRESSION => 'NONE', VERSIONS =>
'3', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false',
BLOCKCACHE => 'true'}]}}, new regions:
oldWebSingleRowCacheStore,,1265029549167,
oldWebSingleRowCacheStore,filmLastTopics\xC2\xAC1155,1265029549167.
Split took 0sec

I see MAX_FILESIZE => '64', which means you have set that table to
split after 64 _bytes_. Either use the default value of 256MB
(256*1024*1024) or something even higher if you wish (I usually set
them to 1GB).
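
For example, something like this from the shell should do it (just a
rough sketch; on 0.20 the table has to be disabled first, and the
value is in bytes):

  hbase> disable 'oldWebSingleRowCacheStore'
  hbase> alter 'oldWebSingleRowCacheStore', METHOD => 'table_att', MAX_FILESIZE => '268435456'
  hbase> enable 'oldWebSingleRowCacheStore'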

J-D

2010/2/1 Michał Podsiadłowski <podsiadlowski@gmail.com>:
> Hi Stack,
> thanks for your last input.
>
> I've started the new week with a few tweaks to the environment. I've taken down one of
> the web servers, so I gained an additional node.
> I've put the HMaster, both namenodes and zookeeper there, and requested some additional
> memory for the rest of the nodes from our IT staff.
>
> Now the setup is like this:
> Namenode + Secondary Namenode + HMaster @ 1GB + zookeeper @ 256MB - machine
> with 4GB
> 3 x datanodes/hregions - DataNode @ 768MB + HRegion @ 1GB - machines with 2GB
> of ram
> 2 additional zookeepers @ 256MB on the webservers that are uploading to hbase.
>
> Probably more memory for OS cache/buffers on the datanodes would be useful, but
> free -m after quite a long upload says:
>              total       used       free     shared    buffers     cached
> Mem:          2048        903       1144          0         37        362
> -/+ buffers/cache:        503       1544
> Swap:         1019          0       1019
>
> All is based on hadoop 0.20.2 and hbase 0.20.3.
>
>
> All seems to be much more stable.
> Too many open files is no longer a problem (a max file size of 16MB was the wrong
> idea).
> But the problem with the very first region splitting still occurred.
> For around 1 minute the regions kept splitting and splitting until they reached a
> total count of around 130.
> During that time some regions in .META. were not assigned to servers (e.g.
> no address for region in .META.).
> But I think I haven't seen problems with hitting wrong regions or regions not
> being served.
> This is something that really freaks us out, because potentially this can
> happen on every region split
> and then the whole application can go bananas.
> Can someone explain why the regions are splitting so rapidly and to such a
> quantity?
>
> http://pastebin.com/m73276a36 - here you can find a piece of the log from that
> moment
>
>
> Cheers,
> Michal
>
> 2010/1/31 Stack <stack@duboce.net>
>
>> What Tim said, and then some comments below.
>>
>> What version of hbase?
>>
>>
>> >
>> > This happens every time the first region starts to split. As far as I can
>> > see the table is set to enabled *false* (web admin), the web admin becomes a
>> > little bit less responsive - listing the table regions shows no regions,
>> > and after a while I can see 500 or more regions.
>>
>> You go from zero to 500 regions with nothing showing in between?
>> That's pretty impressive.  500 regions in 256M on 3 servers is probably
>> pushing it.
>>
>> > Some of them, as the exception
>> > shows, are not fully available.
>>
>> Identify the duff regions by running a full table scan in the shell
>> with DEBUG enabled on the client.  It'll puke when it hits the first
>> broken region.
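>>
>> Something like this would do it (just a rough sketch, assuming the stock
>> client conf and the table name from the paste):
>>
>>   # conf/log4j.properties on the client
>>   log4j.logger.org.apache.hadoop.hbase=DEBUG
>>
>>   $ bin/hbase shell
>>   hbase> scan 'oldWebSingleRowCacheStore'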
>>
>> > HDFS doesn't seem to be the main issue. When
>> > I run fsck it says the hbase dir is healthy apart from some under-replicated
>> > blocks. Occasionally I saw that some blocks were missing, but I think this
>> > was due to "Too many files open" exceptions (too small a region size - now
>> > it's the default 64).
>>
>> Too many open files is bad.  Check out the hbase 'Getting Started'.
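>>
>> The usual fix there is to raise the file descriptor limit for the user running
>> the daemons (a rough sketch, assuming they run as user 'hadoop'):
>>
>>   # /etc/security/limits.conf on every node
>>   hadoop  -  nofile  32768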
>>
>>
>> > The amount of data is not enormous - around 1GB in less than 100k rows when
>> > these problems start to occur. Requests per second are, I think, small -
>> > 20-30 per second.
>> > What else I can say is that I've set the max hbase retries to only 2 because
>> > we can't allow clients to wait any longer for a response.
>> >
>>
>> I would suggest you leave things at the defaults till running smoothly, then
>> start optimizing.
>>
>>
>> > What I would like to know is whether the table is always disabled when
>> > performing region splits?
>>
>> No.  The region goes offline for some period of time.  If the machines are
>> heavily loaded it will take longer for it to come back online again.
>>
>> > And is it truly disabled then, so that clients
>> > can't do anything?
>> > It looks like the status says disabled but requests are still processed,
>> > though with different results (some like the above).
>> >
>>
>> Disabled or 'offline'?   Parents of region splits go offline and are
>> replaced by new daughter splits.
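>>
>> You can see this in .META. from the shell if you're curious (a rough sketch;
>> the offlined parent shows OFFLINE => true and SPLIT => true in its
>> info:regioninfo):
>>
>>   hbase> scan '.META.', {COLUMNS => ['info:regioninfo']}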
>>
>> >
>> >
>> > My cluster setup is probably useful to know -
>> > 3 CentOS virtual machines based on Xen running DN/HR and zookeeper, + one of
>> > them the NodeMaster and Secondary Master.
>> > 2 gigs of ram on each. Currently the hadoop processes run with Xmx 512 and
>> > hbase with 256, but none of them is swapping nor going out of memory.
>> > GC logs look normal - stop the world is not occurring ;)
>>
>>
>> Really?  No full GCs even though the heap is only 256 and there are about 100
>> plus regions per server?
>>
>> > top says the CPUs are nearly idle on all machines.
>> >
>> > It's far from ideal, but we need to prove that this can work reliably to get
>> > more toys.
>> > Maybe next week we will be able to test on some better machines, but for now
>> > that's all I've got.
>> >
>> Makes sense.  You are starting very small though, and virtual machines
>> have proven a flaky foundation for hbase.  Read back over the list
>> and look for ec2 mentions.
>>
>> St.Ack
>>
>> >
>> > Any advice is welcome.
>> >
>> >
>> > Thanks,
>> > Michal
>> >
>>
>
