hbase-user mailing list archives

From Michał Podsiadłowski <podsiadlow...@gmail.com>
Subject Re: Hbase fails at moderate load.
Date Mon, 01 Feb 2010 15:59:44 GMT
Hi Stack,
thanks for your last input.

I've started the new week with a few tweaks to the environment. I've shut down
one of the web servers, so I gained an additional node.
I've put HMaster, both namenodes and ZooKeeper there, and requested some
additional memory for the rest of the nodes from our IT staff.

Now the setup is like this:
NameNode + Secondary NameNode + HMaster @ 1 GB + ZooKeeper @ 256 MB - machine
with 4 GB
3 x DataNode/HRegionServer - DataNode @ 768 MB + HRegionServer @ 1 GB - machines
with 2 GB of RAM
2 additional ZooKeepers @ 256 MB on the web servers that are uploading to HBase.
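
For reference, heap sizes like the ones above are set with env entries roughly
like these (a sketch, not our exact files; HBASE_HEAPSIZE and HADOOP_HEAPSIZE
are the stock variables, and setting the standalone ZooKeeper heap through
conf/java.env is an assumption on my part):

  # hbase-env.sh on the regionserver machines (HMaster box gets the same value)
  export HBASE_HEAPSIZE=1000        # ~1 GB per HBase daemon, value is in MB

  # hadoop-env.sh on the same machines
  export HADOOP_HEAPSIZE=768        # ~768 MB per DataNode

  # ZooKeeper conf/java.env on the quorum machines
  export JVMFLAGS="-Xmx256m"        # 256 MB per ZooKeeper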

Probably more memory for OS cache/buffers on the datanodes would be useful, but
free -m after quite a long upload says:
             total       used       free     shared    buffers     cached
Mem:          2048        903       1144          0         37        362
-/+ buffers/cache:        503       1544
Swap:         1019          0       1019

All is based on Hadoop 0.20.2 and HBase 0.20.3.


Everything seems much more stable.
Too many open files is no longer a problem (the 16 MB max file size was a bad
idea).
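
(For the record, the relevant hbase-site.xml entry now looks roughly like this -
a sketch, with the split threshold back at the 64 MB mentioned in my earlier
mail below:)

  <property>
    <name>hbase.hregion.max.filesize</name>
    <value>67108864</value>  <!-- 64 MB; the 16 MB experiment is gone -->
  </property>
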
But the problem with splitting the very first region still occurred.
For around 1 minute regions kept splitting and splitting until they reached a
total count of around 130.
During that time some regions in .META. were not assigned to servers (e.g. no
address for region in .META.).
But I don't think I've seen problems with hitting wrong regions or regions not
being served.
This is something that really freaks us out, because potentially this can happen
on every region split and then the whole application can go bananas.
Can someone explain why regions are splitting so rapidly and into such a
quantity?
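
In case it helps anyone reproduce what I'm seeing, a scan of .META. for rows
without a server assignment can be scripted roughly like this (a sketch against
the 0.20 client API; the info/server column names are how I understand the
catalog layout):

  import java.io.IOException;

  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.HTable;
  import org.apache.hadoop.hbase.client.Result;
  import org.apache.hadoop.hbase.client.ResultScanner;
  import org.apache.hadoop.hbase.client.Scan;
  import org.apache.hadoop.hbase.util.Bytes;

  public class MetaAssignmentCheck {
    public static void main(String[] args) throws IOException {
      // .META. keeps one row per user region; while a region is unassigned
      // its info:server cell is missing or empty.
      HBaseConfiguration conf = new HBaseConfiguration();
      HTable meta = new HTable(conf, ".META.");
      Scan scan = new Scan();
      scan.addFamily(Bytes.toBytes("info"));
      ResultScanner scanner = meta.getScanner(scan);
      try {
        for (Result row : scanner) {
          byte[] server = row.getValue(Bytes.toBytes("info"),
              Bytes.toBytes("server"));
          if (server == null || server.length == 0) {
            System.out.println("no address for region: "
                + Bytes.toString(row.getRow()));
          }
        }
      } finally {
        scanner.close();
      }
    }
  }

The idea is just to list whatever rows currently have no address, which is what
the client exceptions complain about.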

http://pastebin.com/m73276a36 - here you can find a piece of the log from that
moment.


Cheers,
Michal


2010/1/31 Stack <stack@duboce.net>

> What Tim said, and then some comments below.
>
> What version of hbase?
>
>
> >
> > This happens every time when the first region starts to split. As far as I
> > can see the table is set to enabled *false* (web admin), the web admin
> > becomes a little bit less responsive - listing table regions shows no
> > regions, and after a while I can see 500 or more regions.
>
> You go from zero to 500 regions with nothing showing in between?
> That's pretty impressive.  500 regions in 256M on 3 servers is probably
> pushing it.
>
> > Some of them, as the exceptions
> > show, are not fully available.
>
> Identify the duff regions by running a full table scan in the shell
> with DEBUG enabled on the client.  It'll puke when it hits the first
> broken region.
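
For completeness, such a scan would look roughly like this on the client side
(a sketch: assuming the stock conf/log4j.properties, with 'mytable' standing in
for our real table name):

  # added to the client's conf/log4j.properties
  log4j.logger.org.apache.hadoop.hbase.client=DEBUG

and then a plain count 'mytable' (or a full scan) from the hbase shell, watching
where it stops.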
>
> > HDFS doesn't seem to be the main issue. When
> > I run fsck it says the hbase dir is healthy apart from some under-replicated
> > blocks. Occasionally I saw that some blocks were missing, but I think this
> > was due to "Too many files open" exceptions (too small a region size - now
> > it's the default, 64).
>
> Too many open files is bad.  Check out the hbase 'Getting Started'.
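
(For anyone hitting the same thing: the 'Getting Started' fix, as I understand
it, is the nofile bump, roughly as below - a sketch; "hadoop" as the daemon user
and the 32768 value are assumptions on my part.)

  # /etc/security/limits.conf on the datanode/regionserver boxes
  hadoop   soft   nofile   32768
  hadoop   hard   nofile   32768

followed by a re-login so that ulimit -n actually picks it up.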
>
>
> > The amount of data is not enormous - around 1 GB in less than 100k rows when
> > these problems start to occur. Requests per second are, I think, small -
> > 20-30 per second.
> > What else I can say is I've set the max HBase retries to only 2 because we
> > can't allow clients to wait longer for a response.
> >
>
> I would suggest you leave things at the defaults until running smoothly, then
> start optimizing.
>
>
> > What I would like to know is whether the table is always disabled when
> > performing region splits?
>
> No.  The region goes offline for some period of time.  If machines are
> heavily loaded it will take longer for it to come back online again.
>
> > And is it truly disabled then, so that clients
> > can't do anything?
> > It looks like the status says disabled but requests are still processed,
> > though with different results (some like above).
> >
>
> Disabled or 'offline'?   Parents of region splits go offline and are
> replaced by new daughter splits.
>
> >
> >
> > My cluster setup can probably be useful -
> > 3 CentOS virtual machines based on Xen running DN/HR and ZooKeeper, plus one
> > of them runs the NodeMaster and Secondary Master.
> > 2 gigs of RAM on each. Currently the Hadoop processes run with Xmx 512 and
> > HBase with 256, but none of them is swapping nor going out of memory.
> > GC logs look normal - stop-the-world is not occurring ;)
>
>
> Really?  No full GCs even though only 256 MB and about 100-plus regions
> per server?
>
> > top says CPUs are nearly idle on all machines.
> >
> > It's far from ideal, but we need to prove that this can work reliably to
> > get more toys.
> > Maybe next week we will be able to test on some better machines, but for
> > now that's all I've got.
> >
> Makes sense.  You are starting very small, though, and virtual machines
> have proven a flaky foundation for hbase.  Read back over the list
> and look for ec2 mentions.
>
> St.Ack
>
> >
> > Any advice is welcome.
> >
> >
> > Thanks,
> > Michal
> >
>
