hbase-user mailing list archives

From "Jinsong Hu" <jinsong...@hotmail.com>
Subject Re: regionserver crash under heavy load
Date Thu, 15 Jul 2010 19:55:54 GMT
I checked gc-hbase.log: the longest GC pause is 9.3477570 seconds, and I
already use -XX:+UseConcMarkSweepGC. So it is not a GC issue.
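For reference, this is roughly how I pull the longest pause out of the GC log. The regex and the sample lines are assumptions on my part; the exact log format depends on the JVM version and which -XX:+Print* flags are set, so adjust the pattern to your own output:

```python
import re

# Assumed CMS log format: pauses reported as "..., 9.3477570 secs]".
# The exact layout varies with JVM version and GC logging flags.
PAUSE_RE = re.compile(r"(\d+\.\d+) secs\]")

def longest_pause(lines):
    """Return the largest pause (in seconds) found in the given GC log lines."""
    pauses = [float(m.group(1))
              for line in lines
              for m in PAUSE_RE.finditer(line)]
    return max(pauses, default=0.0)

# Two made-up lines in the assumed format:
sample = [
    "4.268: [GC 4.268: [ParNew: 190400K->2048K(191104K), 0.0103150 secs]"
    " 250400K->62048K(1024000K), 0.0104120 secs]",
    "812.902: [Full GC 812.902: [CMS: 512000K->100000K(819200K), 9.3477570 secs]]",
]
print(longest_pause(sample))  # 9.347757
```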

The 2-minute gap is most probably because the system is too busy with disk
IO. I notice that a lot of the time, when the system is busy, even a simple
"ls" command takes several seconds.


Since this is not an initial import, I can't really use HFileOutputFormat.

In production I do have stronger machines: I give 6G of memory to the
regionserver and don't colocate it with the tasktracker. But I am restricted
in the hardware I can use for testing.

The reason I started this extensive testing is to understand why the
production regionservers occasionally crash, as I have seen before. Based on
my testing, I realized that HBase does have a problem with a sustained high
rate of inserts. By the previous reasoning, if we keep a high data rate, the
production machines will sooner or later run out of disk IO capacity, before
the cluster runs out of memory, CPU, or storage space. Unless we constantly
monitor disk IO usage and keep adding regionservers, this problem will
happen. The disk IO requirement goes up linearly with the number of regions
if we make the keys uniformly distributed.
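To make that concrete, here is a toy model of the effect; the flush size and rewrite factor are illustrative assumptions, not measured numbers:

```python
def aggregate_compaction_io_mb(num_regions, flush_mb=64, rewrite_factor=3):
    """Toy model: with uniformly distributed keys every region takes
    writes, so every region flushes, and each flushed MB gets rewritten
    roughly rewrite_factor times by later compactions. Both defaults
    here are illustrative assumptions, not measured values."""
    return num_regions * flush_mb * rewrite_factor

# Doubling the region count doubles the compaction IO that the same
# fixed set of region servers must absorb:
print(aggregate_compaction_io_mb(100))  # 19200
print(aggregate_compaction_io_mb(200))  # 38400
```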

An alternative solution seems to be to design the key so that the set of
actively inserted regions stays small, rather than spreading writes across
all regions. For many applications this seems hard to do, especially when we
also need to consider optimizing for data reading.
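As a sketch of what I mean, here is a hypothetical key layout (the bucket size and the hash prefix are made up for illustration) that puts a coarse time bucket first, so current inserts only touch the regions covering the newest bucket:

```python
import hashlib

def bucketed_key(event_time_epoch, entity_id, bucket_secs=3600):
    """Hypothetical key layout: a coarse time bucket first, so all
    current writes land in the few regions covering the newest bucket;
    a short hash then spreads keys within the bucket. The trade-off is
    that reads by entity_id now have to consult every bucket."""
    bucket = event_time_epoch // bucket_secs
    spread = hashlib.md5(entity_id.encode()).hexdigest()[:8]
    return "%010d-%s-%s" % (bucket, spread, entity_id)

# All writes inside the same hour share the leading bucket, so only the
# newest regions are active:
print(bucketed_key(7200, "user42")[:10])   # 0000000002
print(bucketed_key(10799, "user99")[:10])  # 0000000002
```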

Jimmy.


--------------------------------------------------
From: "Jean-Daniel Cryans" <jdcryans@apache.org>
Sent: Thursday, July 15, 2010 10:20 AM
To: <user@hbase.apache.org>; <jinsong_hu@hotmail.com>
Subject: Re: regionserver crash under heavy load

> About your new crash:
>
> 2010-07-15 04:09:03,248 DEBUG
> org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes:
> Total=3.3544312MB (3517376), Free=405.42056MB (425114272),
> Max=408.775MB (428631648), Counts: Blocks=0, Access=1985476, Hit=0,
> Miss=1985476, Evictions=0, Evicted=0, Ratios: Hit Ratio=0.0%, Miss
> Ratio=100.0%, Evicted/Run=NaN
> 2010-07-15 04:11:32,791 DEBUG
> org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes:
> Total=3.3544312MB (3517376), Free=405.42056MB (425114272),
> Max=408.775MB (428631648), Counts: Blocks=0, Access=1985759, Hit=0,
> Miss=1985759, Evictions=0, Evicted=0, Ratios: Hit Ratio=0.0%, Miss
> Ratio=100.0%, Evicted/Run=NaN
> 2010-07-15 04:11:32,965 DEBUG
> org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes:
> Total=3.3544312MB (3517376), Free=405.42056MB (425114272),
> Max=408.775MB (428631648), Counts: Blocks=0, Access=1985768, Hit=0,
> Miss=1985768, Evictions=0, Evicted=0, Ratios: Hit Ratio=0.0%, Miss
> Ratio=100.0%, Evicted/Run=NaN
> 2010-07-15 04:11:33,330 INFO org.apache.zookeeper.ClientCnxn: Client
> session timed out, have not heard from server in 176029ms for
> sessionid 0x129c87a7f980045, closing socket connection and attempting
> reconnect
> 2010-07-15 04:11:33,330 INFO org.apache.zookeeper.ClientCnxn: Client
> session timed out, have not heard from server in 175969ms for
> sessionid 0x129c87a7f980052, closing socket connection and attempting
> reconnect
>
> Your machine wasn't able to talk to the server for almost 2 minutes,
> and we can see that it stopped logging for approx that time. Again,
> I'd like to point out that this is a JVM/hardware issue.
>
> My other comments inline.
>
> J-D
>
> On Thu, Jul 15, 2010 at 9:55 AM, Jinsong Hu <jinsong_hu@hotmail.com> 
> wrote:
>> I tested HBase with client throttling and noticed that in the
>> beginning the process is limited by CPU speed. After running for
>> several hours, the speed becomes limited by disk. Then I realized
>> that there is a serious flaw in HBase's design:
>>
>> In the beginning, the number of regions is very small, so HBase only
>> does compaction for that limited number of regions. But as insertion
>> continues we get more and more regions, and the clients keep inserting
>> records into them, preferably uniformly. At that point lots of regions
>> need to be compacted. Most clusters have a relatively stable number of
>> physical machines, so more regions mean more compactions, and a given
>> physical machine needs to support more compaction work as time goes
>> on. Compaction mostly involves disk IO, so sooner or later the disk IO
>> capacity will be exhausted. At that moment the machine is no longer
>> responsive, timeouts eventually occur, and the HBase regionserver
>> crashes.
>
> To me it looks like a GC pause with swapping and/or CPU starvation,
> which leads to the region server shutting down itself.
>
>>
>> This seems to be a design flaw in HBase. In my testing, even a
>> sustained insertion rate of 1000 can crash the cluster. Is there any
>> solution for this?
>
> Well first the values you are inserting are a lot bigger than what
> HBase is optimized for out of the box.
>
> WRT your IO exhaustion description, what you really described is the
> limit of scalability for a single region server. If HBase had a single
> server architecture, I'd say that you're right, but this is not the
> case. In the Bigtable paper, they say they usually support 100 tablets
> (regions) per tablet server (our RegionServer). When you grow more than
> that, you need to add more machines! This is the scalability model,
> HBase can grow to hundreds/thousands of servers without any fancy
> third party software.
>
> WRT solutions:
>
> 1- if this is your initial data import, I'd recommend you use the
> HFileOutputFormat since it will be much faster and you don't have to
> wait after splits/compactions to happen. See
> http://hbase.apache.org/docs/r0.20.5/api/org/apache/hadoop/hbase/mapreduce/HFileOutputFormat.html
>
> 2- if you are trying to see how HBase handles your expected "normal"
> load, then I would recommend getting production-grade machines.
>
> 3- if you wish to keep the same hardware, you need to give more
> resources to HBase. Running maps and reduces on the same machine,
> when you have only 4 cores, easily leads to resource contention. We
> recommend running HBase with at least 4GB; you have only 2GB. It also
> needs cores: at StumbleUpon for example we use 2x i7, which comes to
> 16 CPUs counting hyperthreading, so we run 8 maps and 6 reducers
> per machine, leaving at least 2 cores for HBase (it's a database, it needs
> processing power). Even when we did the initial import of our data, we
> didn't have any GC issues. This is also the setup for our production
> cluster, although we (almost) don't run MR jobs on it since they
> usually are IO-heavy.
>
>>
>> Jimmy.
>>
> 
