hbase-user mailing list archives

From louis hust <louis.h...@gmail.com>
Subject Re: YCSB load failed because hbase region too busy
Date Tue, 25 Nov 2014 13:43:35 GMT
Hi Ram,
Thanks for the help. I was just doing a test of the bucket cache; in the production environment we will follow your suggestion.

Sent from my iPhone

> On Nov 25, 2014, at 20:36, ramkrishna vasudevan <ramkrishna.s.vasudevan@gmail.com> wrote:
> 
> Your write ingest is too high. You have to control that, first by adding
> more nodes and ensuring that the load is more evenly distributed, and also
> by trying to change hbase.hstore.blockingStoreFiles.
> 
> Even after changing the above value, if your write ingest is high enough
> to reach the configured limit again, you can still see blocked writes.
> 
> Regards
> Ram
> 
> 
>> On Tue, Nov 25, 2014 at 2:20 PM, Qiang Tian <tianq01@gmail.com> wrote:
>> 
>> in your log:
>> 2014-11-25 13:31:35,048 WARN  [MemStoreFlusher.13] regionserver.MemStoreFlusher:
>> Region usertable2,user8289,1416889268210.7e8fd83bb34b155bd0385aa63124a875.
>> has too many store files; delaying flush up to 90000ms
>> 
>> please see my original reply... you can try increasing
>> "hbase.hstore.blockingStoreFiles". Also, you have only 1 RS and you split
>> into 100 regions; you could try 2 RS with 20 regions.
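
For reference, hbase.hstore.blockingStoreFiles is set in hbase-site.xml on the
region servers and takes effect after a restart. A minimal sketch with
illustrative values only; the right number depends on how fast compactions can
keep up, and hbase.hstore.blockingWaitTime is the 90000 ms "delaying flush"
timeout seen in the warning above:

  <!-- hbase-site.xml (region server); example values, not a recommendation -->
  <property>
    <name>hbase.hstore.blockingStoreFiles</name>
    <value>30</value>
  </property>
  <property>
    <name>hbase.hstore.blockingWaitTime</name>
    <value>90000</value> <!-- default; matches "delaying flush up to 90000ms" -->
  </property>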
>> 
>> 
>> 
>>> On Tue, Nov 25, 2014 at 3:42 PM, louis.hust <louis.hust@gmail.com> wrote:
>>> 
>>> Yes, the stack trace is as below:
>>> 
>>> 2014-11-25 13:35:40:946 4260 sec: 232700856 operations; 28173.18 current ops/sec; [INSERT AverageLatency(us)=637.59]
>>> 2014-11-25 13:35:50:946 4270 sec: 232700856 operations; 0 current ops/sec;
>>> 14/11/25 13:35:59 INFO client.AsyncProcess: #14, table=usertable2, attempt=10/35 failed 109 ops, last exception:
>>> org.apache.hadoop.hbase.RegionTooBusyException:
>>> org.apache.hadoop.hbase.RegionTooBusyException: Above memstore limit,
>>> regionName=usertable2,user8289,1416889268210.7e8fd83bb34b155bd0385aa63124a875.,
>>> server=l-hbase10.dba.cn1.qunar.com,60020,1416889404151,
>>> memstoreSize=536886800, blockingMemStoreSize=536870912
>>>         at org.apache.hadoop.hbase.regionserver.HRegion.checkResources(HRegion.java:2822)
>>>         at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2234)
>>>         at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2201)
>>>         at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2205)
>>>         at org.apache.hadoop.hbase.regionserver.HRegionServer.doBatchOp(HRegionServer.java:4253)
>>>         at org.apache.hadoop.hbase.regionserver.HRegionServer.doNonAtomicRegionMutation(HRegionServer.java:3469)
>>>         at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3359)
>>>         at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:29503)
>>>         at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2012)
>>>         at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:98)
>>>         at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:160)
>>>         at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:38)
>>>         at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcScheduler.java:110)
>>>         at java.lang.Thread.run(Thread.java:744)
>>> 
>>> Then I looked up the memstore size for user8289: it is 512M, and it is
>>> still 512M now (15:40).
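
The 512M figure matches the per-region write-blocking threshold rather than the
flush size itself: a region starts rejecting writes with RegionTooBusyException
once its memstore reaches hbase.hregion.memstore.flush.size multiplied by
hbase.hregion.memstore.block.multiplier (both settings are quoted further down
in the thread):

  blocking size = hbase.hregion.memstore.flush.size * hbase.hregion.memstore.block.multiplier
                = 134217728 bytes (128M) * 4
                = 536870912 bytes (512M)

That is exactly the blockingMemStoreSize=536870912 in the exception, and
memstoreSize=536886800 sits just above it. Because the flush of this region is
being delayed by the "too many store files" check, the memstore stays at the
blocking size until a compaction lets the flush go through.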
>>> 
>>> The region server log is attached, which may help.
>>> 
>>> On Nov 25, 2014, at 15:27, ramkrishna vasudevan <
>>> ramkrishna.s.vasudevan@gmail.com> wrote:
>>> 
>>>> Are you getting any exceptions in the log?  Do you have a stack trace
>>>> when it is blocked?
>>>> 
>>>>> On Tue, Nov 25, 2014 at 12:30 PM, louis.hust <louis.hust@gmail.com> wrote:
>>>> 
>>>>> Hi Ram,
>>>>> 
>>>>> After I modified hbase.hstore.flusher.count, it just improved the load,
>>>>> but after one hour the YCSB load program was still blocked. Then I
>>>>> changed hbase.hstore.flusher.count to 40, but it behaved the same as 20.
>>>>> 
>>>>> On Nov 25, 2014, at 14:47, ramkrishna vasudevan <
>>>>> ramkrishna.s.vasudevan@gmail.com> wrote:
>>>>> 
>>>>>>>> hbase.hstore.flusher.count to 20 (default value is 2), and run the
>>>>>>>> YCSB to load data with 32 threads
>>>>>> 
>>>>>> Apologies for the late reply. Your change of configuration from 2 to 20
>>>>>> is right in this case, because your data ingest rate is high, I suppose.
>>>>>> 
>>>>>> Thanks for the reply.
>>>>>> 
>>>>>> Regards
>>>>>> Ram
>>>>>> 
>>>>>>> On Tue, Nov 25, 2014 at 12:09 PM, louis.hust <louis.hust@gmail.com> wrote:
>>>>>> 
>>>>>>> Hi all,
>>>>>>> 
>>>>>>> I retested the YCSB data load, and here is a situation which may
>>>>>>> explain why the load was blocked.
>>>>>>> 
>>>>>>> I use too many threads to insert values, so the flush threads cannot
>>>>>>> handle all the memstores effectively; the user9099 memstore is queued
>>>>>>> last and waits too long for its flush, which blocks the YCSB requests.
>>>>>>> 
>>>>>>> Then I modified the configuration, setting hbase.hstore.flusher.count
>>>>>>> to 20 (the default value is 2), and ran the YCSB load with 32 threads;
>>>>>>> it could run for 1 hour (with the default of 2 flusher threads it ran
>>>>>>> for less than half an hour).
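
For reference, the flusher change described above is an hbase-site.xml setting
on the region server (restart required); a minimal sketch with the value used
in this test:

  <!-- hbase-site.xml (region server): number of MemStore flusher threads -->
  <property>
    <name>hbase.hstore.flusher.count</name>
    <value>20</value> <!-- default is 2, as noted above -->
  </property>

More flusher threads can only help if the underlying HDFS (a single datanode in
this setup) can absorb the extra flush I/O; otherwise the store-file count
grows and the hbase.hstore.blockingStoreFiles limit is reached sooner.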
>>>>>>> 
>>>>>>> 
>>>>>>>> On Nov 20, 2014, at 23:20, louis.hust <louis.hust@gmail.com> wrote:
>>>>>>>> 
>>>>>>>> Hi Ram,
>>>>>>>> 
>>>>>>>> Thanks for your reply!
>>>>>>>> 
>>>>>>>> I use YCSB workloadc to load data, and from the web request monitor I
>>>>>>>> can see that the write requests are distributed among all regions, so I
>>>>>>>> think the data does get distributed.
>>>>>>>> 
>>>>>>>> And there are 32 threads writing to the region server; maybe the
>>>>>>>> concurrency and write rate are too high. The writes are blocked but the
>>>>>>>> memstore does not get flushed, and I want to know why.
>>>>>>>> 
>>>>>>>> The JVM heap is 64G and hbase.regionserver.global.memstore.size is the
>>>>>>>> default (0.4), about 25.6G, and hbase.hregion.memstore.flush.size is the
>>>>>>>> default (128M), but the blocked memstore of user9099 reaches 512M and
>>>>>>>> does not flush at all.
>>>>>>>> 
>>>>>>>> other memstore related options:
>>>>>>>> 
>>>>>>>> hbase.hregion.memstore.mslab.enabled=true
>>>>>>>> hbase.regionserver.global.memstore.upperLimit=0.4
>>>>>>>> hbase.regionserver.global.memstore.lowerLimit=0.38
>>>>>>>> hbase.hregion.memstore.block.multiplier=4
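
With the 64G heap mentioned above, these settings translate into the following
global bounds (the 512M block on user9099 comes from the separate per-region
limit of flush size times block multiplier):

  global upper limit = 64G * 0.40 = 25.6G   (new writes are blocked above this)
  global lower limit = 64G * 0.38 = 24.32G  (forced flushes try to bring usage back under this)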
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> On Nov 20, 2014, at 20:38, ramkrishna vasudevan <
>>>>>>>> ramkrishna.s.vasudevan@gmail.com> wrote:
>>>>>>>> 
>>>>>>>>> Check whether the writes are going to that particular region and
>>>>>>>>> whether its rate is too high. Ensure that the data gets distributed
>>>>>>>>> among all regions.
>>>>>>>>> What is the memstore size?
>>>>>>>>> 
>>>>>>>>> If the rate of writes is very high then the flushes will get queued,
>>>>>>>>> and writes will be blocked until the memstore gets flushed enough to
>>>>>>>>> go back down below the global upper limit.
>>>>>>>>> 
>>>>>>>>> I don't have the code at hand right now to check the exact configs
>>>>>>>>> related to the memstore.
>>>>>>>>> 
>>>>>>>>> Regards
>>>>>>>>> Ram
>>>>>>>>> 
>>>>>>>>> On Thu, Nov 20, 2014 at 4:50 PM, louis.hust <louis.hust@gmail.com> wrote:
>>>>>>>>> Hi all,
>>>>>>>>> 
>>>>>>>>> I built an HBase test environment with three PC servers, on CDH
>>>>>>>>> 5.1.0:
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> pc1 pc2 pc3
>>>>>>>>> 
>>>>>>>>> pc1 and pc2 as HMaster and Hadoop NameNode
>>>>>>>>> pc3 as RegionServer and DataNode
>>>>>>>>> 
>>>>>>>>> Then I create the table as follows:
>>>>>>>>> 
>>>>>>>>> create 'usertable', 'family', {SPLITS => (1..100).map {|i| "user#{1000+i*(9999-1000)/100}"} }
>>>>>>>>> Using YCSB to load data as follows:
>>>>>>>>> 
>>>>>>>>> ./bin/ycsb load hbase -P workloads/workloadc -p columnfamily=family -p recordcount=1000000000 -p threadcount=32 -s > result/workloadc
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> But after a while, YCSB returned with the following error:
>>>>>>>>> 
>>>>>>>>> 14/11/20 12:23:44 INFO client.AsyncProcess: #15, table=usertable, attempt=35/35 failed 715 ops, last exception:
>>>>>>>>> org.apache.hadoop.hbase.RegionTooBusyException:
>>>>>>>>> org.apache.hadoop.hbase.RegionTooBusyException: Above memstore limit,
>>>>>>>>> regionName=usertable,user9099,1416453519676.2552d36eb407a8af12d2b58c973d68a9.,
>>>>>>>>> server=l-hbase10.dba.cn1,60020,1416451280772,
>>>>>>>>> memstoreSize=536897120, blockingMemStoreSize=536870912
>>>>>>>>>       at org.apache.hadoop.hbase.regionserver.HRegion.checkResources(HRegion.java:2822)
>>>>>>>>>       at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2234)
>>>>>>>>>       at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2201)
>>>>>>>>>       at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2205)
>>>>>>>>>       at org.apache.hadoop.hbase.regionserver.HRegionServer.doBatchOp(HRegionServer.java:4253)
>>>>>>>>>       at org.apache.hadoop.hbase.regionserver.HRegionServer.doNonAtomicRegionMutation(HRegionServer.java:3469)
>>>>>>>>>       at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3359)
>>>>>>>>>       at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:29503)
>>>>>>>>>       at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2012)
>>>>>>>>>       at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:98)
>>>>>>>>>       at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:160)
>>>>>>>>>       at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:38)
>>>>>>>>>       at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcScheduler.java:110)
>>>>>>>>>       at java.lang.Thread.run(Thread.java:744)
>>>>>>>>> on l-hbase10.dba.cn1,60020,1416451280772, tracking started Thu Nov 20 12:15:07 CST 2014, retrying after 20051 ms, replay 715 ops.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> It seems the user9099 region is too busy, so I looked up the memstore
>>>>>>>>> metrics in the web UI:
>>>>>>>>> 
>>>>>>>>> As you can see, the user9099 memstore is bigger than the other
>>>>>>>>> regions'. I thought it was flushing, but after a while it did not
>>>>>>>>> shrink to a smaller size, and YCSB finally quit.
>>>>>>>>> 
>>>>>>>>> But when I change the number of concurrent threads to 4, everything
>>>>>>>>> is fine. I want to know why.
>>>>>>>>> 
>>>>>>>>> Any ideas will be appreciated.
>> 
