hbase-user mailing list archives

From ramkrishna vasudevan <ramkrishna.s.vasude...@gmail.com>
Subject Re: Slow region moves
Date Wed, 21 Oct 2015 10:52:27 GMT
It seems that BucketAllocator#freeBlock() is synchronized, and hence all
the bulk closes that it tries to do will be blocked in the synchronized
block.  Maybe something like IdLock should be tried here?
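Ram's suggestion is to replace the single object-wide monitor with finer-grained locking. As a rough, hypothetical illustration of that general idea (it is not HBase's `IdLock` class and not a patch for `BucketAllocator`), the sketch below guards work with one of N `ReentrantLock` stripes chosen by a numeric key, so callers whose keys map to different stripes do not serialize behind one another. The class name `StripedLockSketch`, the stripe count, and the use of a `long` key (for example, a block offset) are all assumptions made for illustration.

```java
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

// Hypothetical sketch: lock striping as a coarse stand-in for per-id locking.
// Not HBase code; the names and stripe count are illustrative assumptions.
final class StripedLockSketch {
    private final ReentrantLock[] stripes;

    StripedLockSketch(int stripeCount) {
        if (stripeCount <= 0) {
            throw new IllegalArgumentException("stripeCount must be positive");
        }
        stripes = new ReentrantLock[stripeCount];
        for (int i = 0; i < stripeCount; i++) {
            stripes[i] = new ReentrantLock();
        }
    }

    // Maps any long key (including negative ones) to a stripe index in [0, stripeCount).
    int stripeFor(long key) {
        return (int) Math.floorMod(key, (long) stripes.length);
    }

    // Runs the action while holding only the stripe lock for this key,
    // instead of one lock shared by every caller.
    <T> T withLock(long key, Supplier<T> action) {
        ReentrantLock lock = stripes[stripeFor(key)];
        lock.lock();
        try {
            return action.get();
        } finally {
            lock.unlock();
        }
    }
}
```

Keys that land on the same stripe still contend, so the stripe count trades memory for concurrency. Whether anything like this fits `BucketAllocator` depends on which shared state `freeBlock` mutates.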

Regards
Ram

On Wed, Oct 21, 2015 at 4:20 PM, ramkrishna vasudevan <ramkrishna.s.vasudevan@gmail.com> wrote:

> I think the forceful clearing of the blocks from the bucket cache is
> hurting in this case.  I think it is worth opening a JIRA for this and
> working on a fix.
>
> Regards
> Ram
>
> On Wed, Oct 21, 2015 at 12:00 AM, Randy Fox <rfox@connexity.com> wrote:
>
>> Hi Vlad,
>>
>> I tried it both per table and per RegionServer, and it appears to have
>> no effect.
>> Are we sure it is supported for bucket cache?  From my charts the bucket
>> cache is getting cleared at the same time as the region moves occurred.
>> The regions slow to move are the ones with bucket cache.
>>
>> I took a table with 102 regions and blockcache true and turned off block
>> cache via alter while the table was enabled; it took 19 minutes.  Turning
>> block cache back on took 4.3 seconds.
>>
>> Let me know if there is anything else to try.  This issue is really
>> hurting our day to day ops.
>>
>> Thanks,
>>
>> Randy
>>
>>
>>
>> On 10/15/15, 3:55 PM, "Vladimir Rodionov" <vladrodionov@gmail.com> wrote:
>>
>> >Hey, Randy
>> >
>> >You can verify your hypothesis by setting hbase.rs.evictblocksonclose to
>> >false for your tables.
>> >
>> >-Vlad
>> >
>> >On Thu, Oct 15, 2015 at 1:06 PM, Randy Fox <rfox@connexity.com> wrote:
>> >
> >> >> Caveat: we are trying to tune the BucketCache (probably a topic for a
> >> >> new thread, as we are not sure we are getting the most out of it).
> >> >> 72 GB off-heap
>> >>
>> >> <property>
>> >>    <name>hfile.block.cache.size</name>
>> >>    <value>0.58</value>
>> >> </property>
>> >>
>> >> <property>
>> >>    <name>hbase.bucketcache.ioengine</name>
>> >>    <value>offheap</value>
>> >> </property>
>> >>
>> >> <property>
>> >>    <name>hbase.bucketcache.size</name>
>> >>    <value>72800</value>
>> >> </property>
>> >>
>> >> <property>
>> >>    <name>hbase.bucketcache.bucket.sizes</name>
>> >>    <value>9216,17408,33792,66560</value>
>> >> </property>
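With `hbase.bucketcache.bucket.sizes` set, each cached block is placed in a bucket of the smallest configured size that can hold it. The four values above work out to 8, 16, 32 and 64 KB plus 1 KB each (9216 = 8192 + 1024, and so on). The sketch below models only that size-class lookup; it is a simplified illustration, not the `BucketAllocator` source, and the class and method names are hypothetical. Returning `-1` for a block larger than every size is this sketch's own convention.

```java
import java.util.Arrays;

// Hypothetical model of size-class selection: pick the smallest configured
// bucket size that can hold the block. Simplified; not the HBase source.
final class BucketSizeChooser {
    private final int[] sizes;

    BucketSizeChooser(int... bucketSizes) {
        sizes = bucketSizes.clone();
        Arrays.sort(sizes); // the linear search below relies on ascending order
    }

    // Returns the chosen bucket size, or -1 if the block is larger than every size.
    int chooseBucketSize(int blockSize) {
        for (int size : sizes) {
            if (blockSize <= size) {
                return size;
            }
        }
        return -1;
    }

    // Bytes left unused inside the chosen bucket slot, or -1 if nothing fits.
    int wastedBytes(int blockSize) {
        int size = chooseBucketSize(blockSize);
        return size < 0 ? -1 : size - blockSize;
    }
}
```

For example, with the sizes from the config above, a 10,000-byte block would land in the 17408-byte class, leaving 7408 bytes of that slot unused.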
>> >>
>> >>
>> >>
>> >>
>> >>
>> >>
>> >> On 10/15/15, 12:00 PM, "Ted Yu" <yuzhihong@gmail.com> wrote:
>> >>
>> >> >I am a bit curious.
>> >> >0.94 doesn't have BucketCache.
>> >> >
> >> >> >Can you share the BucketCache-related config parameters in your cluster?
>> >> >
>> >> >Cheers
>> >> >
> >> >> >On Thu, Oct 15, 2015 at 11:11 AM, Randy Fox <rfox@connexity.com> wrote:
>> >> >
>> >> >>
>> >> >> "StoreFileCloserThread-L-1" prio=10 tid=0x00000000027ec800
>> nid=0xad84
>> >> >> runnable [0x00007fbcc0c65000]
>> >> >>    java.lang.Thread.State: RUNNABLE
>> >> >>         at java.util.LinkedList.indexOf(LinkedList.java:602)
>> >> >>         at java.util.LinkedList.contains(LinkedList.java:315)
>> >> >>         at
>> >> >>
>> >>
>> org.apache.hadoop.hbase.io.hfile.bucket.BucketAllocator$BucketSizeInfo.freeBlock(BucketAllocator.java:247)
>> >> >>         at
>> >> >>
>> >>
>> org.apache.hadoop.hbase.io.hfile.bucket.BucketAllocator.freeBlock(BucketAllocator.java:449)
>> >> >>         - locked <0x000000041b0887a8> (a
>> >> >> org.apache.hadoop.hbase.io.hfile.bucket.BucketAllocator)
>> >> >>         at
>> >> >>
>> >>
>> org.apache.hadoop.hbase.io.hfile.bucket.BucketCache.evictBlock(BucketCache.java:459)
>> >> >>         at
>> >> >>
>> >>
>> org.apache.hadoop.hbase.io.hfile.bucket.BucketCache.evictBlocksByHfileName(BucketCache.java:1036)
>> >> >>         at
>> >> >>
>> >>
>> org.apache.hadoop.hbase.io.hfile.CombinedBlockCache.evictBlocksByHfileName(CombinedBlockCache.java:90)
>> >> >>         at
>> >> >>
>> >>
>> org.apache.hadoop.hbase.io.hfile.HFileReaderV2.close(HFileReaderV2.java:516)
>> >> >>         at
>> >> >>
>> >>
>> org.apache.hadoop.hbase.regionserver.StoreFile$Reader.close(StoreFile.java:1143)
>> >> >>         at
>> >> >>
>> >>
>> org.apache.hadoop.hbase.regionserver.StoreFile.closeReader(StoreFile.java:503)
>> >> >>         - locked <0x00000004944ff2d8> (a
>> >> >> org.apache.hadoop.hbase.regionserver.StoreFile)
>> >> >>         at
>> >> >> org.apache.hadoop.hbase.regionserver.HStore$2.call(HStore.java:873)
>> >> >>         at
>> >> >> org.apache.hadoop.hbase.regionserver.HStore$2.call(HStore.java:870)
>> >> >>         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>> >> >>         at
>> >> >>
>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>> >> >>         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>> >> >>         at
>> >> >>
>> >>
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>> >> >>         at
>> >> >>
>> >>
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>> >> >>         at java.lang.Thread.run(Thread.java:745)
>> >> >>
>> >> >>
>> >>
>> "StoreCloserThread-Wildfire_graph3,\x00\x04lK\x1B\xFC\x10\xD2,1402949830657.afb6a1720d936a83d73022aeb9ddbb6c.-1"
>> >> >> prio=10 tid=0x0000000003508800 nid=0xad83 waiting on condition
>> >> >> [0x00007fbcc5dcc000]
>> >> >>    java.lang.Thread.State: WAITING (parking)
>> >> >>         at sun.misc.Unsafe.park(Native Method)
>> >> >>         - parking to wait for  <0x0000000534e90a80> (a
>> >> >>
>> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>> >> >>         at
>> >> >> java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
>> >> >>         at
>> >> >>
>> >>
>> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
>> >> >>         at
>> >> >>
>> >>
>> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
>> >> >>         at
>> >> >>
>> >>
>> java.util.concurrent.ExecutorCompletionService.take(ExecutorCompletionService.java:193)
>> >> >>         at
>> >> >> org.apache.hadoop.hbase.regionserver.HStore.close(HStore.java:883)
>> >> >>         at
>> >> >> org.apache.hadoop.hbase.regionserver.HStore.close(HStore.java:126)
>> >> >>         at
>> >> >>
>> org.apache.hadoop.hbase.regionserver.HRegion$2.call(HRegion.java:1378)
>> >> >>         at
>> >> >>
>> org.apache.hadoop.hbase.regionserver.HRegion$2.call(HRegion.java:1375)
>> >> >>         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>> >> >>         at
>> >> >>
>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>> >> >>         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>> >> >>         at
>> >> >>
>> >>
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>> >> >>         at
>> >> >>
>> >>
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>> >> >>         at java.lang.Thread.run(Thread.java:745)
>> >> >>
>> >> >>
>> >> >> "RS_CLOSE_REGION-hb20:60020-0" prio=10 tid=0x00007fcec0142000
>> nid=0x3056
>> >> >> waiting on condition [0x00007fbcc2d87000]
>> >> >>    java.lang.Thread.State: WAITING (parking)
>> >> >>         at sun.misc.Unsafe.park(Native Method)
>> >> >>         - parking to wait for  <0x0000000534e61360> (a
>> >> >>
>> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>> >> >>         at
>> >> >> java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
>> >> >>         at
>> >> >>
>> >>
>> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
>> >> >>         at
>> >> >>
>> >>
>> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
>> >> >>         at
>> >> >>
>> >>
>> java.util.concurrent.ExecutorCompletionService.take(ExecutorCompletionService.java:193)
>> >> >>         at
>> >> >>
>> org.apache.hadoop.hbase.regionserver.HRegion.doClose(HRegion.java:1385)
>> >> >>         at
>> >> >>
>> org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1280)
>> >> >>         - locked <0x000000042230fa68> (a java.lang.Object)
>> >> >>         at
>> >> >>
>> >>
>> org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler.process(CloseRegionHandler.java:138)
>> >> >>         at
>> >> >>
>> org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128)
>> >> >>         at
>> >> >>
>> >>
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>> >> >>         at
>> >> >>
>> >>
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>> >> >>         at java.lang.Thread.run(Thread.java:745)
>> >> >>
>> >> >>
>> >> >> I attached the whole thing as well.
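The top frames of the first trace show `BucketSizeInfo.freeBlock` calling `LinkedList.contains` while the thread holds the `BucketAllocator` monitor. `contains` on a `LinkedList` walks the list element by element, so each free costs time proportional to the list length, and any other thread calling a synchronized `BucketAllocator` method has to wait meanwhile. The hypothetical model below contrasts that linear membership check with a `LinkedHashSet`, which keeps insertion order but answers `contains` by hashing. The class and method names are invented for illustration; this is not the HBase code and not a confirmed fix.

```java
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.LinkedList;

// Hypothetical model of a free list whose membership test is either a linear
// LinkedList scan or a hashed lookup. Not HBase code.
final class FreeListModel {
    private final Collection<Long> freeBuckets;

    private FreeListModel(Collection<Long> backing) {
        this.freeBuckets = backing;
    }

    // O(n) contains(): walks the list, like the LinkedList.indexOf frame in the trace.
    static FreeListModel linear() {
        return new FreeListModel(new LinkedList<>());
    }

    // O(1) expected contains(): hashes the id; LinkedHashSet also keeps insertion order.
    static FreeListModel hashed() {
        return new FreeListModel(new LinkedHashSet<>());
    }

    // Marks a bucket free, skipping duplicates; synchronized to mirror the
    // single-monitor pattern seen in the trace.
    synchronized void free(long bucketId) {
        if (!freeBuckets.contains(bucketId)) {
            freeBuckets.add(bucketId);
        }
    }

    synchronized int size() {
        return freeBuckets.size();
    }
}
```

Both variants end up with the same contents; only the cost of `free` differs. Judging by the `evictBlocksByHfileName` → `evictBlock` frames, closing a file presumably evicts its cached blocks one at a time, so in the linear variant the cost of a close would scale with the number of cached blocks times the free-list length.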
>> >> >>
>> >> >> -r
>> >> >>
>> >> >>
>> >> >> On 10/15/15, 10:39 AM, "Ted Yu" <yuzhihong@gmail.com> wrote:
>> >> >>
> >> >> >> >Can you give a bit more detail on why block eviction was the cause of the slow region movement?
>> >> >> >
>> >> >> >Did you happen to take stack traces ?
>> >> >> >
>> >> >> >Thanks
>> >> >> >
> >> >> >> >> On Oct 15, 2015, at 10:32 AM, Randy Fox <rfox@connexity.com> wrote:
>> >> >> >>
>> >> >> >> Hi,
>> >> >> >>
> >> >> >> >> We just upgraded from 0.94 to 1.0.0 and have noticed that region moves are super slow (on the order of minutes), whereas previously they were in the seconds range.  After looking at the code, I think the time is spent waiting for the blocks to be evicted from the block cache.
> >> >> >> >>
> >> >> >> >> I wanted to verify that this theory is correct and see if there is anything that can be done to speed up the moves.
> >> >> >> >>
> >> >> >> >> This is particularly painful as we are trying to get our configs tuned to the new SW and need to do rolling restarts, which are taking almost 24 hours on our cluster.  We also do our own manual rebalancing of regions across RS’s, and that task is also now painful.
>> >> >> >>
>> >> >> >>
>> >> >> >> Thanks,
>> >> >> >>
>> >> >> >> Randy
>> >> >>
>> >>
>>
>
>
