hbase-user mailing list archives

From Vladimir Rodionov <vladrodio...@gmail.com>
Subject Re: Slow region moves
Date Wed, 21 Oct 2015 15:14:23 GMT
I wonder why disabling cache eviction on close does not work in the case of
the bucket cache. I checked the code and did not find anything suspicious.
It has to work.

On Wed, Oct 21, 2015 at 3:52 AM, ramkrishna vasudevan <
ramkrishna.s.vasudevan@gmail.com> wrote:

> It seems that BucketAllocator#freeBlock() is synchronized, and hence all
> the bulk closes that it tries to do will be blocked in the synchronized
> block.  Maybe something like the IdLock has to be tried here?
>
> Regards
> Ram
>
> On Wed, Oct 21, 2015 at 4:20 PM, ramkrishna vasudevan <
> ramkrishna.s.vasudevan@gmail.com> wrote:
>
> > I think the forceful clearing of the blocks from the bucket cache is
> > hurting in this case.  I think it is worth opening a JIRA for this and
> > working on a fix.
> >
> > Regards
> > Ram
> >
> > On Wed, Oct 21, 2015 at 12:00 AM, Randy Fox <rfox@connexity.com> wrote:
> >
> >> Hi Vlad,
> >>
> >> I tried it on a table and on a RegionServer basis and it appears to have
> >> no effect.
> >> Are we sure it is supported for bucket cache?  From my charts the bucket
> >> cache is getting cleared at the same time as the region moves occurred.
> >> The regions slow to move are the ones with bucket cache.
> >>
> >> I took a table with 102 regions and blockcache true and turned off block
> >> cache via alter while the table is enabled - it took 19 minutes.  To turn
> >> block cache back on took 4.3 seconds.
> >>
> >> Let me know if there is anything else to try.  This issue is really
> >> hurting our day to day ops.
> >>
> >> Thanks,
> >>
> >> Randy
> >>
> >>
> >>
> >> On 10/15/15, 3:55 PM, "Vladimir Rodionov" <vladrodionov@gmail.com> wrote:
> >>
> >> >Hey, Randy
> >> >
> >> >You can verify your hypothesis by setting hbase.rs.evictblocksonclose to
> >> >false for your tables.
> >> >
> >> >-Vlad
> >> >
> >> >On Thu, Oct 15, 2015 at 1:06 PM, Randy Fox <rfox@connexity.com> wrote:
> >> >
> >> >> Caveat - we are trying to tune the BucketCache (probably a new thread - as
> >> >> we are not sure we are getting the most out of it)
> >> >> 72G off heap
> >> >>
> >> >> <property>
> >> >>    <name>hfile.block.cache.size</name>
> >> >>    <value>0.58</value>
> >> >> </property>
> >> >>
> >> >> <property>
> >> >>    <name>hbase.bucketcache.ioengine</name>
> >> >>    <value>offheap</value>
> >> >> </property>
> >> >>
> >> >> <property>
> >> >>    <name>hbase.bucketcache.size</name>
> >> >>    <value>72800</value>
> >> >> </property>
> >> >>
> >> >> <property>
> >> >>    <name>hbase.bucketcache.bucket.sizes</name>
> >> >>    <value>9216,17408,33792,66560</value>
> >> >> </property>
> >> >>
> >> >>
> >> >>
> >> >>
> >> >>
> >> >>
> >> >> On 10/15/15, 12:00 PM, "Ted Yu" <yuzhihong@gmail.com> wrote:
> >> >>
> >> >> >I am a bit curious.
> >> >> >0.94 doesn't have BucketCache.
> >> >> >
> >> >> >Can you share BucketCache related config parameters in your cluster?
> >> >> >
> >> >> >Cheers
> >> >> >
> >> >> >On Thu, Oct 15, 2015 at 11:11 AM, Randy Fox <rfox@connexity.com> wrote:
> >> >> >
> >> >> >>
> >> >> >> "StoreFileCloserThread-L-1" prio=10 tid=0x00000000027ec800 nid=0xad84 runnable [0x00007fbcc0c65000]
> >> >> >>    java.lang.Thread.State: RUNNABLE
> >> >> >>         at java.util.LinkedList.indexOf(LinkedList.java:602)
> >> >> >>         at java.util.LinkedList.contains(LinkedList.java:315)
> >> >> >>         at org.apache.hadoop.hbase.io.hfile.bucket.BucketAllocator$BucketSizeInfo.freeBlock(BucketAllocator.java:247)
> >> >> >>         at org.apache.hadoop.hbase.io.hfile.bucket.BucketAllocator.freeBlock(BucketAllocator.java:449)
> >> >> >>         - locked <0x000000041b0887a8> (a org.apache.hadoop.hbase.io.hfile.bucket.BucketAllocator)
> >> >> >>         at org.apache.hadoop.hbase.io.hfile.bucket.BucketCache.evictBlock(BucketCache.java:459)
> >> >> >>         at org.apache.hadoop.hbase.io.hfile.bucket.BucketCache.evictBlocksByHfileName(BucketCache.java:1036)
> >> >> >>         at org.apache.hadoop.hbase.io.hfile.CombinedBlockCache.evictBlocksByHfileName(CombinedBlockCache.java:90)
> >> >> >>         at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.close(HFileReaderV2.java:516)
> >> >> >>         at org.apache.hadoop.hbase.regionserver.StoreFile$Reader.close(StoreFile.java:1143)
> >> >> >>         at org.apache.hadoop.hbase.regionserver.StoreFile.closeReader(StoreFile.java:503)
> >> >> >>         - locked <0x00000004944ff2d8> (a org.apache.hadoop.hbase.regionserver.StoreFile)
> >> >> >>         at org.apache.hadoop.hbase.regionserver.HStore$2.call(HStore.java:873)
> >> >> >>         at org.apache.hadoop.hbase.regionserver.HStore$2.call(HStore.java:870)
> >> >> >>         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> >> >> >>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> >> >> >>         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> >> >> >>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> >> >> >>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> >> >> >>         at java.lang.Thread.run(Thread.java:745)
> >> >> >>
> >> >> >>
> >> >> >> "StoreCloserThread-Wildfire_graph3,\x00\x04lK\x1B\xFC\x10\xD2,1402949830657.afb6a1720d936a83d73022aeb9ddbb6c.-1" prio=10 tid=0x0000000003508800 nid=0xad83 waiting on condition [0x00007fbcc5dcc000]
> >> >> >>    java.lang.Thread.State: WAITING (parking)
> >> >> >>         at sun.misc.Unsafe.park(Native Method)
> >> >> >>         - parking to wait for  <0x0000000534e90a80> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> >> >> >>         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
> >> >> >>         at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
> >> >> >>         at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
> >> >> >>         at java.util.concurrent.ExecutorCompletionService.take(ExecutorCompletionService.java:193)
> >> >> >>         at org.apache.hadoop.hbase.regionserver.HStore.close(HStore.java:883)
> >> >> >>         at org.apache.hadoop.hbase.regionserver.HStore.close(HStore.java:126)
> >> >> >>         at org.apache.hadoop.hbase.regionserver.HRegion$2.call(HRegion.java:1378)
> >> >> >>         at org.apache.hadoop.hbase.regionserver.HRegion$2.call(HRegion.java:1375)
> >> >> >>         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> >> >> >>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> >> >> >>         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> >> >> >>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> >> >> >>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> >> >> >>         at java.lang.Thread.run(Thread.java:745)
> >> >> >>
> >> >> >>
> >> >> >> "RS_CLOSE_REGION-hb20:60020-0" prio=10 tid=0x00007fcec0142000 nid=0x3056 waiting on condition [0x00007fbcc2d87000]
> >> >> >>    java.lang.Thread.State: WAITING (parking)
> >> >> >>         at sun.misc.Unsafe.park(Native Method)
> >> >> >>         - parking to wait for  <0x0000000534e61360> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> >> >> >>         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
> >> >> >>         at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
> >> >> >>         at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
> >> >> >>         at java.util.concurrent.ExecutorCompletionService.take(ExecutorCompletionService.java:193)
> >> >> >>         at org.apache.hadoop.hbase.regionserver.HRegion.doClose(HRegion.java:1385)
> >> >> >>         at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1280)
> >> >> >>         - locked <0x000000042230fa68> (a java.lang.Object)
> >> >> >>         at org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler.process(CloseRegionHandler.java:138)
> >> >> >>         at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128)
> >> >> >>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> >> >> >>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> >> >> >>         at java.lang.Thread.run(Thread.java:745)
> >> >> >>
> >> >> >>
> >> >> >> I attached the whole thing as well.
> >> >> >>
> >> >> >> -r
> >> >> >>
> >> >> >>
> >> >> >> On 10/15/15, 10:39 AM, "Ted Yu" <yuzhihong@gmail.com> wrote:
> >> >> >>
> >> >> >> >Can you give a bit more detail on why block eviction was the cause of the
> >> >> >> >slow region movement?
> >> >> >> >
> >> >> >> >Did you happen to take stack traces ?
> >> >> >> >
> >> >> >> >Thanks
> >> >> >> >
> >> >> >> >> On Oct 15, 2015, at 10:32 AM, Randy Fox <rfox@connexity.com> wrote:
> >> >> >> >>
> >> >> >> >> Hi,
> >> >> >> >>
> >> >> >> >> We just upgraded from 0.94 to 1.0.0 and have noticed that region moves
> >> >> >> >> are super slow (on the order of minutes), whereas previously they were in
> >> >> >> >> the seconds range.  After looking at the code, I think the time is spent
> >> >> >> >> waiting for the blocks to be evicted from the block cache.
> >> >> >> >>
> >> >> >> >> I wanted to verify that this theory is correct and see if there is
> >> >> >> >> anything that can be done to speed up the moves.
> >> >> >> >>
> >> >> >> >> This is particularly painful as we are trying to get our configs tuned to
> >> >> >> >> the new SW and need to do rolling restarts, which are taking almost 24
> >> >> >> >> hours on our cluster.  We also do our own manual rebalancing of regions
> >> >> >> >> across RS’s, and that task is also now painful.
> >> >> >> >>
> >> >> >> >>
> >> >> >> >> Thanks,
> >> >> >> >>
> >> >> >> >> Randy
> >> >> >>
> >> >>
> >>
> >
> >
>
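
Note on the quoted stack trace: the StoreFileCloserThread is RUNNABLE inside
LinkedList.contains while holding the BucketAllocator monitor. The Java sketch
below illustrates why that combination can scale poorly when many blocks are
freed at once. It is not HBase code. FreeListSketch, LinkedFreeList,
HashedFreeList, release, releaseAllLinked and releaseAllHashed are hypothetical
names invented for illustration, and the real BucketAllocator may differ. The
hashed variant only shows that the membership check itself need not be linear;
it is not a proposed patch.

```java
import java.util.LinkedHashSet;
import java.util.LinkedList;
import java.util.List;
import java.util.Set;

public class FreeListSketch {

    /** Hypothetical free list whose duplicate check walks a LinkedList: O(n) per release. */
    public static final class LinkedFreeList {
        private final List<Integer> free = new LinkedList<>();

        // One monitor guards every release, so concurrent closers queue up here
        // while the holder scans the list.
        public synchronized boolean release(int slot) {
            if (free.contains(slot)) {   // linear scan under the lock
                return false;            // already free
            }
            free.add(slot);
            return true;
        }

        public synchronized int size() {
            return free.size();
        }
    }

    /** Same contract, but the duplicate check is a hash lookup: expected O(1) per release. */
    public static final class HashedFreeList {
        private final Set<Integer> free = new LinkedHashSet<>();

        public synchronized boolean release(int slot) {
            return free.add(slot);       // add() returns false if already present
        }

        public synchronized int size() {
            return free.size();
        }
    }

    /** Releases slots 0..n-1 into a LinkedFreeList and returns the resulting size. */
    public static int releaseAllLinked(int n) {
        LinkedFreeList list = new LinkedFreeList();
        for (int i = 0; i < n; i++) {
            list.release(i);
        }
        return list.size();
    }

    /** Releases slots 0..n-1 into a HashedFreeList and returns the resulting size. */
    public static int releaseAllHashed(int n) {
        HashedFreeList list = new HashedFreeList();
        for (int i = 0; i < n; i++) {
            list.release(i);
        }
        return list.size();
    }

    public static void main(String[] args) {
        int n = 20_000; // arbitrary size chosen for the demonstration
        long t0 = System.nanoTime();
        int linked = releaseAllLinked(n);
        long t1 = System.nanoTime();
        int hashed = releaseAllHashed(n);
        long t2 = System.nanoTime();
        System.out.printf("linked: %d slots in %d ms%n", linked, (t1 - t0) / 1_000_000);
        System.out.printf("hashed: %d slots in %d ms%n", hashed, (t2 - t1) / 1_000_000);
    }
}
```

Running main prints the wall-clock time for each variant. Timings depend on
the JVM and hardware, so no numbers are claimed here.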
