hbase-user mailing list archives

From Randy Fox <r...@connexity.com>
Subject Re: Slow region moves
Date Thu, 22 Oct 2015 16:40:57 GMT
Hi Vlad,

So far the patch seems to work perfectly.

-randy




On 10/21/15, 12:52 PM, "Vladimir Rodionov" <vladrodionov@gmail.com> wrote:

>Randy,
>
>You can try the patch I just submitted. It is for master, but I verified it on
>the 1.0 branch as well.
>
>-Vlad
>
>On Wed, Oct 21, 2015 at 11:40 AM, Randy Fox <rfox@connexity.com> wrote:
>
>> https://issues.apache.org/jira/browse/HBASE-14663
>>
>> -r
>>
>>
>>
>> On 10/21/15, 10:35 AM, "Vladimir Rodionov" <vladrodionov@gmail.com> wrote:
>>
>> >You are right, Randy
>> >
>> >This is the bug. Will you open JIRA?
>> >
>> >-Vlad
>> >
>> >On Wed, Oct 21, 2015 at 9:35 AM, Randy Fox <rfox@connexity.com> wrote:
>> >
>> >> Maybe I am looking in the wrong place, but HStore::close() has the
>> >> evictOnClose parameter hard-coded to true:
>> >>
>> >> // close each store file in parallel
>> >> CompletionService<Void> completionService =
>> >>   new ExecutorCompletionService<Void>(storeFileCloserThreadPool);
>> >> for (final StoreFile f : result) {
>> >>   completionService.submit(new Callable<Void>() {
>> >>     @Override
>> >>     public Void call() throws IOException {
>> >>       f.closeReader(true);
>> >>       return null;
>> >>     }
>> >>   });
>> >> }
>> >>
>> >>
>> >> Where does that setting come into play?
>> >>
>> >> -r
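A minimal sketch of the alternative being asked about above: reading the eviction flag from configuration and passing it to each close call, rather than passing a literal true. This is not the HBase source or the submitted patch. FakeStoreFile, closeAll, and the use of java.util.Properties in place of HBase's configuration object are assumptions made so the example is self-contained; only the property name comes from the thread, and the default of false is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class EvictOnCloseSketch {
    // Property name taken from the thread; the "false" default below is an assumption.
    static final String EVICT_ON_CLOSE_KEY = "hbase.rs.evictblocksonclose";

    /** Hypothetical stand-in for StoreFile: records the flag it was closed with. */
    static class FakeStoreFile {
        volatile Boolean closedWithEvict = null;

        void closeReader(boolean evictOnClose) {
            closedWithEvict = evictOnClose;
        }
    }

    /** Closes every file in parallel, passing the configured flag instead of a literal true. */
    static void closeAll(List<FakeStoreFile> files, Properties conf) throws Exception {
        final boolean evictOnClose =
            Boolean.parseBoolean(conf.getProperty(EVICT_ON_CLOSE_KEY, "false"));
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            CompletionService<Void> completionService =
                new ExecutorCompletionService<Void>(pool);
            for (final FakeStoreFile f : files) {
                completionService.submit(new Callable<Void>() {
                    @Override
                    public Void call() {
                        f.closeReader(evictOnClose);
                        return null;
                    }
                });
            }
            // Wait for every close to finish; get() rethrows any failure.
            for (int i = 0; i < files.size(); i++) {
                completionService.take().get();
            }
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        Properties conf = new Properties();
        conf.setProperty(EVICT_ON_CLOSE_KEY, "false");
        List<FakeStoreFile> files = new ArrayList<FakeStoreFile>();
        files.add(new FakeStoreFile());
        files.add(new FakeStoreFile());
        closeAll(files, conf);
        for (FakeStoreFile f : files) {
            System.out.println("closed with evictOnClose=" + f.closedWithEvict);
        }
    }
}
```

With the flag threaded through like this, setting the property to false would skip eviction on close, which is the behavior the earlier suggestion in this thread was relying on.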
>> >>
>> >>
>> >>
>> >>
>> >>
>> >> On 10/21/15, 8:14 AM, "Vladimir Rodionov" <vladrodionov@gmail.com> wrote:
>> >>
>> >> >I wonder why disabling cache eviction on close does not work in the case of a
>> >> >bucket cache? I checked the code and did not find
>> >> >anything suspicious.  It has to work.
>> >> >
>> >> >On Wed, Oct 21, 2015 at 3:52 AM, ramkrishna vasudevan <
>> >> >ramkrishna.s.vasudevan@gmail.com> wrote:
>> >> >
>> >> >> It seems that BucketAllocator#freeBlock() is synchronized, and hence all
>> >> >> the bulk closes that it tries to do will be blocked in the synchronized
>> >> >> block.  Maybe something like the IdLock has to be tried here?
>> >> >>
>> >> >> Regards
>> >> >> Ram
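A rough sketch of why this pattern can be slow, based only on the shape visible in the stack trace further down (LinkedList.contains called from freeBlock while the allocator lock is held). The FreeList class below is hypothetical and is not the BucketAllocator code. It assumes each release does a membership check on the free list under a single lock, so with a LinkedList every release scans the list linearly and all closer threads queue behind that scan. A hash-based set is shown for comparison only, not as the fix that was applied.

```java
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.LinkedList;

public class FreeListSketch {
    /** Hypothetical free list guarded by one lock, mirroring the shape in the stack trace. */
    static class FreeList {
        private final Collection<Integer> free;

        FreeList(Collection<Integer> backing) {
            this.free = backing;
        }

        /** Returns false if the bucket was already free; contains() is the hot spot. */
        synchronized boolean release(int bucket) {
            if (free.contains(bucket)) {
                return false;
            }
            free.add(bucket);
            return true;
        }

        synchronized int size() {
            return free.size();
        }
    }

    /** Releases buckets 0..n-1 one at a time and returns the elapsed nanoseconds. */
    static long timeReleases(FreeList list, int n) {
        long start = System.nanoTime();
        for (int i = 0; i < n; i++) {
            list.release(i);
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        int n = 20_000;
        // LinkedList.contains walks the list, so n releases cost O(n^2) in total.
        long linked = timeReleases(new FreeList(new LinkedList<Integer>()), n);
        // LinkedHashSet.contains is a hash lookup, so n releases cost O(n) in total.
        long hashed = timeReleases(new FreeList(new LinkedHashSet<Integer>()), n);
        System.out.printf("LinkedList: %d ms, LinkedHashSet: %d ms%n",
            linked / 1_000_000, hashed / 1_000_000);
    }
}
```

The absolute timings depend on the machine. The point is only that the per-release cost grows with the size of the free list when the membership check is a linear scan, and that the single lock serializes every thread doing it.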
>> >> >>
>> >> >> On Wed, Oct 21, 2015 at 4:20 PM, ramkrishna vasudevan <
>> >> >> ramkrishna.s.vasudevan@gmail.com> wrote:
>> >> >>
>> >> >> > I think the forceful clearing of the blocks from the bucket cache is
>> >> >> > hurting in this case.  I think it is worth opening a JIRA for this and
>> >> >> > working on a fix.
>> >> >> >
>> >> >> > Regards
>> >> >> > Ram
>> >> >> >
>> >> >> > On Wed, Oct 21, 2015 at 12:00 AM, Randy Fox <rfox@connexity.com> wrote:
>> >> >> >
>> >> >> >> Hi Vlad,
>> >> >> >>
>> >> >> >> I tried it on a per-table and a per-RegionServer basis, and it appears to
>> >> >> >> have no effect.
>> >> >> >> Are we sure it is supported for bucket cache?  From my charts, the bucket
>> >> >> >> cache is getting cleared at the same time as the region moves occurred.
>> >> >> >> The regions that are slow to move are the ones with bucket cache.
>> >> >> >>
>> >> >> >> I took a table with 102 regions and blockcache true and turned off block
>> >> >> >> cache via alter while the table was enabled - it took 19 minutes.  To turn
>> >> >> >> block cache back on took 4.3 seconds.
>> >> >> >>
>> >> >> >> Let me know if there is anything else to try.  This issue is really
>> >> >> >> hurting our day-to-day ops.
>> >> >> >>
>> >> >> >> Thanks,
>> >> >> >>
>> >> >> >> Randy
>> >> >> >>
>> >> >> >>
>> >> >> >>
>> >> >> >> On 10/15/15, 3:55 PM, "Vladimir Rodionov" <vladrodionov@gmail.com> wrote:
>> >> >> >>
>> >> >> >> >Hey, Randy
>> >> >> >> >
>> >> >> >> >You can verify your hypothesis by setting hbase.rs.evictblocksonclose to
>> >> >> >> >false for your tables.
>> >> >> >> >
>> >> >> >> >-Vlad
>> >> >> >> >
>> >> >> >> >On Thu, Oct 15, 2015 at 1:06 PM, Randy Fox <rfox@connexity.com> wrote:
>> >> >> >> >
>> >> >> >> >> Caveat - we are trying to tune the BucketCache (probably a new thread - as
>> >> >> >> >> we are not sure we are getting the most out of it)
>> >> >> >> >> 72G off heap
>> >> >> >> >>
>> >> >> >> >> <property>
>> >> >> >> >>    <name>hfile.block.cache.size</name>
>> >> >> >> >>    <value>0.58</value>
>> >> >> >> >> </property>
>> >> >> >> >>
>> >> >> >> >> <property>
>> >> >> >> >>    <name>hbase.bucketcache.ioengine</name>
>> >> >> >> >>    <value>offheap</value>
>> >> >> >> >> </property>
>> >> >> >> >>
>> >> >> >> >> <property>
>> >> >> >> >>    <name>hbase.bucketcache.size</name>
>> >> >> >> >>    <value>72800</value>
>> >> >> >> >> </property>
>> >> >> >> >>
>> >> >> >> >> <property>
>> >> >> >> >>    <name>hbase.bucketcache.bucket.sizes</name>
>> >> >> >> >>    <value>9216,17408,33792,66560</value>
>> >> >> >> >> </property>
>> >> >> >> >>
>> >> >> >> >>
>> >> >> >> >>
>> >> >> >> >>
>> >> >> >> >>
>> >> >> >> >>
>> >> >> >> >> On 10/15/15, 12:00 PM, "Ted Yu" <yuzhihong@gmail.com> wrote:
>> >> >> >> >>
>> >> >> >> >> >I am a bit curious.
>> >> >> >> >> >0.94 doesn't have BucketCache.
>> >> >> >> >> >
>> >> >> >> >> >Can you share the BucketCache-related config parameters in your cluster?
>> >> >> >> >> >
>> >> >> >> >> >Cheers
>> >> >> >> >> >
>> >> >> >> >> >On Thu, Oct 15, 2015 at 11:11 AM, Randy Fox <rfox@connexity.com> wrote:
>> >> >> >> >> >
>> >> >> >> >> >>
>> >> >> >> >> >> "StoreFileCloserThread-L-1" prio=10 tid=0x00000000027ec800 nid=0xad84 runnable [0x00007fbcc0c65000]
>> >> >> >> >> >>    java.lang.Thread.State: RUNNABLE
>> >> >> >> >> >>         at java.util.LinkedList.indexOf(LinkedList.java:602)
>> >> >> >> >> >>         at java.util.LinkedList.contains(LinkedList.java:315)
>> >> >> >> >> >>         at org.apache.hadoop.hbase.io.hfile.bucket.BucketAllocator$BucketSizeInfo.freeBlock(BucketAllocator.java:247)
>> >> >> >> >> >>         at org.apache.hadoop.hbase.io.hfile.bucket.BucketAllocator.freeBlock(BucketAllocator.java:449)
>> >> >> >> >> >>         - locked <0x000000041b0887a8> (a org.apache.hadoop.hbase.io.hfile.bucket.BucketAllocator)
>> >> >> >> >> >>         at org.apache.hadoop.hbase.io.hfile.bucket.BucketCache.evictBlock(BucketCache.java:459)
>> >> >> >> >> >>         at org.apache.hadoop.hbase.io.hfile.bucket.BucketCache.evictBlocksByHfileName(BucketCache.java:1036)
>> >> >> >> >> >>         at org.apache.hadoop.hbase.io.hfile.CombinedBlockCache.evictBlocksByHfileName(CombinedBlockCache.java:90)
>> >> >> >> >> >>         at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.close(HFileReaderV2.java:516)
>> >> >> >> >> >>         at org.apache.hadoop.hbase.regionserver.StoreFile$Reader.close(StoreFile.java:1143)
>> >> >> >> >> >>         at org.apache.hadoop.hbase.regionserver.StoreFile.closeReader(StoreFile.java:503)
>> >> >> >> >> >>         - locked <0x00000004944ff2d8> (a org.apache.hadoop.hbase.regionserver.StoreFile)
>> >> >> >> >> >>         at org.apache.hadoop.hbase.regionserver.HStore$2.call(HStore.java:873)
>> >> >> >> >> >>         at org.apache.hadoop.hbase.regionserver.HStore$2.call(HStore.java:870)
>> >> >> >> >> >>         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>> >> >> >> >> >>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>> >> >> >> >> >>         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>> >> >> >> >> >>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>> >> >> >> >> >>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>> >> >> >> >> >>         at java.lang.Thread.run(Thread.java:745)
>> >> >> >> >> >>
>> >> >> >> >> >>
>> >> >> >> >> >> "StoreCloserThread-Wildfire_graph3,\x00\x04lK\x1B\xFC\x10\xD2,1402949830657.afb6a1720d936a83d73022aeb9ddbb6c.-1" prio=10 tid=0x0000000003508800 nid=0xad83 waiting on condition [0x00007fbcc5dcc000]
>> >> >> >> >> >>    java.lang.Thread.State: WAITING (parking)
>> >> >> >> >> >>         at sun.misc.Unsafe.park(Native Method)
>> >> >> >> >> >>         - parking to wait for  <0x0000000534e90a80> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>> >> >> >> >> >>         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
>> >> >> >> >> >>         at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
>> >> >> >> >> >>         at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
>> >> >> >> >> >>         at java.util.concurrent.ExecutorCompletionService.take(ExecutorCompletionService.java:193)
>> >> >> >> >> >>         at org.apache.hadoop.hbase.regionserver.HStore.close(HStore.java:883)
>> >> >> >> >> >>         at org.apache.hadoop.hbase.regionserver.HStore.close(HStore.java:126)
>> >> >> >> >> >>         at org.apache.hadoop.hbase.regionserver.HRegion$2.call(HRegion.java:1378)
>> >> >> >> >> >>         at org.apache.hadoop.hbase.regionserver.HRegion$2.call(HRegion.java:1375)
>> >> >> >> >> >>         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>> >> >> >> >> >>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>> >> >> >> >> >>         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>> >> >> >> >> >>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>> >> >> >> >> >>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>> >> >> >> >> >>         at java.lang.Thread.run(Thread.java:745)
>> >> >> >> >> >>
>> >> >> >> >> >>
>> >> >> >> >> >> "RS_CLOSE_REGION-hb20:60020-0" prio=10 tid=0x00007fcec0142000 nid=0x3056 waiting on condition [0x00007fbcc2d87000]
>> >> >> >> >> >>    java.lang.Thread.State: WAITING (parking)
>> >> >> >> >> >>         at sun.misc.Unsafe.park(Native Method)
>> >> >> >> >> >>         - parking to wait for  <0x0000000534e61360> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>> >> >> >> >> >>         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
>> >> >> >> >> >>         at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
>> >> >> >> >> >>         at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
>> >> >> >> >> >>         at java.util.concurrent.ExecutorCompletionService.take(ExecutorCompletionService.java:193)
>> >> >> >> >> >>         at org.apache.hadoop.hbase.regionserver.HRegion.doClose(HRegion.java:1385)
>> >> >> >> >> >>         at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1280)
>> >> >> >> >> >>         - locked <0x000000042230fa68> (a java.lang.Object)
>> >> >> >> >> >>         at org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler.process(CloseRegionHandler.java:138)
>> >> >> >> >> >>         at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128)
>> >> >> >> >> >>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>> >> >> >> >> >>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>> >> >> >> >> >>         at java.lang.Thread.run(Thread.java:745)
>> >> >> >> >> >>
>> >> >> >> >> >>
>> >> >> >> >> >> I attached the whole thing as well.
>> >> >> >> >> >>
>> >> >> >> >> >> -r
>> >> >> >> >> >>
>> >> >> >> >> >>
>> >> >> >> >> >> On 10/15/15, 10:39 AM, "Ted Yu" <yuzhihong@gmail.com> wrote:
>> >> >> >> >> >>
>> >> >> >> >> >> >Can you give a bit more detail on why block eviction was the cause of the
>> >> >> >> >> >> >slow region movement?
>> >> >> >> >> >> >
>> >> >> >> >> >> >Did you happen to take stack traces?
>> >> >> >> >> >> >
>> >> >> >> >> >> >Thanks
>> >> >> >> >> >> >
>> >> >> >> >> >> >> On Oct 15, 2015, at 10:32 AM, Randy Fox <rfox@connexity.com> wrote:
>> >> >> >> >> >> >>
>> >> >> >> >> >> >> Hi,
>> >> >> >> >> >> >>
>> >> >> >> >> >> >> We just upgraded from 0.94 to 1.0.0 and have noticed that region moves
>> >> >> >> >> >> >> are super slow (on the order of minutes), whereas previously they were in the
>> >> >> >> >> >> >> seconds range.  After looking at the code, I think the time is spent
>> >> >> >> >> >> >> waiting for the blocks to be evicted from the block cache.
>> >> >> >> >> >> >>
>> >> >> >> >> >> >> I wanted to verify that this theory is correct and see if there is
>> >> >> >> >> >> >> anything that can be done to speed up the moves.
>> >> >> >> >> >> >>
>> >> >> >> >> >> >> This is particularly painful as we are trying to get our configs tuned to
>> >> >> >> >> >> >> the new SW and need to do rolling restarts, which are taking almost 24 hours
>> >> >> >> >> >> >> on our cluster.  We also do our own manual rebalancing of regions across
>> >> >> >> >> >> >> RS’s and that task is also now painful.
>> >> >> >> >> >> >>
>> >> >> >> >> >> >>
>> >> >> >> >> >> >> Thanks,
>> >> >> >> >> >> >>
>> >> >> >> >> >> >> Randy