hbase-issues mailing list archives

From "Anoop Sam John (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-15436) BufferedMutatorImpl.flush() appears to get stuck
Date Sun, 13 Mar 2016 16:28:33 GMT

    [ https://issues.apache.org/jira/browse/HBASE-15436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15192411#comment-15192411 ]

Anoop Sam John commented on HBASE-15436:
----------------------------------------

{code}
"pool-14-thread-1" prio=10 tid=0x00007f4215268000 nid=0x46e6 waiting on condition [0x00007f41fe75d000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00000000eeb5a010> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
        at java.util.concurrent.ArrayBlockingQueue.take(ArrayBlockingQueue.java:374)
        at org.apache.hadoop.hbase.util.BoundedCompletionService.take(BoundedCompletionService.java:75)
        at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:190)
        at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:56)
        at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:200)
        at org.apache.hadoop.hbase.client.ClientSmallReversedScanner.loadCache(ClientSmallReversedScanner.java:211)
        at org.apache.hadoop.hbase.client.ClientSmallReversedScanner.next(ClientSmallReversedScanner.java:185)
        at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegionInMeta(ConnectionManager.java:1200)
        at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1109)
        at org.apache.hadoop.hbase.client.AsyncProcess.submit(AsyncProcess.java:369)
        at org.apache.hadoop.hbase.client.AsyncProcess.submit(AsyncProcess.java:320)
        at org.apache.hadoop.hbase.client.BufferedMutatorImpl.backgroundFlushCommits(BufferedMutatorImpl.java:206)
        at org.apache.hadoop.hbase.client.BufferedMutatorImpl.flush(BufferedMutatorImpl.java:183)
{code}

When I said the flush is still working through each of the Mutations, you replied that it does
not look that way, because the thread doing the flush op appears to be doing nothing. But the
thread doing the flush op works in a loop, and each iteration in turn triggers a Meta table
scan. You can see that the scan op is handed to another thread in a pool, and the original
flush thread waits for that scan thread to complete. This is clearly visible in the trace above.
So this thread waits for the result, and that result is an Exception (SocketTimeout) which it
sees only after minutes. The flush thread then comes back to life, continues the loop, and
goes into this wait mode again.
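
To make the wait-and-resume cycle concrete, here is a minimal sketch in plain Java. It is not HBase client code: the class FlushLoopSketch, the method flushAll, and the 200 ms delay are made up for illustration, with the short delay standing in for the minutes-long socket timeout. Each loop iteration hands a lookup to a pool thread and then parks on a queue until that lookup reports its failure, roughly the shape of the trace above.

```java
import java.net.SocketTimeoutException;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class FlushLoopSketch {

    /**
     * Hypothetical stand-in for the flush loop: one simulated meta lookup per
     * mutation, each handed to a pool thread. Returns how many lookups failed.
     */
    static int flushAll(int mutations, long simulatedTimeoutMillis) throws InterruptedException {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        int failures = 0;
        try {
            for (int i = 0; i < mutations; i++) {
                // The pool thread reports its outcome through this queue.
                BlockingQueue<Object> completed = new ArrayBlockingQueue<>(1);
                pool.execute(() -> {
                    try {
                        // Assumption: a short sleep plays the role of the long socket timeout.
                        Thread.sleep(simulatedTimeoutMillis);
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                    completed.add(new SocketTimeoutException("simulated: cluster is down"));
                });
                // The calling thread parks here, which is why a thread dump shows it
                // WAITING even though the loop as a whole is still making progress.
                Object result = completed.take();
                if (result instanceof SocketTimeoutException) {
                    failures++;
                }
            }
        } finally {
            pool.shutdownNow();
        }
        return failures;
    }

    public static void main(String[] args) throws InterruptedException {
        long start = System.nanoTime();
        int failures = flushAll(3, 200);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(failures + " lookups failed after about " + elapsedMs + " ms");
    }
}
```

In this sketch the total time grows with the number of iterations times the timeout, so a dump taken at any moment will most likely catch the calling thread parked in take().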

> BufferedMutatorImpl.flush() appears to get stuck
> ------------------------------------------------
>
>                 Key: HBASE-15436
>                 URL: https://issues.apache.org/jira/browse/HBASE-15436
>             Project: HBase
>          Issue Type: Bug
>          Components: Client
>    Affects Versions: 1.0.2
>            Reporter: Sangjin Lee
>         Attachments: hbaseException.log, threaddump.log
>
>
> We noticed an instance where the thread that was executing a flush ({{BufferedMutatorImpl.flush()}})
> got stuck when the (local one-node) cluster shut down and was unable to get out of that stuck
> state.
> The setup is a single node HBase cluster, and apparently the cluster went away when the
> client was executing flush. The flush eventually logged a failure after 30+ minutes of retrying.
> That is understandable.
> What is unexpected is that thread is stuck in this state (i.e. in the {{flush()}} call).
> I would have expected the {{flush()}} call to return after the complete failure.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
