hbase-issues mailing list archives

From "Hadoop QA (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-16429) FSHLog: deadlock if rollWriter called when ring buffer filled with appends
Date Thu, 18 Aug 2016 00:12:20 GMT

    [ https://issues.apache.org/jira/browse/HBASE-16429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15425636#comment-15425636 ]

Hadoop QA commented on HBASE-16429:
-----------------------------------

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 14s {color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 59s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 32s {color} | {color:green} master passed with JDK v1.8.0_101 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 34s {color} | {color:green} master passed with JDK v1.7.0_101 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 49s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 17s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 56s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 25s {color} | {color:green} master passed with JDK v1.8.0_101 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 34s {color} | {color:green} master passed with JDK v1.7.0_101 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 45s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 31s {color} | {color:green} the patch passed with JDK v1.8.0_101 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 31s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 35s {color} | {color:green} the patch passed with JDK v1.7.0_101 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 35s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 52s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 16s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 27m 21s {color} | {color:green} Patch does not cause any errors with Hadoop 2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.1 2.6.2 2.6.3 2.7.1. {color} |
| {color:green}+1{color} | {color:green} hbaseprotoc {color} | {color:green} 0m 15s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 7s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 28s {color} | {color:green} the patch passed with JDK v1.8.0_101 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 33s {color} | {color:green} the patch passed with JDK v1.7.0_101 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 100m 26s {color} | {color:green} hbase-server in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 19s {color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 143m 12s {color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=1.11.2 Server=1.11.2 Image:yetus/hbase:date2016-08-17 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12824065/HBASE-16429.patch |
| JIRA Issue | HBASE-16429 |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux 2941c1a670b5 3.13.0-92-generic #139-Ubuntu SMP Tue Jun 28 20:42:26 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh |
| git revision | master / e637a61 |
| Default Java | 1.7.0_101 |
| Multi-JDK versions |  /usr/lib/jvm/java-8-oracle:1.8.0_101 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_101 |
| findbugs | v3.0.0 |
|  Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/3133/testReport/ |
| modules | C: hbase-server U: hbase-server |
| Console output | https://builds.apache.org/job/PreCommit-HBASE-Build/3133/console |
| Powered by | Apache Yetus 0.3.0   http://yetus.apache.org |


This message was automatically generated.



> FSHLog: deadlock if rollWriter called when ring buffer filled with appends
> --------------------------------------------------------------------------
>
>                 Key: HBASE-16429
>                 URL: https://issues.apache.org/jira/browse/HBASE-16429
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 1.1.2, 1.2.2
>            Reporter: Yu Li
>            Assignee: Yu Li
>            Priority: Critical
>         Attachments: HBASE-16429.patch
>
>
> Recently we hit a production problem in which all handlers were stuck. The jstack output showed every handler thread waiting in RingBuffer.next, while the single ring buffer consumer was blocked waiting for {{safePointReleasedLatch}} to count down:
> {noformat}
> Normal handler thread:
> "B.defaultRpcServer.handler=126,queue=9,port=16020" daemon prio=10 tid=0x00007efd4b44f800 nid=0x15f29 runnable [0x00007efd3db7b000]
>    java.lang.Thread.State: TIMED_WAITING (parking)
>         at sun.misc.Unsafe.park(Native Method)
>         at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:349)
>         at com.lmax.disruptor.MultiProducerSequencer.next(MultiProducerSequencer.java:136)
>         at com.lmax.disruptor.MultiProducerSequencer.next(MultiProducerSequencer.java:105)
>         at com.lmax.disruptor.RingBuffer.next(RingBuffer.java:246)
>         at org.apache.hadoop.hbase.regionserver.wal.FSHLog.append(FSHLog.java:1222)
>         at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchMutation(HRegion.java:3188)
>         at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2879)
>         at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2819)
>         at org.apache.hadoop.hbase.regionserver.RSRpcServices.doBatchOp(RSRpcServices.java:736)
>         at org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicRegionMutation(RSRpcServices.java:698)
>         at org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:2095)
>         at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32213)
>         at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:774)
>         at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:102)
>         at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:133)
>         at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:108)
>         at java.lang.Thread.run(Thread.java:756)
> RingBufferEventHandler thread waiting for safePointReleasedLatch:
> "regionserver/hadoop0369.et2.tbsite.net/11.251.152.226:16020.append-pool2-t1" prio=10 tid=0x00007efd320d0000 nid=0x1777b waiting on condition [0x00007efd2d2fa000]
>    java.lang.Thread.State: WAITING (parking)
>         at sun.misc.Unsafe.park(Native Method)
>         - parking to wait for  <0x00007f01b48d9178> (a java.util.concurrent.CountDownLatch$Sync)
>         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:994)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1303)
>         at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:236)
>         at org.apache.hadoop.hbase.regionserver.wal.FSHLog$SafePointZigZagLatch.safePointAttained(FSHLog.java:1866)
>         at org.apache.hadoop.hbase.regionserver.wal.FSHLog$RingBufferEventHandler.attainSafePoint(FSHLog.java:2066)
>         at org.apache.hadoop.hbase.regionserver.wal.FSHLog$RingBufferEventHandler.onEvent(FSHLog.java:2029)
>         at org.apache.hadoop.hbase.regionserver.wal.FSHLog$RingBufferEventHandler.onEvent(FSHLog.java:1909)
>         at com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:128)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:756)
> {noformat} 
> {{FSHLog#replaceWriter}} calls {{SafePointZigZagLatch#releaseSafePoint}} to count down {{safePointReleasedLatch}}, but replaceWriter was itself stuck trying to publish a sync onto the ring buffer:
> {noformat}
> "regionserver/hadoop0369.et2.tbsite.net/11.251.152.226:16020.logRoller" daemon prio=10 tid=0x00007efd320c8800 nid=0x16123 runnable [0x00007efd311f6000]
>    java.lang.Thread.State: TIMED_WAITING (parking)
>         at sun.misc.Unsafe.park(Native Method)
>         at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:349)
>         at com.lmax.disruptor.MultiProducerSequencer.next(MultiProducerSequencer.java:136)
>         at com.lmax.disruptor.MultiProducerSequencer.next(MultiProducerSequencer.java:105)
>         at com.lmax.disruptor.RingBuffer.next(RingBuffer.java:246)
>         at org.apache.hadoop.hbase.regionserver.wal.FSHLog.publishSyncOnRingBuffer(FSHLog.java:1481)
>         at org.apache.hadoop.hbase.regionserver.wal.FSHLog.publishSyncOnRingBuffer(FSHLog.java:1477)
>         at org.apache.hadoop.hbase.regionserver.wal.FSHLog.replaceWriter(FSHLog.java:957)
>         at org.apache.hadoop.hbase.regionserver.wal.FSHLog.rollWriter(FSHLog.java:726)
>         at org.apache.hadoop.hbase.regionserver.LogRoller.run(LogRoller.java:148)
>         at java.lang.Thread.run(Thread.java:756)
> {noformat}
> Thus a deadlock occurs.
> In brief, the deadlock forms as follows:
> {noformat}
> ring buffer filled with appends
> -> rollWriter happens
> -> the only consumer of ring buffer waiting for safePointReleasedLatch
> -> rollWriter cannot publish sync since ring buffer is full
> -> rollWriter won't release safePointReleasedLatch
> {noformat}
> This JIRA aims to resolve this issue and will add a unit test (UT) to cover the case.
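
The cycle described in the quoted report can be reproduced in miniature with plain JDK classes. The sketch below is an illustration only: it is not FSHLog code and not the attached patch. It assumes a bounded ArrayBlockingQueue as a stand-in for the Disruptor ring buffer, and the class name, method name, and the two latch variables are hypothetical. A timed offer is used in place of the blocking publish so that the program terminates instead of hanging.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for the situation in the report; not HBase code.
class RingBufferDeadlockSketch {

    /**
     * Returns true if the "roller" could not publish its sync within the
     * timeout, i.e. the wait cycle from the report has formed.
     */
    static boolean rollerGetsStuck(int capacity, long timeoutMillis)
            throws InterruptedException {
        // Bounded queue standing in for the ring buffer (assumption).
        BlockingQueue<String> ring = new ArrayBlockingQueue<>(capacity);
        CountDownLatch safePointAttained = new CountDownLatch(1);
        CountDownLatch safePointReleased = new CountDownLatch(1);

        // Step 1: handlers fill the ring with appends.
        for (int i = 0; i < capacity; i++) {
            ring.put("append-" + i);
        }

        // Step 2: the only consumer reaches the safe point and parks until
        // it is released; while parked it frees no slots.
        Thread consumer = new Thread(() -> {
            try {
                safePointAttained.countDown();
                safePointReleased.await();
                while (ring.poll() != null) {
                    // Drain once released.
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.setDaemon(true);
        consumer.start();

        // Step 3: the roller waits for the safe point, then tries to publish
        // a sync BEFORE releasing the consumer. With a blocking publish this
        // would never return; the timed offer lets the sketch observe that.
        safePointAttained.await();
        boolean published = ring.offer("sync", timeoutMillis, TimeUnit.MILLISECONDS);

        // Clean up so the sketch exits: release the consumer and let it drain.
        safePointReleased.countDown();
        consumer.join(timeoutMillis);
        return !published;
    }

    public static void main(String[] args) throws InterruptedException {
        // Prints "roller stuck: true"
        System.out.println("roller stuck: " + rollerGetsStuck(4, 200));
    }
}
```

In this simplified model the publish fails because the only thread that could free a slot is waiting for a release that happens only after the publish. Any fix has to break that ordering; how the actual patch does so is not shown here.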



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
