Date: Wed, 3 Aug 2016 07:13:21 +0000 (UTC)
From: "Hudson (JIRA)"
To: dev@phoenix.apache.org
Reply-To: dev@phoenix.apache.org
Subject: [jira] [Commented] (PHOENIX-3111) Possible Deadlock/delay while building index, upsert select, delete rows at server

    [ https://issues.apache.org/jira/browse/PHOENIX-3111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15405475#comment-15405475 ]

Hudson commented on PHOENIX-3111:
---------------------------------

SUCCESS: Integrated in Phoenix-master #1353 (See
[https://builds.apache.org/job/Phoenix-master/1353/])
PHOENIX-3111 Possible Deadlock/delay while building index, upsert (rajeshbabu: rev 27c4027fd72cec790975c810724f3a778388e426)
* phoenix-core/src/main/java/org/apache/phoenix/coprocessor/UngroupedAggregateRegionObserver.java


> Possible Deadlock/delay while building index, upsert select, delete rows at server
> ----------------------------------------------------------------------------------
>
>                 Key: PHOENIX-3111
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-3111
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: Sergio Peleato
>            Assignee: Rajeshbabu Chintaguntla
>            Priority: Critical
>             Fix For: 4.8.0
>
>         Attachments: PHOENIX-3111.patch, PHOENIX-3111_addendum.patch, PHOENIX-3111_v2.patch
>
>
> A deadlock, or at least a long delay, is possible while building a local index or running upsert select or delete at the server.
> These queries scan rows from a table and write mutations back to the same table. The memstore can therefore reach the blocking memstore size threshold, in which case a RegionTooBusyException may be thrown back to the client and the query may retry the scan.
> Take a local index build as an example: we first scan the data table, prepare the index mutations, and write them back to the same table.
> The memstore can fill up during this, and we then try to flush the region. If a split starts in between, the split waits for the write lock on the region in order to close it, and the flush waits for the read lock because the write-lock request stays queued ahead of it until the local index build completes. The local index build cannot complete because writes are not allowed until a flush happens. This may not be a complete deadlock, but queries can take a long time to complete in these cases.
> {noformat}
> "regionserver//192.168.0.53:16201-splits-1469165876186" #269 prio=5 os_prio=31 tid=0x00007f7fb2050800 nid=0x1c033 waiting on condition [0x0000000139b68000]
>    java.lang.Thread.State: WAITING (parking)
>         at sun.misc.Unsafe.park(Native Method)
>         - parking to wait for  <0x00000006ede72550> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
>         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
>         at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:943)
>         at org.apache.hadoop.hbase.regionserver.HRegion.doClose(HRegion.java:1422)
>         at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1370)
>         - locked <0x00000006ede69d00> (a java.lang.Object)
>         at org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.stepsBeforePONR(SplitTransactionImpl.java:394)
>         at org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.createDaughters(SplitTransactionImpl.java:278)
>         at org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.execute(SplitTransactionImpl.java:561)
>         at org.apache.hadoop.hbase.regionserver.SplitRequest.doSplitting(SplitRequest.java:82)
>         at org.apache.hadoop.hbase.regionserver.SplitRequest.run(SplitRequest.java:154)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
>    Locked ownable synchronizers:
>         - <0x00000006ee132098> (a java.util.concurrent.ThreadPoolExecutor$Worker)
> {noformat}
> {noformat}
> "MemStoreFlusher.0" #170 prio=5 os_prio=31 tid=0x00007f7fb6842000 nid=0x19303
waiting on condition [0x00000001388e9000]
>    java.lang.Thread.State: WAITING (parking)
>         at sun.misc.Unsafe.park(Native Method)
>         - parking to wait for  <0x00000006ede72550> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
>         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:967)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1283)
>         at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:727)
>         at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:1986)
>         at org.apache.hadoop.hbase.regionserver.HRegion.flush(HRegion.java:1950)
>         at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:501)
>         at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:471)
>         at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$800(MemStoreFlusher.java:75)
>         at org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:259)
>         at java.lang.Thread.run(Thread.java:745)
> {noformat}
> As a fix, we need to block region splits while an index build, upsert select, or delete is running at the server.
> Thanks to [~sergey.soldatov] for the help in understanding and analyzing the bug, and to [~speleato] for finding it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
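
The lock interaction described in the issue (a reader holding the region lock, a split queued for the write lock, and a flush stuck behind it waiting for the read lock) can be reproduced outside HBase with a plain java.util.concurrent.locks.ReentrantReadWriteLock, the lock type that appears in both thread dumps. The following is only a minimal sketch and not Phoenix or HBase code: the class name RegionLockQueueDemo, the method name, and the thread names are hypothetical. It also assumes the long-running server-side operation holds the read lock for its whole duration, and it uses a timed tryLock in place of the untimed lock() seen in the dump so that the demo terminates instead of hanging.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-alone demo; none of these names come from HBase or Phoenix.
class RegionLockQueueDemo {

    /**
     * Returns true if the simulated "flush" managed to take the read lock
     * while the simulated "split" was queued for the write lock.
     */
    static boolean flushCanAcquireWhileSplitWaits() {
        // Default constructor gives the non-fair mode seen in the dump (NonfairSync).
        ReentrantReadWriteLock regionLock = new ReentrantReadWriteLock();
        AtomicBoolean flushAcquired = new AtomicBoolean(false);

        // The calling thread plays the index build: it holds the read lock throughout.
        regionLock.readLock().lock();
        try {
            // "Split": needs the write lock to close the region, so it queues up.
            Thread split = new Thread(() -> {
                regionLock.writeLock().lock();
                regionLock.writeLock().unlock();
            }, "split");
            split.start();
            while (!regionLock.hasQueuedThread(split)) {
                Thread.sleep(10); // wait until the writer is really parked in the queue
            }

            // "Flush": asks for the read lock after the writer has queued.
            Thread flush = new Thread(() -> {
                try {
                    if (regionLock.readLock().tryLock(200, TimeUnit.MILLISECONDS)) {
                        flushAcquired.set(true);
                        regionLock.readLock().unlock();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }, "flush");
            flush.start();
            flush.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            // Releasing the read lock lets the queued split proceed and exit.
            regionLock.readLock().unlock();
        }
        return flushAcquired.get();
    }

    public static void main(String[] args) {
        System.out.println("flush acquired read lock: " + flushCanAcquireWhileSplitWaits());
    }
}
```

In this sketch the flush thread is expected to time out, because a new reader is held back while a writer is waiting at the head of the lock's queue. In the real scenario the index build additionally cannot finish until the flush runs, which is what turns the wait into the stall reported above.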