phoenix-dev mailing list archives

From "Vincent Poon (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (PHOENIX-3111) Possible Deadlock/delay while building index, upsert select, delete rows at server
Date Thu, 14 Sep 2017 22:36:00 GMT

     [ https://issues.apache.org/jira/browse/PHOENIX-3111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vincent Poon updated PHOENIX-3111:
----------------------------------
    Attachment: splitDuringUpsertSelect.patch

Here's a patch with the following tests to reproduce the issue; they're part of a larger suite
I was writing to test global mutable secondary indexing:
testRegionCloseDuringUpsertSelect
testSplitDuringUpsertSelect

The patch includes something like what [~samarthjain] suggested: when a split/close has been
requested, I throw an IOException for any new incoming scans that require a write.  This at least
allows the split/close to happen eventually, since scansRefCounter won't go up, while still
allowing the operations already in progress to finish.
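As a rough illustration of that guard pattern (names here are illustrative, not the actual Phoenix ones): a reference counter tracks in-flight write-scans, and once a split/close is requested, new write-scans are rejected so the counter can drain to zero. A minimal, self-contained sketch:

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of the guard described above; not Phoenix code.
class WriteScanGuard {
    private final AtomicInteger scansRefCount = new AtomicInteger();
    private volatile boolean closeRequested = false;

    // Called from preClose/preSplit: stop admitting new write-scans.
    void requestClose() {
        closeRequested = true;
    }

    // Called when a scan that will write begins; rejected once closing.
    void beginWriteScan() throws IOException {
        if (closeRequested) {
            throw new IOException("Region is closing/splitting; rejecting new write-scan");
        }
        scansRefCount.incrementAndGet();
        // Re-check to close the race where close was requested between
        // the flag check and the increment.
        if (closeRequested) {
            scansRefCount.decrementAndGet();
            throw new IOException("Region is closing/splitting; rejecting new write-scan");
        }
    }

    // Called in the scanner's finally block.
    void endWriteScan() {
        scansRefCount.decrementAndGet();
    }

    int inFlight() {
        return scansRefCount.get();
    }
}
```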

The loop in preClose is rather dangerous: if a scanner thread is interrupted, there is no
guarantee its finally block will run, so scansRefCounter might never get back to 0.  I
encountered this in the test when the miniCluster was shutting down; I'm not sure whether
there are other scenarios where this might happen in actual production usage.  To avoid this,
I throw an IOException there when the wait itself is interrupted.
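The escape hatch for that wait loop might look something like the following sketch (again illustrative, not the actual patch code): instead of spinning forever on a counter that may never drain, bail out with an IOException if the waiting thread is interrupted, preserving the interrupt status.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of a preClose-style wait that gives up on interrupt.
class CloseWaiter {
    static void waitForScansToFinish(AtomicInteger scansRefCount) throws IOException {
        while (scansRefCount.get() > 0) {
            try {
                Thread.sleep(10); // poll until in-flight write-scans drain
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status
                throw new IOException("Interrupted while waiting for scans to drain", e);
            }
        }
    }
}
```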

Note that you currently can't run all the tests in the suite, so just run the individual tests
you want to try.

> Possible Deadlock/delay while building index, upsert select, delete rows at server
> ----------------------------------------------------------------------------------
>
>                 Key: PHOENIX-3111
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-3111
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: Sergio Peleato
>            Assignee: Rajeshbabu Chintaguntla
>            Priority: Critical
>             Fix For: 4.8.0
>
>         Attachments: PHOENIX-3111_addendum.patch, PHOENIX-3111.patch, PHOENIX-3111_v2.patch, splitDuringUpsertSelect.patch
>
>
> There is a possible deadlock, or at least a long delay, while building a local index or running upsert select or delete at the server. The situation can arise as follows.
> These queries scan mutations from a table and write back to the same table, so the memstore may reach the blocking-memstore-size threshold; a RegionTooBusyException is then thrown back to the client, and the queries retry scanning.
> Take the local index build as an example: we first scan from the data table, prepare index mutations, and write them back to the same table.
> So the memstore may fill up, in which case we try to flush the region. But if a split happens in between, the split waits for the region's write lock in order to close it, and the flush then waits for the read lock, because the write lock is queued ahead of it until the local index build completes. The local index build can't complete because writes are blocked until a flush happens. This may not be a complete deadlock, but the queries can take a very long time to complete in these cases.
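The lock interaction described above can be reproduced in isolation with a plain non-fair ReentrantReadWriteLock (the kind HRegion uses, per the stack traces below): once a writer is parked at the head of the queue, later readers block behind it even in non-fair mode. A standalone illustrative demo, not Phoenix or HBase code:

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class WriterBlocksReadersDemo {
    // Returns true when the "flush" reader is blocked behind the queued "split" writer.
    static boolean flushReaderBlockedByQueuedWriter() throws InterruptedException {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock(); // non-fair, as in HRegion
        lock.readLock().lock(); // the index-build scan holds the read lock

        // The "split" thread queues for the write lock and parks behind the reader.
        Thread splitter = new Thread(() -> lock.writeLock().lock());
        splitter.start();
        while (lock.getQueueLength() == 0) {
            Thread.sleep(5); // wait until the writer is parked in the queue
        }

        // The "flush" thread now asks for the read lock; the queued writer blocks it,
        // so the timed tryLock expires without acquiring the lock.
        final boolean[] flushGotReadLock = new boolean[1];
        Thread flusher = new Thread(() -> {
            try {
                flushGotReadLock[0] = lock.readLock().tryLock(200, TimeUnit.MILLISECONDS);
            } catch (InterruptedException ignored) {
            }
        });
        flusher.start();
        flusher.join();

        lock.readLock().unlock(); // release the "scan" so the split can proceed
        splitter.join();
        return !flushGotReadLock[0];
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("flush blocked behind queued writer: "
                + flushReaderBlockedByQueuedWriter());
    }
}
```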
> {noformat}
> "regionserver//192.168.0.53:16201-splits-1469165876186" #269 prio=5 os_prio=31 tid=0x00007f7fb2050800 nid=0x1c033 waiting on condition [0x0000000139b68000]
>    java.lang.Thread.State: WAITING (parking)
>         at sun.misc.Unsafe.park(Native Method)
>         - parking to wait for  <0x00000006ede72550> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
>         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
>         at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:943)
>         at org.apache.hadoop.hbase.regionserver.HRegion.doClose(HRegion.java:1422)
>         at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1370)
>         - locked <0x00000006ede69d00> (a java.lang.Object)
>         at org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.stepsBeforePONR(SplitTransactionImpl.java:394)
>         at org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.createDaughters(SplitTransactionImpl.java:278)
>         at org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.execute(SplitTransactionImpl.java:561)
>         at org.apache.hadoop.hbase.regionserver.SplitRequest.doSplitting(SplitRequest.java:82)
>         at org.apache.hadoop.hbase.regionserver.SplitRequest.run(SplitRequest.java:154)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
>    Locked ownable synchronizers:
>         - <0x00000006ee132098> (a java.util.concurrent.ThreadPoolExecutor$Worker)
> {noformat}
> {noformat}
> "MemStoreFlusher.0" #170 prio=5 os_prio=31 tid=0x00007f7fb6842000 nid=0x19303 waiting on condition [0x00000001388e9000]
>    java.lang.Thread.State: WAITING (parking)
>         at sun.misc.Unsafe.park(Native Method)
>         - parking to wait for  <0x00000006ede72550> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
>         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:967)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1283)
>         at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:727)
>         at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:1986)
>         at org.apache.hadoop.hbase.regionserver.HRegion.flush(HRegion.java:1950)
>         at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:501)
>         at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:471)
>         at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$800(MemStoreFlusher.java:75)
>         at org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:259)
>         at java.lang.Thread.run(Thread.java:745)
> {noformat}
> As a fix, we need to block region splits while index builds, upsert selects, or deletes are running at the server.
> Thanks to [~sergey.soldatov] for the help in understanding and analyzing the bug, and to [~speleato] for finding it.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
