hbase-issues mailing list archives

From "Hudson (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-12565) Race condition in HRegion.batchMutate() causes partial data to be written when region closes
Date Fri, 05 Dec 2014 19:58:13 GMT

    [ https://issues.apache.org/jira/browse/HBASE-12565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14236001#comment-14236001 ]

Hudson commented on HBASE-12565:
--------------------------------

SUCCESS: Integrated in HBase-1.0 #545 (See [https://builds.apache.org/job/HBase-1.0/545/])
HBASE-12565 Race condition in HRegion.batchMutate() causes partial data to be written when region closes (stack: rev 1bd27bfa240e10cd2c33ef007ac9b8ab4014039f)
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestAtomicOperation.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java


> Race condition in HRegion.batchMutate() causes partial data to be written when region closes
> ---------------------------------------------------------------------------------------------
>
>                 Key: HBASE-12565
>                 URL: https://issues.apache.org/jira/browse/HBASE-12565
>             Project: HBase
>          Issue Type: Bug
>          Components: Performance, regionserver
>    Affects Versions: 2.0.0, 0.98.6
>            Reporter: Scott Fines
>             Fix For: 1.0.0, 2.0.0
>
>         Attachments: hbase-12565-v1.patch, hbase-12565-v1.patch, hbase-12565.patch
>
>
> The following sequence of events can occur in HRegion's batchMutate() call:
> 1. The caller calls HRegion.batchMutate() with a batch of N>1 records.
> 2. batchMutate() acquires the region lock in startRegionOperation(), then calls doMiniBatchMutation().
> 3. doMiniBatchMutation() acquires one row lock.
> 4. The region closes.
> 5. doMiniBatchMutation() attempts to acquire a second row lock.
> When this happens, the row lock acquisition also attempts to acquire the region lock, which fails because the region is closing. At this stage, doMiniBatchMutation() stops processing further rows, BUT it still writes data for the rows whose locks have already been acquired and advances the index in MiniBatchOperationInProgress. After it returns successfully, batchMutate() loops around a second time and attempts AGAIN to acquire the region closing lock. That attempt fails, and a NotServingRegionException is thrown back to the caller.
> Thus, we have a race condition in which partial data can be written when a region is closing.
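The sequence above can be reproduced in a small, single-threaded toy model. This is only a sketch under stated assumptions: ToyRegion, tryRowLock(), RegionClosingException and the closeAfterLocks knob are hypothetical stand-ins, not HBase classes or methods, and the "close" is injected deterministically after the first row lock instead of coming from another thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model of the reported race. None of these names are real HBase APIs. */
public class PartialBatchDemo {
    /** Hypothetical stand-in for NotServingRegionException. */
    static class RegionClosingException extends RuntimeException {
        RegionClosingException(String msg) { super(msg); }
    }

    static class ToyRegion {
        final List<String> written = new ArrayList<>();
        final int closeAfterLocks; // simulates a close arriving mid-batch
        boolean closing = false;
        int lockAttempts = 0;

        ToyRegion(int closeAfterLocks) { this.closeAfterLocks = closeAfterLocks; }

        void startRegionOperation() {
            if (closing) throw new RegionClosingException("region is closing");
        }

        /** Per-row lock that also checks region state, as described for getRowLock(). */
        boolean tryRowLock(String row) {
            if (lockAttempts++ == closeAfterLocks) closing = true; // injected close
            return !closing;
        }

        /** Writes the rows whose locks were acquired and returns the advanced index. */
        int doMiniBatch(List<String> rows, int from) {
            int i = from;
            while (i < rows.size() && tryRowLock(rows.get(i))) i++;
            written.addAll(rows.subList(from, i)); // partial write happens here
            return i;
        }

        void batchMutate(List<String> rows) {
            int index = 0;
            while (index < rows.size()) {
                startRegionOperation(); // re-checked on every iteration
                index = doMiniBatch(rows, index);
            }
        }
    }

    /** Runs a 3-row batch with the close arriving after the first row lock. */
    static List<String> run() {
        ToyRegion region = new ToyRegion(1);
        try {
            region.batchMutate(Arrays.asList("row1", "row2", "row3"));
        } catch (RegionClosingException expected) {
            // The caller sees a failure for the whole batch ...
        }
        return region.written; // ... yet the first row has already been written.
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints [row1]
    }
}
```

In this model the caller receives an exception for the batch while row1 has already been stored, which is the partial-write outcome the report describes.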
> The main problem stems from the location of the startRegionOperation() calls in batchMutate() and doMiniBatchMutation():
> 1. batchMutate() reacquires the region lock on each iteration of its loop, so some writes can succeed before a later iteration fails.
> 2. getRowLock() attempts to acquire the region lock once for each row, which allows doMiniBatchMutation() to terminate early; this forces batchMutate() to use multiple iterations and results in condition 1 being hit.
> There appear to be two parts to the solution as well:
> 1. Open an internal path so that doMiniBatchMutation() can acquire row locks without checking for region closure. This will have the added benefit of a significant performance improvement during large batch mutations.
> 2. Move the startRegionOperation() call out of the loop in batchMutate() so that multiple iterations of doMiniBatchMutation() will not cause the operation to fail.
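The two-part solution can be sketched in a single-threaded toy model. This is an illustration under assumptions, not the committed patch: ToyRegion, rowLockInternal(), RegionClosingException, the closeAfterLocks knob and the mini-batch size of 2 are all hypothetical. The region-state check happens once, before the loop, and the internal row-lock path never consults it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model of the proposed fix. None of these names are real HBase APIs. */
public class FixedBatchSketch {
    /** Hypothetical stand-in for NotServingRegionException. */
    static class RegionClosingException extends RuntimeException {
        RegionClosingException(String msg) { super(msg); }
    }

    static class ToyRegion {
        final List<String> written = new ArrayList<>();
        final int closeAfterLocks; // simulates a close request arriving mid-batch
        boolean closing = false;
        int lockAttempts = 0;

        ToyRegion(int closeAfterLocks) { this.closeAfterLocks = closeAfterLocks; }

        void startRegionOperation() {
            if (closing) throw new RegionClosingException("region is closing");
        }

        /** Part 1: internal row-lock path that does not check for region closure. */
        void rowLockInternal(String row) {
            if (lockAttempts++ == closeAfterLocks) closing = true; // request noted, not consulted
        }

        /** Locks and writes up to two rows, so a 3-row batch needs two iterations. */
        int doMiniBatch(List<String> rows, int from) {
            int end = Math.min(rows.size(), from + 2);
            for (int i = from; i < end; i++) rowLockInternal(rows.get(i));
            written.addAll(rows.subList(from, end));
            return end;
        }

        void batchMutate(List<String> rows) {
            startRegionOperation(); // Part 2: checked once, outside the loop
            int index = 0;
            while (index < rows.size()) {
                index = doMiniBatch(rows, index);
            }
        }
    }

    /** Runs a 3-row batch; returns whatever ended up written. */
    static List<String> run(boolean alreadyClosing) {
        ToyRegion region = new ToyRegion(1);
        region.closing = alreadyClosing;
        try {
            region.batchMutate(Arrays.asList("row1", "row2", "row3"));
        } catch (RegionClosingException rejected) {
            // Rejected up front, before any row was written.
        }
        return region.written;
    }

    public static void main(String[] args) {
        System.out.println(run(false)); // close request mid-batch: all rows written
        System.out.println(run(true));  // region already closing: nothing written
    }
}
```

With this shape a batch is either rejected before any write or runs to completion, even when the close request arrives between mini-batches. A real region would additionally need the close to wait for in-flight operations, which this toy does not model.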



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
