hbase-issues mailing list archives

From "stack (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-17361) HTable#getBufferedMutator is not thread safe and could cause data loss
Date Thu, 22 Dec 2016 18:53:58 GMT

    [ https://issues.apache.org/jira/browse/HBASE-17361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15770792#comment-15770792 ]

stack commented on HBASE-17361:
-------------------------------

Yes.

" * <p>This class is NOT thread safe for reads nor writes.
 * In the case of writes (Put, Delete), the underlying write buffer can
 * be corrupted if multiple threads contend over a single HTable instance.
 * In the case of reads, some fields used by a Scan are shared among all threads."

Then HBASE-14687 talks of BufferedMutator (BM) being thread-safe.

"... the javadoc for BufferedMutator makes it sound like HTable where there should be one per
thread. That ends up being really really bad as there's one AsyncProcess per buffered mutator.
This can lead to a single regionserver being pounded. However if you don't use more than one
buffered mutator then the locking is extreme and contention eats up your cpu."

I do not know how to reconcile the two contending concerns above.

> HTable#getBufferedMutator is not thread safe and could cause data loss
> ----------------------------------------------------------------------
>
>                 Key: HBASE-17361
>                 URL: https://issues.apache.org/jira/browse/HBASE-17361
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 1.1.7, 1.2.4
>            Reporter: Yu Li
>            Assignee: Yu Li
>            Priority: Critical
>         Attachments: HBASE-17361.patch
>
>
> Now we have {{HTable#getBufferedMutator}} like
> {code}
>   BufferedMutator getBufferedMutator() throws IOException {
>     if (mutator == null) {
>       this.mutator = (BufferedMutatorImpl) connection.getBufferedMutator(
>           new BufferedMutatorParams(tableName)
>               .pool(pool)
>               .writeBufferSize(connConfiguration.getWriteBufferSize())
>               .maxKeyValueSize(connConfiguration.getMaxKeyValueSize())
>       );
>     }
>     mutator.setRpcTimeout(writeRpcTimeout);
>     mutator.setOperationTimeout(operationTimeout);
>     return mutator;
>   }
> {code}
> And {{HTable#flushCommits}}:
> {code}
>   void flushCommits() throws IOException {
>     if (mutator == null) {
>       // nothing to flush if there's no mutator; don't bother creating one.
>       return;
>     }
>     getBufferedMutator().flush();
>   }
> {code}
> For {{HTable#put}}
> {code}
>   public void put(final Put put) throws IOException {
>     getBufferedMutator().mutate(put);
>     flushCommits();
>   }
> {code}
> If we launch multiple threads to put in parallel, the sequence below might happen because
> {{HTable#getBufferedMutator}} is not thread safe:
> {noformat}
> 1. ThreadA runs to getBufferedMutator and finds mutator==null
> 2. ThreadB runs to getBufferedMutator and finds mutator==null
> 3. ThreadA initializes mutator to instanceA, then calls mutator#mutate,
> adding one put (putA) into {{writeAsyncBuffer}}
> 4. ThreadB initializes mutator to instanceB
> 5. ThreadA runs to flushCommits; mutator is now instanceB, so it calls
> instanceB's flush method, and putA, still buffered in instanceA, is lost
> {noformat}
> Will add a UT to cover this case, and fix it in this JIRA.
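
The interleaving in the quoted description can be replayed deterministically on a single thread, which makes the lost write easy to see. The following is only a minimal sketch, not the patch attached to this issue: FakeMutator, UnsafeHolder, SafeHolder and RaceReplay are hypothetical stand-ins invented for illustration, and the synchronized getter is just one possible way to close the check-then-act window.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for BufferedMutatorImpl: buffers puts until flush().
class FakeMutator {
    final List<String> buffer = new ArrayList<>();
    final List<String> flushed = new ArrayList<>();

    void mutate(String put) { buffer.add(put); }

    void flush() {
        flushed.addAll(buffer);
        buffer.clear();
    }
}

// Mirrors the unsynchronized lazy initialization in the quoted getBufferedMutator.
class UnsafeHolder {
    FakeMutator mutator;

    FakeMutator getMutator() {
        if (mutator == null) {          // check ...
            mutator = new FakeMutator(); // ... then act, with no lock in between
        }
        return mutator;
    }
}

// One possible fix (an assumption, not the committed patch): guard the
// check and the assignment with the same lock so only one instance is created.
class SafeHolder {
    private FakeMutator mutator;

    synchronized FakeMutator getMutator() {
        if (mutator == null) {
            mutator = new FakeMutator();
        }
        return mutator;
    }
}

class RaceReplay {
    // Replays steps 1-5 from the description in a fixed order on one thread.
    // Returns true if putA ends up stranded in instanceA and never flushed.
    static boolean putALostInReplay() {
        UnsafeHolder table = new UnsafeHolder();

        // Steps 1 and 2: both threads evaluate the null check before either assigns.
        boolean threadASawNull = table.mutator == null;
        boolean threadBSawNull = table.mutator == null;

        // Step 3: ThreadA installs instanceA and buffers putA in it.
        FakeMutator instanceA = new FakeMutator();
        table.mutator = instanceA;
        instanceA.mutate("putA");

        // Step 4: ThreadB, acting on its stale null check, installs instanceB.
        FakeMutator instanceB = new FakeMutator();
        table.mutator = instanceB;

        // Step 5: ThreadA flushes through the shared field, which is now instanceB.
        table.mutator.flush();

        return threadASawNull && threadBSawNull
            && instanceA.buffer.contains("putA")
            && instanceA.flushed.isEmpty()
            && instanceB.flushed.isEmpty();
    }

    public static void main(String[] args) {
        System.out.println("putA lost: " + putALostInReplay());
    }
}
```

Note that a synchronized getter only guarantees a single mutator instance. It does not make the rest of the class safe to share, which is the concern raised by the javadoc quoted in the comment above.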



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
