hbase-issues mailing list archives

From "Ted Yu (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-15001) Thread Safety issues in ReplicationSinkManager and HBaseInterClusterReplicationEndpoint
Date Mon, 21 Dec 2015 05:06:46 GMT

    [ https://issues.apache.org/jira/browse/HBASE-15001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15066051#comment-15066051 ]

Ted Yu commented on HBASE-15001:
--------------------------------

I went through the patch - it looks good.

Jenkins builds are currently failing, due to some other JIRA.

In the future, I hope we follow the procedure: submit for a QA run, validate, review,
and then commit.

> Thread Safety issues in ReplicationSinkManager and HBaseInterClusterReplicationEndpoint
> ---------------------------------------------------------------------------------------
>
>                 Key: HBASE-15001
>                 URL: https://issues.apache.org/jira/browse/HBASE-15001
>             Project: HBase
>          Issue Type: Bug
>          Components: Replication
>    Affects Versions: 2.0.0, 1.2.0, 1.3.0, 1.2.1
>            Reporter: Ashu Pachauri
>            Assignee: Ashu Pachauri
>            Priority: Blocker
>             Fix For: 2.0.0, 1.2.0, 1.3.0
>
>         Attachments: HBASE-15001-V0.patch, Test.java, repro_stuck_replication.diff
>
>
> ReplicationSinkManager is not thread-safe. This can cause problems in
> HBaseInterClusterReplicationEndpoint when the WAL provider is multiwal.
> For example: 
> 1. When multiple threads report bad sinks, the sink list can be non-empty but report
> a negative size, because the ArrayList itself is not thread-safe.
> 2. HBaseInterClusterReplicationEndpoint depends on the number of sinks to batch edits
> for shipping. However, the following code can end up with no batches to process: the
> sink size is non-zero at the first check, but by the time we reach the "batching" part
> it has become zero.
> {code}
> if (replicationSinkMgr.getSinks().size() == 0) {
>     return false;
> }
> ...
> int n = Math.min(Math.min(this.maxThreads, entries.size()/100+1),
>                replicationSinkMgr.getSinks().size());
> {code}
> [Update] This leads to ArithmeticException: division by zero at:
> {code}
> entryLists.get(Math.abs(Bytes.hashCode(e.getKey().getEncodedRegionName())%n)).add(e);
> {code}
> which is benign and will just lead to retries by the ReplicationSource.
> The idea is to make all operations in ReplicationSinkManager thread-safe and to verify
> the size of the replicated edits before we report success.
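
For illustration only, the check-then-act race in the description can be avoided by
reading the sink count once and using that single snapshot for both the emptiness check
and the batch count. The sketch below is not the attached patch. SinkBatchingSketch,
SinkRegistry and partition are hypothetical names, entries are plain strings rather than
WAL entries, and the registry simply synchronizes every operation.

```java
import java.util.ArrayList;
import java.util.List;

class SinkBatchingSketch {

    /** Hypothetical stand-in for the sink manager: every operation is synchronized. */
    static final class SinkRegistry {
        private final List<String> sinks = new ArrayList<>();

        synchronized void add(String sink) { sinks.add(sink); }
        synchronized void reportBad(String sink) { sinks.remove(sink); }
        synchronized int size() { return sinks.size(); }
    }

    /**
     * Splits entries into batches using one snapshot of the sink count.
     * Returns an empty list when no sinks are available, so the caller can retry.
     */
    static List<List<String>> partition(List<String> entries, SinkRegistry registry,
                                        int maxThreads) {
        // Read the size exactly once; a second read could observe zero.
        int sinkCount = registry.size();
        if (sinkCount == 0) {
            return new ArrayList<>();
        }
        // Same shape as the quoted formula, with maxThreads clamped to at least 1
        // so that n can never be zero.
        int n = Math.min(Math.min(Math.max(1, maxThreads), entries.size() / 100 + 1),
                         sinkCount);
        List<List<String>> batches = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            batches.add(new ArrayList<>());
        }
        for (String e : entries) {
            // floorMod always yields an index in [0, n) for n > 0.
            batches.get(Math.floorMod(e.hashCode(), n)).add(e);
        }
        return batches;
    }

    public static void main(String[] args) {
        SinkRegistry registry = new SinkRegistry();
        registry.add("sink-a");
        registry.add("sink-b");
        List<String> entries = new ArrayList<>();
        for (int i = 0; i < 250; i++) {
            entries.add("edit-" + i);
        }
        System.out.println(partition(entries, registry, 10).size());
    }
}
```

A snapshot does not stop sinks from disappearing after the batches are built, so the
shipping step would still need to handle a sink that has since gone bad.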



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
