hbase-issues mailing list archives

From "Lars Hofhansl (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-14953) HBaseInterClusterReplicationEndpoint: Do not retry the whole batch of edits in case of RejectedExecutionException
Date Thu, 10 Dec 2015 06:51:11 GMT

    [ https://issues.apache.org/jira/browse/HBASE-14953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15050198#comment-15050198 ]

Lars Hofhansl commented on HBASE-14953:
---------------------------------------

Interesting, didn't think of that case. Amazing how many problems a little change like this
can cause.

Why not add a real queue (i.e. not a SynchronousQueue)? In that case we would also need to set
coreThreads to maxThreads and allow core threads to time out.

Since we're waiting on the futures to finish anyway, as they sit in the queue we'd naturally
wait exactly the right amount of time, so the queue can be unbounded - eventually we'd have
all workers waiting, which is what we want.
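
A minimal sketch of the queue-based pool suggested above, using only java.util.concurrent.
The class name QueuedShipperPool and the methods newPool and shipAll are hypothetical and are
not taken from HBaseInterClusterReplicationEndpoint or the attached patch. With an unbounded
LinkedBlockingQueue, ThreadPoolExecutor only starts threads beyond corePoolSize when the queue
is full, which never happens, so corePoolSize has to equal the intended maximum;
allowCoreThreadTimeOut(true) then lets idle threads exit.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, for illustration only; not the HBase implementation.
public class QueuedShipperPool {

    /** Builds a pool that queues excess work instead of rejecting it. */
    public static ThreadPoolExecutor newPool(int maxThreads, long keepAliveSeconds) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
            maxThreads, maxThreads,                 // core == max, see the note above
            keepAliveSeconds, TimeUnit.SECONDS,     // must be > 0 for core thread timeout
            new LinkedBlockingQueue<Runnable>());   // unbounded "real" queue
        pool.allowCoreThreadTimeOut(true);          // idle threads are allowed to exit
        return pool;
    }

    /** Submits every sub-batch, then blocks on the futures; returns the total shipped. */
    public static int shipAll(ThreadPoolExecutor pool, List<Callable<Integer>> subBatches)
            throws Exception {
        List<Future<Integer>> futures = new ArrayList<>();
        for (Callable<Integer> subBatch : subBatches) {
            futures.add(pool.submit(subBatch));     // queued, never rejected while running
        }
        int shipped = 0;
        for (Future<Integer> future : futures) {
            shipped += future.get();                // the caller waits here, not in a retry loop
        }
        return shipped;
    }
}
```

Because each caller blocks on its own futures before submitting more work, the number of
queued tasks stays bounded by what the callers have outstanding, even though the queue itself
has no capacity limit.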


> HBaseInterClusterReplicationEndpoint: Do not retry the whole batch of edits in case of RejectedExecutionException
> -----------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-14953
>                 URL: https://issues.apache.org/jira/browse/HBASE-14953
>             Project: HBase
>          Issue Type: Bug
>          Components: Replication
>    Affects Versions: 2.0.0, 1.2.0, 1.3.0
>            Reporter: Ashu Pachauri
>            Assignee: Ashu Pachauri
>            Priority: Critical
>         Attachments: HBASE-14953-V1.patch
>
>
> When the WAL provider is set to multiwal, the ReplicationSource has multiple worker
> threads submitting batches to HBaseInterClusterReplicationEndpoint. In such a scenario, it
> is quite common to encounter RejectedExecutionException, because shipping edits to the peer
> cluster takes much longer than reading edits from the source and submitting more batches to
> the endpoint.
> The logs fill up with warnings caused by this exception.
> Since we subdivide batches before actually shipping them, we don't need to fail and resend
> the whole batch if one of the sub-batches fails with RejectedExecutionException. Rather, we
> should just retry the failed sub-batches.
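
The retry-only-the-failed-sub-batches idea in the description above can be sketched as
follows. This is an illustration under stated assumptions, not the contents of
HBASE-14953-V1.patch: the class SubBatchRetry, the method shipWithRetry, and the parameters
shipper, maxAttempts and backoffMillis are all hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.RejectedExecutionException;
import java.util.function.Consumer;

// Hypothetical helper, for illustration only; not the HBase implementation.
public class SubBatchRetry {

    /**
     * Ships each sub-batch through the pool. Sub-batches that are rejected by the
     * executor or whose task throws are retried on the next attempt; sub-batches
     * that already succeeded are never resubmitted.
     * Returns the sub-batches still unshipped after maxAttempts (empty on success).
     */
    public static <T> List<List<T>> shipWithRetry(ExecutorService pool,
            List<List<T>> subBatches, Consumer<List<T>> shipper,
            int maxAttempts, long backoffMillis) throws InterruptedException {
        List<List<T>> pending = new ArrayList<>(subBatches);
        for (int attempt = 0; attempt < maxAttempts && !pending.isEmpty(); attempt++) {
            if (attempt > 0) {
                Thread.sleep(backoffMillis);        // give busy workers time to free up
            }
            List<List<T>> failed = new ArrayList<>();
            List<List<T>> submitted = new ArrayList<>();
            List<Future<?>> futures = new ArrayList<>();
            for (List<T> sub : pending) {
                try {
                    futures.add(pool.submit(() -> shipper.accept(sub)));
                    submitted.add(sub);             // kept index-aligned with futures
                } catch (RejectedExecutionException e) {
                    failed.add(sub);                // pool saturated: retry this one only
                }
            }
            for (int i = 0; i < futures.size(); i++) {
                try {
                    futures.get(i).get();           // wait for the sub-batch to finish
                } catch (ExecutionException e) {
                    failed.add(submitted.get(i));   // shipping threw: retry this one only
                }
            }
            pending = failed;
        }
        return pending;
    }
}
```

A caller would treat a non-empty return value as a failure of the remaining sub-batches only,
rather than resending the whole batch.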



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
