hbase-issues mailing list archives

From "Jean-Daniel Cryans (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-6550) Refactoring ReplicationSink to make it more responsive of cluster health
Date Wed, 29 Aug 2012 20:05:09 GMT

    [ https://issues.apache.org/jira/browse/HBASE-6550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13444367#comment-13444367 ]

Jean-Daniel Cryans commented on HBASE-6550:
-------------------------------------------

This needs to be changed:

{code}
        new SynchronousQueue<Runnable>(), Threads.newDaemonThreadFactory("hbase-repl-pool"));
{code}

because the resulting thread names look like:

{noformat}
"hbase-repl-poolpool-1-thread-227" daemon prio=10 tid=0x00007fd10cc73000 nid=0x6104 waiting on condition [0x00007fd10a35d000]
{noformat}
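
For illustration, the doubled "pool" in the thread name above can be reproduced with a small sketch. `daemonFactory` is a hypothetical stand-in for `Threads.newDaemonThreadFactory`: the `pool-<n>-thread-<m>` suffix it appends is inferred from the thread dump, not taken from the HBase source. The shorter prefix `hbase-repl-` is only one possible fix and is not stated in the comment.

```java
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

class ReplPoolNaming {
    // Hypothetical stand-in for Threads.newDaemonThreadFactory. Assumption (from the
    // thread dump): the factory appends "pool-<n>-thread-<m>" to the caller's prefix.
    static ThreadFactory daemonFactory(final String prefix, final int poolNumber) {
        final AtomicInteger threadNumber = new AtomicInteger(1);
        return new ThreadFactory() {
            @Override
            public Thread newThread(Runnable r) {
                String name = prefix + "pool-" + poolNumber
                        + "-thread-" + threadNumber.getAndIncrement();
                Thread t = new Thread(r, name);
                t.setDaemon(true);
                return t;
            }
        };
    }

    // Name of the first thread a fresh factory would create for this prefix.
    static String firstThreadName(String prefix) {
        return daemonFactory(prefix, 1).newThread(() -> { }).getName();
    }

    public static void main(String[] args) {
        // The prefix already ends in "pool", so the appended suffix doubles it.
        System.out.println(firstThreadName("hbase-repl-pool"));
        // A prefix without the trailing "pool" gives a readable name.
        System.out.println(firstThreadName("hbase-repl-"));
    }
}
```

Under that assumption, any prefix that does not already end in "pool" would avoid the duplication.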
                
> Refactoring ReplicationSink to make it more responsive of cluster health
> ------------------------------------------------------------------------
>
>                 Key: HBASE-6550
>                 URL: https://issues.apache.org/jira/browse/HBASE-6550
>             Project: HBase
>          Issue Type: New Feature
>          Components: replication
>            Reporter: Himanshu Vashishtha
>            Assignee: Himanshu Vashishtha
>             Fix For: 0.96.0, 0.94.2
>
>         Attachments: 6550-havealook.txt, HBase-6550-0.94.patch, HBase-6550.patch, HBase-6550-v1.patch, HBase-6550-v3.patch, HBase-6550-v4.patch
>
>
> ReplicationSink replicates the WALEdits in the local cluster. It uses the native HBase client to insert the mutations. Sometimes this takes a while (maybe due to region splitting, a GC pause, etc.) and the client goes into its retry phase.
> This has two repercussions:
> a) The regionserver handler serving the request (until now, a priority handler) is blocked for this period.
> b) The caller may time out and retry anyway, while the handler serving the original ReplicationSink request is still working.
> Refactoring ReplicationSink to have the following features:
> a) Making it more configurable (its own retry limit, connection timeout, etc.)
> b) Adding fail-fast behavior so that it bails out if the caller has timed out or if any exception occurs while processing the mutation batch.
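
The fail-fast behavior described in (b) could be sketched roughly as follows. This is not the patch attached to the issue: `FailFastSink`, `newPool`, and `applyBatch` are hypothetical names, and the pool sizes and timeout are assumed values. The batch runs on a pool backed by a `SynchronousQueue`, and the calling handler gives up as soon as the timeout expires or the batch throws.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class FailFastSink {
    // Hypothetical pool: direct hand-off queue, at most 10 workers (assumed limit).
    static ExecutorService newPool() {
        return new ThreadPoolExecutor(0, 10, 60L, TimeUnit.SECONDS,
                new SynchronousQueue<Runnable>());
    }

    // Runs one mutation batch and returns true only if it finished within timeoutMs.
    // On timeout or on any exception the handler is released immediately.
    static boolean applyBatch(ExecutorService pool, Callable<Void> batch, long timeoutMs) {
        Future<Void> result = pool.submit(batch);
        try {
            result.get(timeoutMs, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            result.cancel(true); // interrupt the worker so it stops working on the batch
            return false;
        } catch (ExecutionException e) {
            return false; // the batch itself failed: bail out instead of retrying
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            result.cancel(true);
            return false;
        }
    }

    public static void main(String[] args) {
        ExecutorService pool = newPool();
        try {
            boolean fast = applyBatch(pool, () -> null, 1000L);
            boolean slow = applyBatch(pool, () -> { Thread.sleep(5000L); return null; }, 50L);
            System.out.println("fast batch applied: " + fast);
            System.out.println("slow batch applied: " + slow);
        } finally {
            pool.shutdownNow();
        }
    }
}
```

A real sink would take the timeout and retry limit from configuration, in line with (a), rather than hard-coding them.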

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
