hbase-issues mailing list archives

From "Jean-Daniel Cryans (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HBASE-2223) Handle 10min+ network partitions between clusters
Date Fri, 21 May 2010 22:20:18 GMT

     [ https://issues.apache.org/jira/browse/HBASE-2223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jean-Daniel Cryans updated HBASE-2223:

    Attachment: HBASE-2223.patch

This is the patch I've been working on for a while now; almost all of it is new code. It is
not in a committable state: it prints a lot of debug information, it needs an up-to-date
hadoop jar WRT HDFS-142 and HDFS-200, it's missing some more unit tests, code is commented
all over the place, etc.

It basically works like I described it in my Feb 18th comment, that is:

 - We keep track of the HLogs to replicate in ZooKeeper, in a separate folder for each region
server. The oldest HLog's znode contains the position to seek to for the next batch to replicate.
 - The region servers tail their own HLogs in all situations, and listen for log rolls to
figure out whether a log is archived while it still needs to be replicated. Since there's no
real tailing in HDFS, we have to reopen and seek every time, and this hits the Namenode. So, in
order not to DDoS it, we wait one second by default when no data is available for replication.
Each time we hit an EOF, we wait one second more than the last time, up to 10 seconds (so the
SLA here is that it takes at most 10 seconds + the time to apply the data on the slave cluster
for the data to be available on the other end). The same kind of waiting happens when region
servers on a slave cluster aren't reachable.
 - When a region server fails on the master side, its queue of logs is taken over in ZooKeeper
by another RS that wins a race for a lock. This can fail over as many times as needed, e.g. a
RS could end up finishing the replication for a queue that was passed on 10 times. It is also
important to note that a failed-over queue is processed in parallel. This means that if you
have only 1 slave cluster and a RS died, the one that takes over the queue will send edits to
the slave cluster in 2 different threads. When the failed-over queue is finished, that
replication stream is closed.
 - The sink side of the region server still works like in the original implementation; it
either has to be changed to work like ReplicationSource, or the part where we log to a file
has to be removed.
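
To make the waiting scheme from the second point concrete, here is a minimal sketch of the
wait schedule (one second, then one more second after each empty read, capped at 10). It is
written in Python for brevity and is not taken from the patch: backoff_delays,
tail_until_data and the read_batch callback are hypothetical names, and read_batch stands in
for "reopen the HLog, seek to the saved position, read what is there".

```python
import time


def backoff_delays(base_s=1.0, step_s=1.0, cap_s=10.0):
    """Yield successive waits: 1, 2, 3, ... seconds, never more than cap_s."""
    delay = base_s
    while True:
        yield min(delay, cap_s)
        delay = min(delay + step_s, cap_s)


def tail_until_data(read_batch, position, sleep=time.sleep):
    """Call read_batch(position) until it returns edits, sleeping between tries.

    read_batch is a hypothetical callback returning (edits, new_position);
    an empty edits list means EOF was hit and nothing is available yet.
    """
    for delay in backoff_delays():
        edits, new_position = read_batch(position)
        if edits:
            return edits, new_position
        # Nothing to replicate yet: back off before touching the Namenode again.
        sleep(delay)
```

Passing the sleep function in as a parameter keeps the loop testable without real waiting;
the same schedule could be reused for the case where the slave cluster is unreachable.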

I'd be happy to sit down with other people to review the patch.

> Handle 10min+ network partitions between clusters
> -------------------------------------------------
>                 Key: HBASE-2223
>                 URL: https://issues.apache.org/jira/browse/HBASE-2223
>             Project: Hadoop HBase
>          Issue Type: Sub-task
>            Reporter: Jean-Daniel Cryans
>            Assignee: Jean-Daniel Cryans
>             Fix For: 0.21.0
>         Attachments: HBASE-2223.patch
> We need a nice way of handling long network partitions without impacting a master cluster
> (which pushes the data). Currently it will just retry over and over again.
> I think we could:
>  - Stop replication to a slave cluster if it didn't respond for more than 10 minutes
>  - Keep track of the duration of the partition
>  - When the slave cluster comes back, initiate a MR job like HBASE-2221 
> Maybe we want less than 10 minutes, maybe we want this to be all automatic or just the
> first 2 parts. Discuss.
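
The first two ideas in the description above (stop after a threshold, keep track of the
partition's duration) could be sketched roughly as follows. This is only an illustration in
Python, not code from the issue or the patch: PartitionTracker and its methods are
hypothetical names, the 600-second default simply mirrors the 10 minutes mentioned, and
timestamps are passed in explicitly so the logic does not depend on a real clock.

```python
class PartitionTracker:
    """Tracks how long a slave cluster has been unreachable (hypothetical helper)."""

    def __init__(self, threshold_s=600.0):
        self.threshold_s = threshold_s
        self.partition_start = None  # time of the first failed attempt, if any

    def record_failure(self, now):
        """Note a failed replication attempt; only the first one starts the clock."""
        if self.partition_start is None:
            self.partition_start = now

    def record_success(self, now):
        """Return the length of the partition that just ended (0.0 if there was none)."""
        if self.partition_start is None:
            return 0.0
        duration = now - self.partition_start
        self.partition_start = None
        return duration

    def should_stop(self, now):
        """True once the slave has been unreachable for longer than the threshold."""
        if self.partition_start is None:
            return False
        return now - self.partition_start > self.threshold_s
```

The duration returned by record_success is the piece of information a later catch-up step
would need in order to know which time range to copy over.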

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
