hbase-issues mailing list archives

From "HBase Review Board (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-2223) Handle 10min+ network partitions between clusters
Date Fri, 28 May 2010 19:04:39 GMT

    [ https://issues.apache.org/jira/browse/HBASE-2223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12873116#action_12873116 ]

HBase Review Board commented on HBASE-2223:

Message from: "Jean-Daniel Cryans" <jdcryans@apache.org>

This is an automatically generated e-mail. To reply, visit:

(Updated 2010-05-28 12:03:35.298094)

Review request for hbase.


New patch that takes care of almost all of Benoit's comments. My current TODO is:

- use a <pre> block to make this more readable in the HTML version of the javadoc.

- I think it would be good to document the fact that this method will typically be called
from a thread other than the one that executes `run', so that other people reading the code
will quickly get a good grasp of what the concurrency / locking requirements are.

- So Delete operations are "unbuffered" unlike Put operations, which you "buffer" in the `puts'
list.  Does that mean that a Delete can be executed before the Put that was creating the data
in the first place, and that the Delete will fail first and the Put will survive second?

// Should we log rejected edits in a file for replay?
- I vote yes

- This `try' block is massive. Would it be possible to refactor it using a private method
to make the code a bit more readable?
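
To make the Delete/Put ordering question above concrete, here is a minimal toy sketch. It is not the code from the patch: the class `EditOrderingSketch`, the `Edit` type and both method names are hypothetical, and a plain map stands in for a table, so HBase timestamps and delete markers are not modeled at all. It only shows how applying Deletes immediately while Puts wait in a buffer can reorder edits that target the same row.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model only: a map stands in for a table; timestamps and tombstones are ignored. */
class EditOrderingSketch {

    /** Hypothetical edit type: a null value means "delete this row". */
    static final class Edit {
        final String row;
        final String value;

        Edit(String row, String value) {
            this.row = row;
            this.value = value;
        }

        boolean isDelete() {
            return value == null;
        }
    }

    /** Deletes are applied right away; Puts are buffered and flushed at the end. */
    static Map<String, String> applyWithBufferedPuts(List<Edit> edits) {
        Map<String, String> table = new HashMap<>();
        List<Edit> puts = new ArrayList<>();
        for (Edit e : edits) {
            if (e.isDelete()) {
                table.remove(e.row); // runs before any buffered Put has been written
            } else {
                puts.add(e);
            }
        }
        for (Edit p : puts) {
            table.put(p.row, p.value);
        }
        return table;
    }

    /** Every edit is applied in the order it was received. */
    static Map<String, String> applyInOrder(List<Edit> edits) {
        Map<String, String> table = new HashMap<>();
        for (Edit e : edits) {
            if (e.isDelete()) {
                table.remove(e.row);
            } else {
                table.put(e.row, e.value);
            }
        }
        return table;
    }

    public static void main(String[] args) {
        List<Edit> edits = new ArrayList<>();
        edits.add(new Edit("row1", "v1")); // Put
        edits.add(new Edit("row1", null)); // Delete of the same row
        System.out.println(applyWithBufferedPuts(edits)); // prints {row1=v1}
        System.out.println(applyInOrder(edits));          // prints {}
    }
}
```

In this toy model the buffered variant leaves `row1` in place even though the Delete arrived after the Put. Whether the real sink behaves this way depends on details the sketch leaves out, such as the timestamps carried by the edits.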


This is HBASE-2223, AKA Replication 2.0. It is currently only a "preview patch": it's pretty
much feature complete, works on a cluster, and has unit tests and whatnot, but it could use a
lot more testing, cleaning, and ideas from others.

This addresses bug HBASE-2223.

Diffs (updated)

  src/main/java/org/apache/hadoop/hbase/HConstants.java 13aff26
  src/main/java/org/apache/hadoop/hbase/ipc/HRegionInterface.java 4cbe52a
  src/main/java/org/apache/hadoop/hbase/master/ServerManager.java a197b8f
  src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java b5ff43a
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java 12a3cd8
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java 7c1184c
  src/main/java/org/apache/hadoop/hbase/replication/ReplicationZookeeperHelper.java PRE-CREATION
  src/main/java/org/apache/hadoop/hbase/replication/master/ReplicationLogCleaner.java PRE-CREATION
  src/main/java/org/apache/hadoop/hbase/replication/package.html PRE-CREATION
  src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSink.java PRE-CREATION
  src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java PRE-CREATION
  src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java ed8709f
  src/test/java/org/apache/hadoop/hbase/replication/ReplicationSourceDummy.java PRE-CREATION
  src/test/java/org/apache/hadoop/hbase/replication/TestReplication.java PRE-CREATION
  src/test/java/org/apache/hadoop/hbase/replication/TestReplicationSource.java PRE-CREATION


Diff: http://review.hbase.org/r/76/diff




> Handle 10min+ network partitions between clusters
> -------------------------------------------------
>                 Key: HBASE-2223
>                 URL: https://issues.apache.org/jira/browse/HBASE-2223
>             Project: HBase
>          Issue Type: Sub-task
>            Reporter: Jean-Daniel Cryans
>            Assignee: Jean-Daniel Cryans
>             Fix For: 0.21.0
>         Attachments: HBASE-2223.patch
> We need a nice way of handling long network partitions without impacting a master cluster
> (which pushes the data). Currently it will just retry over and over again.
> I think we could:
>  - Stop replication to a slave cluster if it didn't respond for more than 10 minutes
>  - Keep track of the duration of the partition
>  - When the slave cluster comes back, initiate a MR job like HBASE-2221 
> Maybe we want less than 10 minutes, maybe we want this to be all automatic or just the
> first 2 parts. Discuss.
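
The first two points of the proposal quoted above (stop replicating to a slave that has been silent for more than 10 minutes, and keep track of how long the partition lasted) could be sketched roughly as follows. This is only an illustration under stated assumptions, not the patch's implementation: `PartitionTrackerSketch` and its methods are hypothetical names, the caller passes in the clock value, and what happens with the measured duration (for example starting a catch-up job) is left out.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical helper: tracks how long a slave cluster has been unresponsive. */
class PartitionTrackerSketch {
    private final long timeoutMillis;
    private long lastResponseMillis;
    private boolean stopped = false;

    PartitionTrackerSketch(long timeoutMillis, long nowMillis) {
        this.timeoutMillis = timeoutMillis;
        this.lastResponseMillis = nowMillis;
    }

    /** Record a failed attempt; stop replicating once the silence exceeds the timeout. */
    void onFailure(long nowMillis) {
        if (nowMillis - lastResponseMillis > timeoutMillis) {
            stopped = true;
        }
    }

    /**
     * Record a response from the slave and resume replication.
     * Returns the length of the partition that just ended, or 0 if
     * replication had not been stopped.
     */
    long onResponse(long nowMillis) {
        long partitionMillis = stopped ? nowMillis - lastResponseMillis : 0L;
        stopped = false;
        lastResponseMillis = nowMillis;
        return partitionMillis;
    }

    boolean isStopped() {
        return stopped;
    }

    public static void main(String[] args) {
        long timeout = TimeUnit.MINUTES.toMillis(10);
        PartitionTrackerSketch tracker = new PartitionTrackerSketch(timeout, 0L);

        tracker.onFailure(TimeUnit.MINUTES.toMillis(5));  // still within the timeout
        tracker.onFailure(TimeUnit.MINUTES.toMillis(11)); // silent for more than 10 minutes
        System.out.println("stopped: " + tracker.isStopped()); // prints stopped: true

        long lasted = tracker.onResponse(TimeUnit.MINUTES.toMillis(30));
        System.out.println("partition lasted " + TimeUnit.MILLISECONDS.toMinutes(lasted) + " min"); // prints partition lasted 30 min
    }
}
```

The value returned by `onResponse` is the piece a caller would need in order to decide whether, and over what time range, to run a catch-up step once the slave cluster is back.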

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
