hbase-issues mailing list archives

From "HBase Review Board (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-2223) Handle 10min+ network partitions between clusters
Date Fri, 28 May 2010 19:04:39 GMT

    [ https://issues.apache.org/jira/browse/HBASE-2223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12873116#action_12873116 ]

HBase Review Board commented on HBASE-2223:
-------------------------------------------

Message from: "Jean-Daniel Cryans" <jdcryans@apache.org>

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.hbase.org/r/76/
-----------------------------------------------------------

(Updated 2010-05-28 12:03:35.298094)


Review request for hbase.


Changes
-------

New patch that takes care of almost all of Benoit's comments. My current TODO list is:


ReplicationZookeeperHelper.java
- use a <pre> block to make this more readable in the HTML version of the javadoc.

ReplicationSink.java
- I think it would be good to document the fact that this method will typically be called
from a thread other than the one that executes `run', so that other people reading the code
quickly get a good grasp of what the concurrency / locking requirements are.

- So Delete operations are "unbuffered", unlike Put operations, which you "buffer" in the `puts'
list. Does that mean a Delete can be executed before the Put that created the data in the
first place, so that the Delete fails first and the Put then survives?
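A minimal sketch addressing both ReplicationSink points, assuming a sink fed through a queue: the javadoc on `enqueue` states the threading contract, and `run` flushes buffered puts before applying any delete so a delete can never overtake an earlier put. `SketchSink`, its `Edit` type, `enqueue`, `stop` and `appliedEdits` are hypothetical names, not the patch's actual ReplicationSink API or HBase's Put/Delete classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

/** Hypothetical sink (not the patch's ReplicationSink) illustrating both review points. */
final class SketchSink implements Runnable {
  /** Hypothetical edit: a put or a delete on a row; not HBase's Put/Delete classes. */
  static final class Edit {
    final boolean isDelete;
    final String row;
    Edit(boolean isDelete, String row) { this.isDelete = isDelete; this.row = row; }
  }

  private final BlockingQueue<Edit> incoming = new LinkedBlockingQueue<>();
  private final List<Edit> bufferedPuts = new ArrayList<>(); // touched only by run()'s thread
  private final List<String> applied = new ArrayList<>();    // touched only by run()'s thread
  private volatile boolean stopped = false;

  /**
   * Hands an edit to the sink.
   * <p>
   * Concurrency: typically called from RPC handler threads, i.e. NOT from the
   * thread executing {@link #run()}. It only touches the thread-safe queue, so
   * callers need no extra locking.
   */
  void enqueue(Edit edit) {
    incoming.add(edit);
  }

  /** Runs on the sink's own thread; the only consumer of {@code incoming}. */
  @Override
  public void run() {
    while (!stopped || !incoming.isEmpty()) {
      Edit e;
      try {
        e = incoming.poll(100, TimeUnit.MILLISECONDS);
      } catch (InterruptedException ie) {
        Thread.currentThread().interrupt();
        break;
      }
      if (e == null) {
        flushPuts(); // idle: push out whatever is buffered
      } else if (e.isDelete) {
        flushPuts(); // a delete must never overtake an earlier buffered put
        applied.add("delete:" + e.row);
      } else {
        bufferedPuts.add(e);
      }
    }
    flushPuts(); // drain the buffer on shutdown
  }

  private void flushPuts() {
    for (Edit p : bufferedPuts) {
      applied.add("put:" + p.row);
    }
    bufferedPuts.clear();
  }

  void stop() { stopped = true; }

  /** Read only after run() has returned. */
  List<String> appliedEdits() { return applied; }
}
```

Flushing on every delete trades some batching for ordering; a finer variant would flush only when a buffered put touches the same row as the delete.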

// Should we log rejected edits in a file for replay?
- I vote yes
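One way the "log rejected edits for replay" idea could look, as a minimal sketch assuming a plain-text format with one serialized edit per line (real edits would need a proper binary serialization, and a line format breaks if an edit contains a newline). `RejectedEditsLog`, `record`, `readForReplay` and `inTempFile` are hypothetical names.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical log of edits the sink rejected, kept for a later replay.
 * Assumes a plain-text format with one serialized edit per line.
 */
final class RejectedEditsLog {
  private final Path file;

  RejectedEditsLog(Path file) { this.file = file; }

  /** Creates a log backed by a fresh temporary file. */
  static RejectedEditsLog inTempFile() {
    try {
      return new RejectedEditsLog(Files.createTempFile("rejected-edits", ".log"));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Appends one rejected edit, creating the file if needed. */
  void record(String serializedEdit) {
    try {
      Files.write(file, Collections.singletonList(serializedEdit), StandardCharsets.UTF_8,
          StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Returns the rejected edits in the order they were recorded. */
  List<String> readForReplay() {
    try {
      return Files.exists(file)
          ? Files.readAllLines(file, StandardCharsets.UTF_8)
          : Collections.<String>emptyList();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```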

ReplicationSource.java
- This `try' block is massive; would it be possible to move its body into a private method
to make the code a bit more readable?
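A minimal sketch of the suggested refactoring, assuming the try body does the "filter then ship" work: the loop keeps only the success/failure handling, and every step that can throw moves into a private method. `SketchSource`, `Shipper`, `runOnce` and `shipBatch` are hypothetical names and do not reflect the patch's actual ReplicationSource code.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical source showing a large try body moved into a private method. */
final class SketchSource {
  /** Hypothetical transport standing in for the RPC to the slave cluster. */
  interface Shipper {
    void ship(List<String> batch) throws IOException;
  }

  private final Shipper shipper;
  private int failures = 0;

  SketchSource(Shipper shipper) { this.shipper = shipper; }

  /** The try block now only decides what happens on success or failure. */
  boolean runOnce(List<String> entries) {
    try {
      shipBatch(entries);
      failures = 0;
      return true;
    } catch (IOException e) {
      failures++;
      return false;
    }
  }

  /** Formerly the inline body of the try block; every step that can throw lives here. */
  private void shipBatch(List<String> entries) throws IOException {
    List<String> batch = new ArrayList<>();
    for (String entry : entries) {
      if (!entry.isEmpty()) { // stand-in for "skip edits that should not be replicated"
        batch.add(entry);
      }
    }
    if (!batch.isEmpty()) {
      shipper.ship(batch);
    }
  }

  int consecutiveFailures() { return failures; }
}
```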


Summary
-------

This is HBASE-2223 AKA Replication 2.0. It is currently only a "preview patch": it is pretty
much feature complete, works on a cluster and has unit tests, but it could use a lot more
testing, cleaning and ideas from others.


This addresses bug HBASE-2223.
    http://issues.apache.org/jira/browse/HBASE-2223


Diffs (updated)
-----

  src/main/java/org/apache/hadoop/hbase/HConstants.java 13aff26 
  src/main/java/org/apache/hadoop/hbase/ipc/HRegionInterface.java 4cbe52a 
  src/main/java/org/apache/hadoop/hbase/master/ServerManager.java a197b8f 
  src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java b5ff43a 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java 12a3cd8 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java 7c1184c 
  src/main/java/org/apache/hadoop/hbase/replication/ReplicationZookeeperHelper.java PRE-CREATION
  src/main/java/org/apache/hadoop/hbase/replication/master/ReplicationLogCleaner.java PRE-CREATION
  src/main/java/org/apache/hadoop/hbase/replication/package.html PRE-CREATION
  src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSink.java PRE-CREATION
  src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java PRE-CREATION
  src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceInterface.java PRE-CREATION
  src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceManager.java PRE-CREATION
  src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java ed8709f
  src/test/java/org/apache/hadoop/hbase/replication/ReplicationSourceDummy.java PRE-CREATION
  src/test/java/org/apache/hadoop/hbase/replication/TestReplication.java PRE-CREATION
  src/test/java/org/apache/hadoop/hbase/replication/TestReplicationSource.java PRE-CREATION
  src/test/java/org/apache/hadoop/hbase/replication/regionserver/TestReplicationSink.java PRE-CREATION
  src/test/java/org/apache/hadoop/hbase/replication/regionserver/TestReplicationSourceManager.java PRE-CREATION

Diff: http://review.hbase.org/r/76/diff


Testing
-------


Thanks,

Jean-Daniel




> Handle 10min+ network partitions between clusters
> -------------------------------------------------
>
>                 Key: HBASE-2223
>                 URL: https://issues.apache.org/jira/browse/HBASE-2223
>             Project: HBase
>          Issue Type: Sub-task
>            Reporter: Jean-Daniel Cryans
>            Assignee: Jean-Daniel Cryans
>             Fix For: 0.21.0
>
>         Attachments: HBASE-2223.patch
>
>
> We need a nice way of handling long network partitions without impacting a master cluster (which pushes the data). Currently it will just retry over and over again.
> I think we could:
>  - Stop replication to a slave cluster if it hasn't responded for more than 10 minutes
>  - Keep track of the duration of the partition
>  - When the slave cluster comes back, initiate an MR job like HBASE-2221
> Maybe the threshold should be less than 10 minutes; maybe this should be fully automatic, or cover just the first 2 parts. Discuss.
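A minimal sketch of the first two proposed steps, assuming the source reports each successful shipment and polls a flag before retrying: after the threshold passes with no response, replication to that peer is paused, and the partition length is recorded once the peer answers again (for example via a periodic probe, not shown). `PartitionTracker`, `onSuccess`, `shouldReplicate` and the injected clock are hypothetical; the recorded duration is the kind of window an HBASE-2221-style catch-up job could then cover.

```java
import java.util.function.LongSupplier;

/**
 * Hypothetical tracker: stop replicating to a peer that has not responded for
 * longer than a threshold, and record how long the partition lasted.
 */
final class PartitionTracker {
  private final long thresholdMs;
  private final LongSupplier clock;   // injected so the logic can be tested
  private long lastSuccessMs;
  private long partitionStartMs = -1; // -1 means "not partitioned"
  private long lastPartitionMs = 0;

  PartitionTracker(long thresholdMs, LongSupplier clock) {
    this.thresholdMs = thresholdMs;
    this.clock = clock;
    this.lastSuccessMs = clock.getAsLong();
  }

  /** Call after each successful contact with the slave cluster. */
  void onSuccess() {
    long now = clock.getAsLong();
    if (partitionStartMs >= 0) {
      lastPartitionMs = now - partitionStartMs; // partition is over; remember its length
      partitionStartMs = -1;
    }
    lastSuccessMs = now;
  }

  /** True while we should keep retrying; false once the peer is considered partitioned. */
  boolean shouldReplicate() {
    long now = clock.getAsLong();
    if (partitionStartMs < 0 && now - lastSuccessMs > thresholdMs) {
      partitionStartMs = lastSuccessMs; // treat the last good contact as the start
    }
    return partitionStartMs < 0;
  }

  long lastPartitionDurationMs() { return lastPartitionMs; }
}
```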

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

