hbase-issues mailing list archives

From "HBase Review Board (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-2223) Handle 10min+ network partitions between clusters
Date Mon, 14 Jun 2010 19:09:15 GMT

    [ https://issues.apache.org/jira/browse/HBASE-2223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12878703#action_12878703 ]

HBase Review Board commented on HBASE-2223:
-------------------------------------------

Message from: "Jean-Daniel Cryans" <jdcryans@apache.org>


bq.  On 2010-06-11 15:31:37, stack wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/replication/master/ReplicationLogCleaner.java, line 56
bq.  > <http://review.hbase.org/r/76/diff/5/?file=1114#file1114line56>
bq.  >
bq.  >     For sure setConf will have been called before we get here?  So, stuff gets set up by setConf?  Can setConf be called more than once?  How do I know how to use this class?  Not doc'd.  Doesn't have a Constructor.

LogCleanerDelegate is the interface that defines the general behavior. Yes, it should have a constructor.


bq.  On 2010-06-11 15:31:37, stack wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/replication/master/ReplicationLogCleaner.java, line 111
bq.  > <http://review.hbase.org/r/76/diff/5/?file=1114#file1114line111>
bq.  >
bq.  >     The way this is done, if I didn't want to wait on the ttl, then I'd have to write a new class.  Can't we have ttl and replication be distinct, and then if I want to delete based off ttl and whether the log is up in zk, chain them?

I don't follow; chaining is already how I do it.
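To make the chaining concrete, here is a minimal sketch of the idea under discussion. The interface and class names echo the real ones, but the bodies and the string-based "log name" checks are simplified stand-ins, not the actual HBase API: a log is deletable only if every delegate in the chain agrees, so the TTL check and the replication/ZK check stay independent classes.

```java
import java.util.Arrays;
import java.util.List;

public class ChainedCleanerDemo {
    interface LogCleanerDelegate {
        boolean isLogDeletable(String logName);
    }

    static class TimeToLiveLogCleaner implements LogCleanerDelegate {
        // Stand-in for the real TTL check: pretend expired logs are named "old-*".
        public boolean isLogDeletable(String logName) {
            return logName.startsWith("old-");
        }
    }

    static class ReplicationLogCleaner implements LogCleanerDelegate {
        // Stand-in for the ZK check: pretend logs still queued for
        // replication are named "*-queued".
        public boolean isLogDeletable(String logName) {
            return !logName.endsWith("-queued");
        }
    }

    // A log may be deleted only if no delegate in the chain vetoes it.
    static boolean deletable(List<LogCleanerDelegate> chain, String logName) {
        for (LogCleanerDelegate d : chain) {
            if (!d.isLogDeletable(logName)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<LogCleanerDelegate> chain =
            Arrays.asList(new TimeToLiveLogCleaner(), new ReplicationLogCleaner());
        System.out.println(deletable(chain, "old-123"));        // past TTL, not queued
        System.out.println(deletable(chain, "old-456-queued")); // past TTL, still queued
    }
}
```

The veto structure is what keeps the concerns distinct: neither cleaner knows about the other, and adding a new deletion criterion means adding one delegate to the list.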


bq.  On 2010-06-11 15:31:37, stack wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSink.java, line 54
bq.  > <http://review.hbase.org/r/76/diff/5/?file=1116#file1116line54>
bq.  >
bq.  >     I dont follow?

Yeah, RepSink is a mix of 2 solutions but features only the worst of both. The next patch will make it significantly better.


bq.  On 2010-06-11 15:31:37, stack wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java, line 126
bq.  > <http://review.hbase.org/r/76/diff/5/?file=1117#file1117line126>
bq.  >
bq.  >     This ain't a constructor?

It ain't... but it's used like one.


bq.  On 2010-06-11 15:31:37, stack wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java, line 483
bq.  > <http://review.hbase.org/r/76/diff/5/?file=1117#file1117line483>
bq.  >
bq.  >     We have to copy?

This is the downside of the way I'm capping the log entries by size or number: I'm reusing the same HLog.Entry[] entriesArray to read from HLogs (and the entries in it). For example, with replicationQueueSizeCapacity=64MB and replicationQueueNbCapacity=25k, say on a first run we reach 25k entries without hitting the size cap, so we replicate the whole array. Now on the second run we hit 64MB after only 10k rows; we only want to replicate those and not the 15k "leftovers" from the first run.
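The reuse-and-copy behavior described above can be sketched in a few lines. This is an illustration of the mechanism, not the patch's code: the buffer is shrunk from 25k slots to 5 for the demo, and entries are plain strings rather than HLog.Entry objects.

```java
import java.util.Arrays;

public class ReplicationBatchDemo {
    // Reused between reads, like entriesArray in ReplicationSource: after a
    // short batch, the tail still holds stale entries from the previous run.
    static String[] entriesArray = new String[5];

    // Copy out only the first `filled` slots, so leftovers are never shipped.
    static String[] batchToShip(int filled) {
        return Arrays.copyOf(entriesArray, filled);
    }

    public static void main(String[] args) {
        // First run: the number cap is hit, the whole buffer is valid.
        for (int i = 0; i < 5; i++) entriesArray[i] = "run1-" + i;
        System.out.println(batchToShip(5).length); // prints 5

        // Second run: the size cap is hit after 2 entries; slots 2..4 still
        // hold run1 leftovers that must not be replicated again.
        entriesArray[0] = "run2-0";
        entriesArray[1] = "run2-1";
        String[] ship = batchToShip(2);
        System.out.println(ship.length); // prints 2
        System.out.println(ship[1]);     // prints run2-1
    }
}
```

The copy in batchToShip is exactly the cost being discussed in the review: it is the price of reusing one preallocated buffer instead of allocating a fresh array per batch.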


bq.  On 2010-06-11 15:31:37, stack wrote:
bq.  > src/test/java/org/apache/hadoop/hbase/replication/TestReplicationSource.java, line 67
bq.  > <http://review.hbase.org/r/76/diff/5/?file=1123#file1123line67>
bq.  >
bq.  >     No dfs in this test.  Thats intentional?

Nope, should fix.


bq.  On 2010-06-11 15:31:37, stack wrote:
bq.  > src/test/java/org/apache/hadoop/hbase/replication/regionserver/TestReplicationSink.java, line 86
bq.  > <http://review.hbase.org/r/76/diff/5/?file=1124#file1124line86>
bq.  >
bq.  >     Can't you squash some of these tests together?  They each start up own minidfscluster... just start it once?

They don't?

  @Before
  public void setUp() throws Exception {
    table1 = TEST_UTIL.truncateTable(TABLE_NAME1);
    table2 = TEST_UTIL.truncateTable(TABLE_NAME2);
    Thread.sleep(SLEEP_TIME);
  }
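The snippet makes the point: the cluster is started once per class, and each test only truncates its tables. A plain-Java sketch of that lifecycle (standing in for JUnit's @BeforeClass/@Before; the counter and exception are demo devices, not anything from the patch):

```java
public class LifecycleDemo {
    static int clusterStarts = 0;
    static boolean clusterUp = false;

    // Runs once for the whole class, like a @BeforeClass that starts
    // the mini cluster.
    static void beforeClass() {
        clusterStarts++;
        clusterUp = true;
    }

    // Runs before every test, like the @Before above: no cluster start,
    // just cheap per-test cleanup (truncating tables).
    static void setUp() {
        if (!clusterUp) throw new IllegalStateException("cluster not started");
    }

    public static void main(String[] args) {
        beforeClass();
        for (int test = 0; test < 3; test++) {
            setUp();
        }
        System.out.println(clusterStarts); // prints 1: one start for all tests
    }
}
```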


- Jean-Daniel


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.hbase.org/r/76/#review194
-----------------------------------------------------------





> Handle 10min+ network partitions between clusters
> -------------------------------------------------
>
>                 Key: HBASE-2223
>                 URL: https://issues.apache.org/jira/browse/HBASE-2223
>             Project: HBase
>          Issue Type: Sub-task
>            Reporter: Jean-Daniel Cryans
>            Assignee: Jean-Daniel Cryans
>             Fix For: 0.21.0
>
>         Attachments: HBASE-2223.patch
>
>
> We need a nice way of handling long network partitions without impacting a master cluster (which pushes the data). Currently it will just retry over and over again.
> I think we could:
>  - Stop replication to a slave cluster if it didn't respond for more than 10 minutes
>  - Keep track of the duration of the partition
>  - When the slave cluster comes back, initiate a MR job like HBASE-2221 
> Maybe we want less than 10 minutes, maybe we want this to be all automatic or just the first 2 parts. Discuss.
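The first two proposed steps — stop pushing to a slave that has been silent for 10+ minutes, and track how long the partition lasted so a catch-up job (like HBASE-2221) can cover the gap — could look roughly like this. Every name here (PartitionTracker, shouldReplicate, slaveResponded) is hypothetical; this is a sketch of the state machine, not code from the patch.

```java
public class PartitionTracker {
    // 10-minute threshold from the issue description.
    static final long PARTITION_THRESHOLD_MS = 10 * 60 * 1000L;

    long lastResponseMs;
    long partitionStartMs = -1; // -1 while the slave is considered healthy

    PartitionTracker(long nowMs) {
        lastResponseMs = nowMs;
    }

    // Returns true while we should keep replicating to this slave; once the
    // slave has been silent past the threshold, stop pushing and mark when
    // the partition began.
    boolean shouldReplicate(long nowMs) {
        if (nowMs - lastResponseMs > PARTITION_THRESHOLD_MS) {
            if (partitionStartMs < 0) partitionStartMs = lastResponseMs;
            return false;
        }
        return true;
    }

    // Called when the slave responds again: returns the partition duration
    // in ms to hand to the catch-up job, or 0 if there was no partition.
    long slaveResponded(long nowMs) {
        long gap = partitionStartMs < 0 ? 0 : nowMs - partitionStartMs;
        lastResponseMs = nowMs;
        partitionStartMs = -1;
        return gap;
    }

    public static void main(String[] args) {
        PartitionTracker t = new PartitionTracker(0);
        System.out.println(t.shouldReplicate(5 * 60 * 1000L));  // prints true
        System.out.println(t.shouldReplicate(11 * 60 * 1000L)); // prints false
        System.out.println(t.slaveResponded(30 * 60 * 1000L));  // prints 1800000
    }
}
```

Whether the recovery is fully automatic or stops after the first two steps (as the description asks) only changes what the caller does with the returned gap.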

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

