hbase-issues mailing list archives

From "Hudson (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-13121) Async wal replication for region replicas and dist log replay does not work together
Date Tue, 10 Mar 2015 01:03:29 GMT

    [ https://issues.apache.org/jira/browse/HBASE-13121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14354013#comment-14354013 ]

Hudson commented on HBASE-13121:

SUCCESS: Integrated in HBase-1.1 #264 (See [https://builds.apache.org/job/HBase-1.1/264/])
HBASE-13121 Async wal replication for region replicas and dist log replay does not work together
(enis: rev 280120ee1593ed65d288bfb1169150ee9e73a33f)
* hbase-server/src/main/java/org/apache/hadoop/hbase/zookeeper/RecoveringRegionWatcher.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/wal/WALPrettyPrinter.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/handler/FinishRegionRecoveringHandler.java
* hbase-client/src/main/java/org/apache/hadoop/hbase/ipc/RpcClientImpl.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/coordination/ZkSplitLogWorkerCoordination.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/replication/regionserver/TestRegionReplicaReplicationEndpointNoMaster.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestRegionReplicaFailover.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/RegionReplicaReplicationEndpoint.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/replication/BaseWALEntryFilter.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java

> Async wal replication for region replicas and dist log replay does not work together
> ------------------------------------------------------------------------------------
>                 Key: HBASE-13121
>                 URL: https://issues.apache.org/jira/browse/HBASE-13121
>             Project: HBase
>          Issue Type: Sub-task
>            Reporter: Enis Soztutar
>            Assignee: Enis Soztutar
>             Fix For: 2.0.0, 1.1.0
>         Attachments: hbase-13121_v1.patch, hbase-13121_v2.patch
> We had not tested dist log replay while testing async wal replication for region replicas.
> There seem to be a couple of issues, but they are fixable.
> The distinction for dist log replay is that the region is opened for both recovery and
> regular writes when a primary fails over. This causes the region open event marker to be
> written to the WAL, but at that point not all of the region's edits have been flushed
> (since it is still recovering). If secondary regions see this event and pick up all the
> files listed in the region open event marker, they can drop edits.
> The solution is:
>  - Only write the region open event marker to the WAL when the region is out of
>    recovering mode.
>  - Force a flush when the region comes out of recovering mode. Before the region open
>    event marker is written, we guarantee that all data in the region is flushed, so the
>    list of files in the event marker is complete.
>  - Edits coming from recovery are re-written to the WAL while recovery is in progress.
>    These edits will have a larger seqId than their "original" seqId. If this is the case,
>    we do not replicate these edits to the secondary replicas. Since dist log replay
>    recovers edits out of order (they come from parallel replays of WAL file split tasks),
>    this ensures that TIMELINE consistency is respected and edits are not seen out of order
>    in secondaries. Secondaries see these edits via the forced flush event instead.
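
As an illustration only (this is not the code from the attached patches), the third point of the description can be sketched in Java. The WalEdit class, its field names, and the convention that an origSeqId of 0 means "not a replayed edit" are assumptions made for this sketch; they are not HBase's actual API.

```java
import java.util.ArrayList;
import java.util.List;

public class ReplayedEditFilterSketch {

    /** Hypothetical stand-in for a WAL entry; NOT an HBase class. */
    static final class WalEdit {
        final long seqId;      // seqId assigned when the edit was written to this WAL
        final long origSeqId;  // seqId in the original WAL; 0 if not replayed (assumption)
        final String payload;

        WalEdit(long seqId, long origSeqId, String payload) {
            this.seqId = seqId;
            this.origSeqId = origSeqId;
            this.payload = payload;
        }

        /** A replayed edit was re-written with a seqId larger than its original one. */
        boolean isReplayed() {
            return origSeqId > 0 && seqId > origSeqId;
        }
    }

    /** Returns only the edits that should be shipped to secondary replicas. */
    static List<WalEdit> editsToReplicate(List<WalEdit> entries) {
        List<WalEdit> kept = new ArrayList<>();
        for (WalEdit e : entries) {
            // Replayed edits arrive out of order, so they are skipped here;
            // secondaries pick them up through the forced flush instead.
            if (!e.isReplayed()) {
                kept.add(e);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<WalEdit> wal = new ArrayList<>();
        wal.add(new WalEdit(10, 0, "regular write"));
        wal.add(new WalEdit(11, 4, "replayed during recovery"));
        wal.add(new WalEdit(12, 0, "regular write after recovery"));

        for (WalEdit e : editsToReplicate(wal)) {
            System.out.println(e.seqId + " " + e.payload);
        }
    }
}
```

The filter drops replayed edits entirely rather than trying to reorder them, which matches the description: ordering across parallel replays cannot be recovered, so the flushed files are the only safe way for secondaries to observe that data.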

This message was sent by Atlassian JIRA