hbase-issues mailing list archives

From "Phil Yang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-14004) [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin
Date Sat, 05 Dec 2015 06:24:11 GMT

    https://issues.apache.org/jira/browse/HBASE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15042694#comment-15042694

Phil Yang commented on HBASE-14004:

After the discussion in HBASE-14790, we can move forward now. Let me repost my comment from
HBASE-14790 first :)
Currently there are two scenarios that may result in inconsistency between the two clusters.

The first is when the master cluster crashes (for example, a power failure), or three DNs and
the RS crash at the same time: we lose all data that was not yet flushed to the DNs' disks,
but some of that data may have already been replicated to the slave cluster.

The second is that we roll back the memstore and return an error to the client if we get an
error on hflush, but the entry may in fact already exist in the WAL. This not only results in
inconsistency between the two clusters but also gives the client a wrong response, because the
data will "revive" after the WAL is replayed. This scenario has been discussed in HBASE-14004.

Compared to the second, the first scenario is easier to solve: we can tell ReplicationSource
to read only the log entries that have already been saved on three disks. For that we need to
know the largest WAL entry id that has been synced, so HDFS's own internal sync logic does not
help us; we must use hsync so that HBase knows the entry id. Therefore we need a configurable,
periodic hsync. Even with only one cluster this is helpful, because it reduces the data lost
to a data-center power failure or to three DNs and the RS crashing at the same time.
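As a rough illustration of this periodic-hsync idea, the sketch below tracks the largest entry id known to be durable. All class and method names here are hypothetical, not the actual HBase WAL classes; in HDFS terms the durable sync would correspond to `FSDataOutputStream.hsync()` rather than `hflush()`:

```java
import java.util.concurrent.atomic.AtomicLong;

/**
 * Sketch of a configurable periodic-hsync policy (hypothetical names).
 * Every syncIntervalEntries appends, the writer issues a durable sync and
 * advances the highest-synced entry id, which a replication source could
 * then use as a safe read watermark.
 */
class PeriodicHsyncWriter {
    interface DurableSink {
        void hsync();   // stand-in for FSDataOutputStream.hsync()
    }

    private final DurableSink sink;
    private final long syncIntervalEntries;   // the configurable period
    private final AtomicLong lastHsynced = new AtomicLong(-1);

    PeriodicHsyncWriter(DurableSink sink, long syncIntervalEntries) {
        this.sink = sink;
        this.syncIntervalEntries = syncIntervalEntries;
    }

    /** Append one WAL entry; hsync when the interval is reached. */
    void append(long entryId) {
        if ((entryId + 1) % syncIntervalEntries == 0) {
            sink.hsync();              // data now on the DN disks
            lastHsynced.set(entryId);  // advance the durable watermark
        }
    }

    /** Largest entry id known to be on disk; safe for replication to ship. */
    long highestSyncedEntryId() {
        return lastHsynced.get();
    }
}
```

The interval trades durability for throughput: a smaller interval narrows the window of entries that can be lost but pays the hsync cost more often.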

The second scenario is more complex, because we cannot roll back the memstore and tell the
client the operation failed unless we are very sure the data will never exist in the WAL, and
mostly we are not sure... So we need new WAL logic that rewrites the entry to a new file
rather than rolling back. To implement this, we must handle duplicate entries while replaying
the WAL.
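The duplicate-handling requirement can be sketched as follows (hypothetical types, not HBase's actual replay code): replay applies an entry only if its sequence id is above the highest already applied for that region, so an entry that was rewritten into a new file after a failed sync is silently skipped the second time it is seen:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Sketch of idempotent WAL replay (hypothetical types). If a failed sync
 * is retried by rewriting the entry into a new WAL file, the same sequence
 * id can appear in two files; replay drops any entry whose sequence id is
 * not above the highest already applied for its region.
 */
class IdempotentReplayer {
    static class Entry {
        final String region;
        final long seqId;
        final String edit;
        Entry(String region, long seqId, String edit) {
            this.region = region; this.seqId = seqId; this.edit = edit;
        }
    }

    private final Map<String, Long> maxAppliedSeqId = new HashMap<>();
    private final List<Entry> applied = new ArrayList<>();

    /** Replay entries in order; duplicate sequence ids are dropped. */
    void replay(List<Entry> walEntries) {
        for (Entry e : walEntries) {
            long seen = maxAppliedSeqId.getOrDefault(e.region, -1L);
            if (e.seqId <= seen) {
                continue;               // duplicate from a rewritten file
            }
            applied.add(e);             // stand-in for the memstore insert
            maxAppliedSeqId.put(e.region, e.seqId);
        }
    }

    List<Entry> appliedEntries() { return applied; }
}
```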

Therefore, we may have 4 subtasks:
1: A configurable, periodic hsync to make sure our data has been saved on disk. This is also
helpful in single-cluster mode.
2: ReplicationSource should only read WAL entries that have been hsynced, to prevent the slave
cluster from having data that the master has lost.
3: The WAL reader should handle duplicate entries; in other words, make WAL logging idempotent.
4: Fix the HBase write path so that we retry logging the WAL entry in a new file rather than
rolling back.
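Subtask 2 then reduces to filtering reads by the durable watermark. A minimal sketch, again with hypothetical names rather than the real ReplicationSource:

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch of subtask 2 (hypothetical shape). Entries past the hsynced
 * watermark may still be lost along with the master, so shipping them
 * could leave the slave with data the master never durably had.
 */
class WatermarkBoundedReader {
    /** Return only the entry ids at or below the durable watermark. */
    static List<Long> readable(List<Long> walEntryIds, long highestHsyncedId) {
        List<Long> safe = new ArrayList<>();
        for (long id : walEntryIds) {
            if (id <= highestHsyncedId) {
                safe.add(id);   // durable on three DN disks: safe to ship
            }
            // ids above the watermark wait for the next periodic hsync
        }
        return safe;
    }
}
```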


> [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin
> -------------------------------------------------------------------------------------------------------------------
>                 Key: HBASE-14004
>                 URL: https://issues.apache.org/jira/browse/HBASE-14004
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>            Reporter: He Liangliang
>            Priority: Critical
>              Labels: replication, wal
> Looks like the current write path can cause inconsistency between memstore/hfile and WAL, which causes the slave cluster to have more data than the master cluster.
> The simplified write path looks like:
> 1. insert record into Memstore
> 2. write record to WAL
> 3. sync WAL
> 4. rollback Memstore if 3 fails
> It's possible that the HDFS sync RPC call fails, but the data has already (perhaps partially) been transported to the DNs and is eventually persisted. As a result, the handler will roll back the Memstore, and the later flushed HFile will also skip this record.

This message was sent by Atlassian JIRA
