hbase-dev mailing list archives

From "Virag Kothari (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HBASE-12125) Add Hbck option to check and fix WAL's from replication queue
Date Tue, 30 Sep 2014 22:11:34 GMT
Virag Kothari created HBASE-12125:

             Summary: Add Hbck option to check and fix WAL's from replication queue
                 Key: HBASE-12125
                 URL: https://issues.apache.org/jira/browse/HBASE-12125
             Project: HBase
          Issue Type: Bug
          Components: Replication
            Reporter: Virag Kothari
            Assignee: Virag Kothari

The replication source discards the WAL file in many cases when it encounters an exception
while reading it. This can cause data loss, and the underlying reason for the failed read remains
hidden. The replication source should drop the current WAL and move on to the next one only in
certain scenarios.
This JIRA proposes an hbck option to check the WAL files in the replication queues for
inconsistencies, along with an option to fix them.
The fix can be to remove the file from the replication queue in ZooKeeper and from the memory
of the replication source manager and the replication sources.
A region server endpoint call from the hbck client to the region server can be used to achieve this.
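As a rough illustration of the proposed fix, the sketch below models the two places a WAL would have to be dropped from: the replication queue persisted in ZooKeeper and the in-memory state of the replication source manager and its sources. Both are stand-ins built on plain Java collections. The class and method names (`ZkQueueStore`, `SourceManagerState`, `ReplicationWalFixer.removeFromQueue`) are hypothetical and are not HBase's replication API, and the endpoint call that would trigger this on the region server is left out.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for the replication queues stored in ZooKeeper (queueId -> WALs, oldest first). */
class ZkQueueStore {
    final Map<String, List<String>> queues = new HashMap<>();

    boolean removeWal(String queueId, String wal) {
        List<String> wals = queues.get(queueId);
        return wals != null && wals.remove(wal);
    }
}

/** Hypothetical in-memory view kept by the replication source manager and its sources. */
class SourceManagerState {
    final Map<String, Deque<String>> pending = new HashMap<>();

    boolean removeWal(String queueId, String wal) {
        Deque<String> wals = pending.get(queueId);
        return wals != null && wals.remove(wal);
    }
}

/** What the region server side of a (hypothetical) hbck endpoint call might do. */
class ReplicationWalFixer {
    private final ZkQueueStore zk;
    private final SourceManagerState memory;

    ReplicationWalFixer(ZkQueueStore zk, SourceManagerState memory) {
        this.zk = zk;
        this.memory = memory;
    }

    /** Removes the WAL from ZooKeeper and from memory; returns true if either one held it. */
    boolean removeFromQueue(String queueId, String wal) {
        boolean fromZk = zk.removeWal(queueId, wal);
        boolean fromMemory = memory.removeWal(queueId, wal);
        return fromZk || fromMemory;
    }
}

class FixerDemo {
    public static void main(String[] args) {
        ZkQueueStore zk = new ZkQueueStore();
        zk.queues.put("peer1", new ArrayList<>(Arrays.asList("wal.1", "wal.2")));
        SourceManagerState memory = new SourceManagerState();
        memory.pending.put("peer1", new ArrayDeque<>(Arrays.asList("wal.1", "wal.2")));

        new ReplicationWalFixer(zk, memory).removeFromQueue("peer1", "wal.1");
        System.out.println(zk.queues.get("peer1") + " " + memory.pending.get("peer1")); // prints [wal.2] [wal.2]
    }
}
```

Removing the entry from both places in one call is the point of the design: a WAL removed only from ZooKeeper would presumably still be read by the live source, and one removed only from memory could reappear whenever the queue is reloaded from ZooKeeper, for example after a region server failover.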

Hbck can be configured with the following options:

-softCheckReplicationWAL: Tries to open only the oldest WAL (the WAL currently being read by the
replication source) from the replication queue. If a position is associated with it, also seeks to
that position and reads one entry from there.
-hardCheckReplicationWAL: Checks all WAL paths in the replication queues by reading them completely
to make sure they are readable.
-fixMissingReplicationWAL: Removes the WALs that are not present on HDFS from the replication queues.
-fixCorruptedReplicationWAL: Removes the WALs found to be corrupted (based on the softCheck/hardCheck
results) from the replication queues. The WALs are also moved to quarantine.
-rollAndFixCorruptedReplicationWAL: If the current WAL is corrupted, it is first rolled over and then
handled in the same way as with -fixCorruptedReplicationWAL.

This message was sent by Atlassian JIRA
