hbase-issues mailing list archives

From "Virag Kothari (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-12125) Add Hbck option to check and fix WAL's from replication queue
Date Wed, 01 Oct 2014 06:35:34 GMT

    [ https://issues.apache.org/jira/browse/HBASE-12125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14154441#comment-14154441 ]

Virag Kothari commented on HBASE-12125:

A WAL roll on the region server would be required only if the current WAL (the WAL being
written to) is corrupted. So -fixCorruptedReplicationWAL can be useful if we know that the
WAL currently being written to is ok.

> Add Hbck option to check and fix WAL's from replication queue
> -------------------------------------------------------------
>                 Key: HBASE-12125
>                 URL: https://issues.apache.org/jira/browse/HBASE-12125
>             Project: HBase
>          Issue Type: Bug
>          Components: Replication
>            Reporter: Virag Kothari
>            Assignee: Virag Kothari
> The replication source discards the WAL file in many cases when it encounters an
> exception while reading it. This can cause data loss, and the underlying reason for the
> failed read remains hidden. Only in certain scenarios should the replication source
> discard the current WAL and move to the next one.
> This JIRA aims to add an hbck option to check the WAL files of replication queues for
> any inconsistencies and also provide an option to fix them.
> The fix can be to remove the file from the replication queue in ZooKeeper and from the
> memory of the replication source manager and replication sources.
> A region server endpoint call from the hbck client to region server can be used to achieve
> Hbck can be configured with the following options:
> -softCheckReplicationWAL: Tries to open only the oldest WAL (the WAL currently being read
> by the replication source) from the replication queue. If there is a position associated,
> it also seeks to that position and reads an entry from there.
> -hardCheckReplicationWAL: Checks all WAL paths from the replication queues by reading them
> completely to make sure they are ok.
> -fixMissingReplicationWAL: Removes from the replication queues the WALs that are not
> present on HDFS.
> -fixCorruptedReplicationWAL: Removes from the replication queues the WALs that are corrupted
> (based on the findings from softCheck/hardCheck). Also the WALs are moved to a quarantine
> -rollAndFixCorruptedReplicationWAL: If the current WAL is corrupted, it is first rolled
> over and then handled in the same way as by the -fixCorruptedReplicationWAL option.
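
The options quoted above, together with the comment about WAL rolls, can be modelled roughly as follows. This is only an illustrative sketch in Python, not HBase code: `QueuedWal`, `FixPlan`, `check_queue`, `plan_fix`, and the `exists` / `read_entry_at` / `read_fully` callbacks are all hypothetical names, and the choice to leave a corrupted current WAL untouched unless a roll is requested is an assumption drawn from the comment, not confirmed behaviour.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Dict, List, Optional


class WalState(Enum):
    OK = "ok"
    MISSING = "missing"
    CORRUPTED = "corrupted"


@dataclass
class QueuedWal:
    path: str
    position: Optional[int] = None  # replication offset, if one is recorded


@dataclass
class FixPlan:
    remove: List[str] = field(default_factory=list)      # drop from the replication queue
    quarantine: List[str] = field(default_factory=list)  # move aside for later inspection
    roll: bool = False                                   # roll the current WAL first


def check_queue(
    queue: List[QueuedWal],                     # oldest WAL first
    exists: Callable[[str], bool],              # hypothetical: is the file on the filesystem?
    read_entry_at: Callable[[str, int], bool],  # hypothetical: seek and read one entry
    read_fully: Callable[[str], bool],          # hypothetical: read the whole file
    hard: bool = False,
) -> Dict[str, WalState]:
    """Soft check looks only at the oldest WAL; hard check reads every WAL fully."""
    targets = queue if hard else queue[:1]
    results: Dict[str, WalState] = {}
    for wal in targets:
        if not exists(wal.path):
            results[wal.path] = WalState.MISSING
            continue
        if hard:
            readable = read_fully(wal.path)
        else:
            # Assumption: with no stored position, read from the start of the file.
            readable = read_entry_at(wal.path, wal.position or 0)
        results[wal.path] = WalState.OK if readable else WalState.CORRUPTED
    return results


def plan_fix(
    results: Dict[str, WalState],
    current_wal: str,             # the WAL the region server is writing to
    fix_missing: bool = False,    # models -fixMissingReplicationWAL
    fix_corrupted: bool = False,  # models -fixCorruptedReplicationWAL
    roll_first: bool = False,     # models -rollAndFixCorruptedReplicationWAL
) -> FixPlan:
    fix_corrupted = fix_corrupted or roll_first
    plan = FixPlan()
    for path, state in results.items():
        if state is WalState.MISSING and fix_missing:
            plan.remove.append(path)
        elif state is WalState.CORRUPTED and fix_corrupted:
            if path == current_wal:
                if not roll_first:
                    # Assumption: skip the live WAL unless a roll was requested.
                    continue
                plan.roll = True
            plan.remove.append(path)
            plan.quarantine.append(path)
    return plan
```

In this model, a roll is planned only when the corrupted file is the one currently being written, which mirrors the point that -fixCorruptedReplicationWAL alone is enough when the current WAL is known to be ok.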

This message was sent by Atlassian JIRA
