hbase-issues mailing list archives

From "Jean-Daniel Cryans (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-6719) [replication] Data will lose if open a Hlog failed more than maxRetriesMultiplier
Date Thu, 13 Sep 2012 17:28:08 GMT

    [ https://issues.apache.org/jira/browse/HBASE-6719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455027#comment-13455027 ]

Jean-Daniel Cryans commented on HBASE-6719:
-------------------------------------------

bq. Can we rewrite the patch this way?

Yeah, I think this works.

bq. One concern I have: What if the file is actually gone for some reason? In that case it seems we'd never stop retrying.

If you go up in the file you'll see that, even after we've looked everywhere, I currently don't
have a good solution for files that are missing completely. Basically my heuristic was "if I
can't open or get to the file and there's another one available, I'll dump it". That indeed
breaks down when a transient error lasts long enough to exhaust the retries.

Should we introduce a quarantine?
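
To make the quarantine idea concrete, here's roughly the kind of check I have in mind (a sketch, not a patch: shouldDumpLog is a made-up helper, and I'm assuming the source has its FileSystem handle in this.fs and the stuck log in this.currentPath; FileSystem.exists() is the only real HDFS call used):

{code:title=Quarantine sketch, not a patch|borderStyle=solid}
// Hypothetical helper for ReplicationSource: treat "cannot open right now"
// and "confirmed gone" differently, quarantining the former instead of
// dumping the log.
private boolean shouldDumpLog(int sleepMultiplier) {
  if (sleepMultiplier < this.maxRetriesMultiplier) {
    return false; // still within the normal retry budget
  }
  try {
    if (this.fs.exists(this.currentPath)) {
      // The file is reachable (or never left): the earlier failures were
      // transient, so keep retrying rather than dumping the log.
      LOG.warn("Open failed " + sleepMultiplier + " times but "
          + this.currentPath + " still exists, keeping it quarantined");
      return false;
    }
  } catch (IOException e) {
    // HDFS itself is unreachable; that is not proof the file is gone.
    LOG.warn("Cannot reach HDFS to check " + this.currentPath, e);
    return false;
  }
  // We looked and the file is confirmed missing, so dumping is justified.
  return true;
}
{code}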
                
> [replication] Data will lose if open a Hlog failed more than maxRetriesMultiplier
> ---------------------------------------------------------------------------------
>
>                 Key: HBASE-6719
>                 URL: https://issues.apache.org/jira/browse/HBASE-6719
>             Project: HBase
>          Issue Type: Bug
>          Components: replication
>    Affects Versions: 0.94.1
>            Reporter: terry zhang
>            Assignee: terry zhang
>            Priority: Critical
>             Fix For: 0.94.3
>
>         Attachments: 6719.txt, hbase-6719.patch
>
>
> Please take a look at the code below:
> {code:title=ReplicationSource.java|borderStyle=solid}
> protected boolean openReader(int sleepMultiplier) {
>   ...
>   catch (IOException ioe) {
>     LOG.warn(peerClusterZnode + " Got: ", ioe);
>     // TODO Need a better way to determine if a file is really gone but
>     // TODO without scanning all logs dir
>     if (sleepMultiplier == this.maxRetriesMultiplier) {
>       LOG.warn("Waited too long for this file, considering dumping");
>       // Opening the file failed more than maxRetriesMultiplier (default 10) times
>       return !processEndOfFile();
>     }
>   }
>   return true;
>   ...
> }
>
> protected boolean processEndOfFile() {
>   if (this.queue.size() != 0) {
>     // Skips this HLog: its edits are lost
>     this.currentPath = null;
>     this.position = 0;
>     return true;
>   } else if (this.queueRecovered) {
>     // Terminates the failover replication source thread: its edits are lost
>     this.manager.closeRecoveredQueue(this);
>     LOG.info("Finished recovering the queue");
>     this.running = false;
>     return true;
>   }
>   return false;
> }
> {code}
> Sometimes HDFS runs into a problem while the HLog file itself is actually OK. So after HDFS comes back, some data will be lost and cannot be found again in the slave cluster.
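> To put a number on the retry window: the retry sleep in ReplicationSource grows as sleepMultiplier * sleepForRetries, so assuming the default replication.source.sleepforretries of 1 second (an assumption; check your site configuration), the source gives up after only about 1 + 2 + ... + 10 = 55 seconds of failed opens. Any HDFS outage longer than that is enough to drop the HLog.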

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
