hadoop-hdfs-issues mailing list archives

From "Todd Lipcon (Updated) (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HDFS-2738) FSEditLog.selectinputStreams is reading through in-progress streams even when non-in-progress are requested
Date Wed, 04 Jan 2012 01:30:39 GMT

     [ https://issues.apache.org/jira/browse/HDFS-2738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated HDFS-2738:

    Priority: Blocker  (was: Critical)

Bumping to blocker since, in a busy cluster with lots of writes, this can prevent the standby
node from making any forward progress.
> FSEditLog.selectinputStreams is reading through in-progress streams even when non-in-progress are requested
> -----------------------------------------------------------------------------------------------------------
>                 Key: HDFS-2738
>                 URL: https://issues.apache.org/jira/browse/HDFS-2738
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: ha, name-node
>    Affects Versions: HA branch (HDFS-1623)
>            Reporter: Todd Lipcon
>            Assignee: Aaron T. Myers
>            Priority: Blocker
> The new code in HDFS-1580 is causing an issue with selectInputStreams in the HA context.
> When the active is writing to the shared edits, selectInputStreams is called on the standby.
> This ends up calling {{journalSet.getInputStream}} but doesn't pass the {{inProgressOk=false}}
> flag. As a result, {{getInputStream}} reads and validates the in-progress stream unnecessarily.
> Because the validation results are no longer properly cached, {{findMaxTransaction}} re-validates
> the in-progress stream as well, which then breaks the corruption check in this code. The end result
> is a lot of errors like:
> 2011-12-30 16:45:02,521 ERROR namenode.FileJournalManager (FileJournalManager.java:getNumberOfTransactions(266)) - Gap in transactions, max txnid is 579, 0 txns from 578
> 2011-12-30 16:45:02,521 INFO  ha.EditLogTailer (EditLogTailer.java:run(163)) - Got error, will try again.
> java.io.IOException: No non-corrupt logs for txid 578
> 	at org.apache.hadoop.hdfs.server.namenode.JournalSet.getInputStream(JournalSet.java:229)
> 	at org.apache.hadoop.hdfs.server.namenode.FSEditLog.selectInputStreams(FSEditLog.java:1081)
> 	at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:115)
> 	at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.access$0(EditLogTailer.java:100)
> 	at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:154)
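
To make the expected behavior of the {{inProgressOk}} flag concrete, here is a minimal, self-contained Java sketch of segment selection that skips in-progress segments when the caller asks for finalized ones only. It is not Hadoop code: SegmentSelectionSketch, Segment and selectSegments are hypothetical names, and segments are modeled as plain transaction-ID ranges rather than files that must be opened and validated.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, simplified model of edit log segment selection.
 * None of these names are Hadoop's real API.
 */
public class SegmentSelectionSketch {

    /** A segment covering transactions [firstTxId, lastTxId]; inProgress marks the one still being written. */
    record Segment(long firstTxId, long lastTxId, boolean inProgress) {}

    /**
     * Returns the contiguous run of segments that covers transactions starting at fromTxId.
     * Assumes `all` is sorted by firstTxId. When inProgressOk is false, in-progress
     * segments are skipped outright, so they are never read or validated.
     */
    static List<Segment> selectSegments(List<Segment> all, long fromTxId, boolean inProgressOk) {
        List<Segment> selected = new ArrayList<>();
        long next = fromTxId;
        for (Segment s : all) {
            if (s.inProgress() && !inProgressOk) {
                continue; // caller only wants finalized segments: do not touch this one
            }
            if (s.lastTxId() < next) {
                continue; // segment lies entirely before the requested range
            }
            if (s.firstTxId() > next) {
                break; // gap in the available segments: stop here
            }
            selected.add(s);
            next = s.lastTxId() + 1;
        }
        return selected;
    }

    public static void main(String[] args) {
        List<Segment> logs = List.of(
                new Segment(1, 577, false),   // finalized
                new Segment(578, 579, true)); // still being written by the active
        System.out.println(selectSegments(logs, 1, false).size()); // prints 1
        System.out.println(selectSegments(logs, 1, true).size());  // prints 2
    }
}
```

In this model, asking for transactions from 578 with inProgressOk=false returns an empty list, which a caller can treat as "nothing new to tail yet" instead of an error.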

This message is automatically generated by JIRA.