hadoop-hdfs-issues mailing list archives

From "dhruba borthakur (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HDFS-1220) Namenode unable to start due to truncated fstime
Date Fri, 10 Sep 2010 06:35:34 GMT

    [ https://issues.apache.org/jira/browse/HDFS-1220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12907903#action_12907903 ]

dhruba borthakur commented on HDFS-1220:

The rename does not actually sync the data from the kernel buffers to disk. Thus, it is
theoretically possible that, even though the NN wrote everything out before the machine
rebooted, some data in any of the fstime/edits/fsimage files could be missing. I think
we should issue an fsync() on all these files before closing them.
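
As an illustration of the suggestion above, here is a minimal Java sketch that writes the checkpoint time to a temporary file, forces it to disk, and only then renames it over fstime. The class name DurableFstimeSketch, the ".tmp" suffix, and both helper methods are hypothetical and are not taken from the NameNode source; only timeFile and checkpointTime echo names from the issue.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.StandardCopyOption;

// Hypothetical helper for illustration only; not the actual NameNode code.
class DurableFstimeSketch {

    // Writes checkpointTime to a temporary sibling file, syncs it to disk,
    // then renames it over timeFile.
    static void writeCheckpointTime(File timeFile, long checkpointTime) throws IOException {
        File dir = timeFile.getAbsoluteFile().getParentFile();
        File tmp = new File(dir, timeFile.getName() + ".tmp"); // assumed naming scheme
        try (FileOutputStream fos = new FileOutputStream(tmp);
             DataOutputStream out = new DataOutputStream(fos)) {
            out.writeLong(checkpointTime);
            out.flush();          // push bytes from the Java stream to the OS
            fos.getFD().sync();   // ask the OS to write its buffers to the device
        }
        // Whether ATOMIC_MOVE replaces an existing target is implementation
        // specific; on POSIX file systems it maps to rename(2), which does.
        Files.move(tmp.toPath(), timeFile.toPath(), StandardCopyOption.ATOMIC_MOVE);
    }

    // Reads the 8-byte checkpoint time back.
    static long readCheckpointTime(File timeFile) throws IOException {
        try (DataInputStream in = new DataInputStream(new FileInputStream(timeFile))) {
            return in.readLong();
        }
    }

    public static void main(String[] args) throws IOException {
        File dir = Files.createTempDirectory("fstime-demo").toFile();
        File timeFile = new File(dir, "fstime");
        writeCheckpointTime(timeFile, System.currentTimeMillis());
        System.out.println("checkpoint time: " + readCheckpointTime(timeFile));
    }
}
```

With this ordering a reader should see either the old complete file or the new complete file. The sketch does not sync the parent directory, so the rename itself may still be lost on a crash.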

> Namenode unable to start due to truncated fstime
> ------------------------------------------------
>                 Key: HDFS-1220
>                 URL: https://issues.apache.org/jira/browse/HDFS-1220
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: name-node
>    Affects Versions: 0.20.1
>            Reporter: Thanh Do
> - Summary: updating the fstime file on disk is not atomic, so if a crash
> happens in the middle of the update, the NameNode reads a truncated fstime
> file on its next restart and is unable to start successfully.
> - Details:
> Basically, the update involves 3 steps:
> 1) delete fstime file (timeFile.delete())
> 2) truncate fstime file (new FileOutputStream(timeFile))
> 3) write new time to fstime file (out.writeLong(checkpointTime))
> If a crash happens after step 2 and before step 3, then on the next restart the
> NameNode gets an exception when reading the time (8 bytes) from an empty fstime file.
> This bug was found by our Failure Testing Service framework:
> http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-98.html
> For questions, please email us: Thanh Do (thanhdo@cs.wisc.edu) and 
> Haryadi Gunawi (haryadi@eecs.berkeley.edu)
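
The failure described in the quoted report can be reproduced outside HDFS with a short Java sketch. It performs steps 1 and 2, skips step 3 to stand in for the crash, and then attempts the 8-byte read. The class TruncatedFstimeSketch and its method names are hypothetical; this is a simulation of the reported sequence, not the NameNode's code.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

// Hypothetical simulation for illustration only; not the actual NameNode code.
class TruncatedFstimeSketch {

    // Performs steps 1 and 2 of the update and then stops, as if the
    // process had crashed before step 3 (the writeLong call).
    static void simulateCrashAfterTruncate(File timeFile) throws IOException {
        timeFile.delete();                          // step 1: delete fstime
        new FileOutputStream(timeFile).close();     // step 2: create an empty fstime
        // step 3 (out.writeLong(checkpointTime)) never runs
    }

    // Returns true if reading the 8-byte time hits end of file.
    static boolean readFails(File timeFile) throws IOException {
        try (DataInputStream in = new DataInputStream(new FileInputStream(timeFile))) {
            in.readLong();
            return false;
        } catch (EOFException e) {
            return true;   // fewer than 8 bytes were available
        }
    }

    public static void main(String[] args) throws IOException {
        File timeFile = File.createTempFile("fstime", null);
        simulateCrashAfterTruncate(timeFile);
        System.out.println("file length: " + timeFile.length());
        System.out.println("read fails: " + readFails(timeFile));
    }
}
```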

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
