hadoop-common-dev mailing list archives

From "Yoram Arnon (JIRA)" <j...@apache.org>
Subject [jira] Resolved: (HADOOP-760) HDFS edits log file corrupted can lead to a major loss of data.
Date Wed, 29 Nov 2006 18:12:24 GMT
     [ http://issues.apache.org/jira/browse/HADOOP-760?page=all ]

Yoram Arnon resolved HADOOP-760.

    Resolution: Duplicate

This is a duplicate of HADOOP-227, which requests periodic checkpointing of the namenode
image (and starting a fresh edits file).
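
The checkpoint-then-fresh-edits idea can be sketched with a toy model. This is only an illustration under stated assumptions: the class CheckpointSketch, its "ADD "/"DEL " line format, and its methods are hypothetical and are not Hadoop's classes or on-disk format.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Toy model of image-plus-edits persistence; hypothetical, not Hadoop code. */
public class CheckpointSketch {
    private final Path image;   // full snapshot of the namespace
    private final Path edits;   // changes logged since the last snapshot
    private final TreeSet<String> namespace = new TreeSet<>();

    public CheckpointSketch(Path dir) throws IOException {
        this.image = dir.resolve("fsimage");
        this.edits = dir.resolve("edits");
        // Recovery: load the snapshot, then replay the edits on top of it.
        if (Files.exists(image)) {
            namespace.addAll(Files.readAllLines(image, StandardCharsets.UTF_8));
        }
        if (Files.exists(edits)) {
            for (String line : Files.readAllLines(edits, StandardCharsets.UTF_8)) {
                apply(line);
            }
        }
    }

    private void apply(String edit) {
        if (edit.startsWith("ADD ")) {
            namespace.add(edit.substring(4));
        } else if (edit.startsWith("DEL ")) {
            namespace.remove(edit.substring(4));
        }
    }

    /** Appends the edit to the log file, then applies it in memory. */
    public void log(String edit) throws IOException {
        Files.write(edits, (edit + "\n").getBytes(StandardCharsets.UTF_8),
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        apply(edit);
    }

    /** Writes a new image, swaps it in, then starts a fresh edits file. */
    public void checkpoint() throws IOException {
        Path tmp = image.resolveSibling("fsimage.tmp");
        Files.write(tmp, namespace, StandardCharsets.UTF_8);
        Files.move(tmp, image,
                StandardCopyOption.REPLACE_EXISTING, StandardCopyOption.ATOMIC_MOVE);
        Files.write(edits, new byte[0]); // truncate: the image now covers these edits
    }

    public List<String> paths() {
        return new ArrayList<>(namespace);
    }
}
```

With periodic calls to checkpoint(), a damaged edits file can cost at most the changes made since the last checkpoint rather than everything since the last restart. The toy edits are idempotent, so replaying an old edits file over a newer image after a crash between the two steps is harmless here; a real design would need to handle that case explicitly.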

> HDFS edits log file corrupted can lead to a major loss of data.
> ---------------------------------------------------------------
>                 Key: HADOOP-760
>                 URL: http://issues.apache.org/jira/browse/HADOOP-760
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.6.1
>            Reporter: Philippe Gassmann
>            Priority: Critical
> In one of our test systems, HDFS became corrupted after the edits log file was
corrupted (I can tell how).
> When we restarted HDFS, the namenode refused to start, with an exception in hadoop-namenode-xxx.out.
> Unfortunately, an rm mistake was made, and I was not able to save it anywhere.
> But it was an ArrayIndexOutOfBoundsException somewhere in a UTF8 method called from FSEditLog.loadFSEdits.
> The result: the namenode was unable to start, and the only way to fix it was to remove
the edits log file.
> As it was a test machine, we did not have any backup, so all files created in HDFS
since the last start of the namenode were lost.
> Is there a way to periodically commit changes to HDFS into the fsimage instead of keeping
a huge log file? (e.g. every 10 minutes or so.)
> Even if the namenode files are rsync'ed, what can be done in that particular case? (if
we periodically rsync the fsimage and its corrupted edits file).
> This issue affects HDFS version 0.6.1. After looking at the Hadoop trunk code, I
am not able to say whether this can still happen... (I would say yes, because the UTF8 class
is used in the same way as in 0.6.1.)
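
One plausible reading of the out-of-bounds exception in the report is that the log reader trusted a damaged length field or a truncated record; that is an assumption, since the stack trace was lost. The sketch below shows a replay loop that validates each record and stops at the first damaged one instead of aborting startup. The class TolerantReplaySketch, the [int length][UTF-8 bytes] record layout, and the MAX_RECORD_BYTES bound are all hypothetical and do not reflect the actual edits file format or how the issue was addressed in Hadoop.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical length-prefixed log reader that survives a damaged tail. */
public class TolerantReplaySketch {
    static final int MAX_RECORD_BYTES = 64 * 1024; // assumed sanity bound

    /** Encodes records as [int length][UTF-8 bytes]; an invented format. */
    static byte[] encode(List<String> records) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        for (String r : records) {
            byte[] b = r.getBytes(StandardCharsets.UTF_8);
            out.writeInt(b.length);
            out.write(b);
        }
        out.flush();
        return buf.toByteArray();
    }

    /** Returns every record that can be read before the first damaged one. */
    static List<String> replay(byte[] log) throws IOException {
        List<String> good = new ArrayList<>();
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(log));
        while (in.available() > 0) {
            try {
                int len = in.readInt();
                if (len < 0 || len > MAX_RECORD_BYTES) {
                    break; // implausible length: treat the rest as corrupt
                }
                byte[] b = new byte[len];
                in.readFully(b); // throws EOFException if the record is cut short
                good.add(new String(b, StandardCharsets.UTF_8));
            } catch (EOFException truncated) {
                break; // keep what was read so far
            }
        }
        return good;
    }
}
```

A length check alone cannot detect damage inside a record body; a per-record checksum would be needed for that. Whether silently dropping the damaged tail is acceptable is a policy choice, and an operator would at least want it logged.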

This message is automatically generated by JIRA.
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

