hadoop-common-dev mailing list archives

From "Milind Bhandarkar (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-646) name node server does not load large (> 2^31 bytes) edits file
Date Sun, 29 Oct 2006 22:26:18 GMT
    [ http://issues.apache.org/jira/browse/HADOOP-646?page=comments#action_12445466 ] 
Milind Bhandarkar commented on HADOOP-646:


The number of entries cannot be determined from the size of the edits file, since the
entries are not fixed-size. However, reading the next byte and stopping when we get an
EOFException lets us avoid the call to available() completely. I am attaching a
patch (untested for files > 2G) which does exactly that.
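
For illustration only, here is a minimal sketch of the read-until-EOF idea. It is not the
attached patch: the class name, the countEntries helper, and the record layout (one opcode
byte followed by a UTF string) are assumptions made up for this example and do not reflect
the real edits file format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;

public class EditsReaderSketch {

    // Counts records without ever calling available().
    // Hypothetical record layout: one opcode byte, then one UTF string.
    static long countEntries(DataInputStream in) throws IOException {
        long count = 0;
        while (true) {
            byte opcode;
            try {
                // readByte() throws EOFException at end of stream,
                // which is the normal way out of this loop.
                opcode = in.readByte();
            } catch (EOFException e) {
                break;
            }
            // An EOFException here would mean a truncated record, so it is
            // deliberately left to propagate instead of being swallowed.
            String path = in.readUTF();
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // Build a tiny in-memory log with two records.
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        out.writeByte(0);
        out.writeUTF("/a");
        out.writeByte(1);
        out.writeUTF("/b/c");
        out.flush();

        DataInputStream in =
            new DataInputStream(new ByteArrayInputStream(buf.toByteArray()));
        System.out.println(countEntries(in)); // prints 2
    }
}
```

Only the EOFException raised while reading the first byte of a record is treated as a clean
end of file; one raised in the middle of a record still surfaces as an error, so a truncated
file is not silently accepted. The counter is a long so that it cannot overflow the way an
int byte count does on files larger than 2^31 bytes.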


Can you try this patch on the backed-up edits file (6.5 G) that you have and let me know
if it works? (It should be safe to try this patch as long as you back up your image first.)

- milind

> name node server does not load large (> 2^31 bytes) edits file
> --------------------------------------------------------------
>                 Key: HADOOP-646
>                 URL: http://issues.apache.org/jira/browse/HADOOP-646
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.7.1
>            Reporter: Christian Kunz
>            Priority: Critical
> FileInputStream.available() returns negative values when reading a large file (> 2^31 bytes) -- this is a known (unresolved) java bug:
> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6402006
> Consequence: a large edits file is not loaded and deleted without any warnings. The system reverts back to the old fsimage.
> This happens in jdk1.6 as well, i.e. the bug has not yet been fixed.
> In addition, when finally I was able to load my big cron-backed-up edits file (6.5 GB) with a kludgy work-around, the blocks did not exist anymore in the data node servers, probably deleted from the previous attempts when the name node server did not know about the changed
> Moral till this is fixed or worked-around: don't wait too long to restart the name node server. Otherwise this is a way to lose the entire dfs.

This message is automatically generated by JIRA.
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

