hadoop-hdfs-issues mailing list archives

From "Todd Lipcon (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-2003) Separate FSEditLog reading logic from editLog memory state building logic
Date Fri, 03 Jun 2011 23:41:48 GMT

    [ https://issues.apache.org/jira/browse/HDFS-2003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13044133#comment-13044133 ]

Todd Lipcon commented on HDFS-2003:
-----------------------------------

BTW, I thought of another reason why an EOFException in the middle of a transaction shouldn't
be treated the same as one at a transaction boundary:

A lot of the transaction serialization formats use length-prefixed strings. When the file is
corrupt, I often see the reader try to read a length-prefixed string but pick up some other
random bytes as the length (e.g. part of a filename). It then issues a read() for a very large
number of bytes, which, depending on how much heap is available, usually results in either an
OOME or an early EOFException. In the EOFException case, we don't want to treat it as a
successful log read, which is what the code does now.
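
To illustrate the distinction, here is a minimal sketch rather than the actual FSEditLogLoader
code. It assumes a made-up record layout (one opcode byte, an int length, then UTF-8 bytes),
and the names EditLogReadSketch, encode, countRecords and MAX_STRING_LENGTH are hypothetical.
An end of stream on a record boundary counts as a successful read, while an EOFException or
an implausible length inside a record is reported as an error.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

class EditLogReadSketch {
    // Hypothetical sanity bound on one string field; not an HDFS constant.
    static final int MAX_STRING_LENGTH = 1 << 16;

    /** Encodes each string as one record: opcode byte, int length, UTF-8 bytes. */
    static byte[] encode(String... names) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        for (String name : names) {
            byte[] payload = name.getBytes(StandardCharsets.UTF_8);
            out.writeByte(1);
            out.writeInt(payload.length);
            out.write(payload);
        }
        return bytes.toByteArray();
    }

    /** Counts complete records; throws if the stream ends or is corrupt mid-record. */
    static int countRecords(byte[] log) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(log));
        int count = 0;
        while (true) {
            int opcode = in.read(); // returns -1 at end of stream
            if (opcode == -1) {
                return count; // EOF on a record boundary: a successful read
            }
            try {
                int length = in.readInt();
                if (length < 0 || length > MAX_STRING_LENGTH) {
                    // Reject before allocating, so garbage bytes cannot cause an OOME.
                    throw new IOException(
                        "Implausible string length " + length + " in record " + count);
                }
                byte[] payload = new byte[length];
                in.readFully(payload);
            } catch (EOFException e) {
                // EOF inside a record means truncation or corruption, not a clean end.
                throw new IOException("Unexpected EOF inside record " + count, e);
            }
            count++;
        }
    }
}
```

Checking the length before allocating is one way to avoid the OOME case. Whether a fixed bound
like this suits the real formats is an open question.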

> Separate FSEditLog reading logic from editLog memory state building logic
> -------------------------------------------------------------------------
>
>                 Key: HDFS-2003
>                 URL: https://issues.apache.org/jira/browse/HDFS-2003
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>    Affects Versions: Edit log branch (HDFS-1073)
>            Reporter: Ivan Kelly
>            Assignee: Ivan Kelly
>             Fix For: Edit log branch (HDFS-1073)
>
>         Attachments: HDFS-2003.diff, HDFS-2003.diff, HDFS-2003.diff
>
>
> Currently FSEditLogLoader has code for reading from an InputStream interleaved with code
> which updates the FSNameSystem and FSDirectory. This makes it difficult to read an edit log
> without having a whole load of other objects initialised, which is problematic if you want
> to do things like count how many transactions are in a file.
> This patch separates the reading of the stream and the building of the memory state.


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
