hadoop-hdfs-issues mailing list archives

From "Todd Lipcon (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-2003) Separate FSEditLog reading logic from editLog memory state building logic
Date Fri, 03 Jun 2011 18:11:48 GMT

    [ https://issues.apache.org/jira/browse/HDFS-2003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13043928#comment-13043928 ]

Todd Lipcon commented on HDFS-2003:

bq. This is intentional. We read from the journal which has the most edits available to read.
If this happens to be a journal with a truncated file, that journal is still the journal with
the most up to date logs. Do you disagree?

In my opinion, that's the responsibility of the "edit log recovery" process to determine,
and then truncate the file at the correct length. But, I see your point as well, and don't
feel strongly about it. Either way, though, it's a distinct change from just the refactor.
Can we keep the current behavior in the refactor, and then make that behavioral change separately?

Some other thoughts:
- Do we need Reader to be an inner class of FSEditLogOp? I find it a little strange to have
all of the Reader code, and then the "final Codes opCode;" and "final long txid;" right after.

I think the patch would produce fewer conflicts on merge if we made the following change:
- Keep FSEditLogOpCodes as is, so we don't have changes throughout
EditLogFileInputStream/OutputStream/FSEditLog/Loader/OEV/etc.
(this will help prevent merge conflicts against HDFS-1936 in particular)


One idea, which you can take or leave: what if we added a {{Class<? extends FSEditLogOp>}}
field to the Codes enum, and then did the following:

in Reader constructor:

    EnumMap<Codes, FSEditLogOp> opInstances =
        new EnumMap<Codes, FSEditLogOp>(Codes.class);
    for (Codes c : Codes.values()) {
      opInstances.put(c, c.getOpClass().newInstance());
    }

in readOp instead of the switch statement:

    FSEditLogOp op = opInstances.get(opCode);
    op.readFields(in, logVersion);

This idea would remove the object overhead of creating new objects for each case, make opcodes
more like writables, and get rid of the big switch statement. Might also be a good first step
towards sharing more code between the OEV and the normal edits loader. This is just a thought,
though - if you don't like it, ignore me :)
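
To make the idea above concrete, here is a minimal self-contained sketch. It is not the HDFS code: EditOp, AddOp, DeleteOp, OpCode, OpReader and OpReaderDemo are hypothetical stand-ins for FSEditLogOp, its subclasses, the Codes enum and the Reader, and the opcode byte values and on-disk layout are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.EnumMap;

// Hypothetical stand-in for FSEditLogOp: each op reads its own fields.
abstract class EditOp {
  abstract void readFields(DataInputStream in, int logVersion) throws IOException;
}

class AddOp extends EditOp {
  String path;
  @Override
  void readFields(DataInputStream in, int logVersion) throws IOException {
    path = in.readUTF();
  }
}

class DeleteOp extends EditOp {
  String path;
  long timestamp;
  @Override
  void readFields(DataInputStream in, int logVersion) throws IOException {
    path = in.readUTF();
    timestamp = in.readLong();
  }
}

// Hypothetical stand-in for the Codes enum, carrying the op class per opcode.
enum OpCode {
  ADD((byte) 1, AddOp.class),       // byte values are illustrative only
  DELETE((byte) 2, DeleteOp.class);

  private final byte code;
  private final Class<? extends EditOp> opClass;

  OpCode(byte code, Class<? extends EditOp> opClass) {
    this.code = code;
    this.opClass = opClass;
  }

  Class<? extends EditOp> getOpClass() {
    return opClass;
  }

  static OpCode fromByte(byte b) throws IOException {
    for (OpCode c : values()) {
      if (c.code == b) {
        return c;
      }
    }
    throw new IOException("Unknown opcode " + b);
  }
}

class OpReader {
  // One reusable instance per opcode, created once up front.
  private final EnumMap<OpCode, EditOp> opInstances = new EnumMap<>(OpCode.class);
  private final DataInputStream in;
  private final int logVersion;

  OpReader(DataInputStream in, int logVersion) throws ReflectiveOperationException {
    this.in = in;
    this.logVersion = logVersion;
    for (OpCode c : OpCode.values()) {
      opInstances.put(c, c.getOpClass().getDeclaredConstructor().newInstance());
    }
  }

  // Map lookup replaces the switch; the returned object is overwritten
  // the next time an op with the same opcode is read.
  EditOp readOp() throws IOException {
    OpCode opCode = OpCode.fromByte(in.readByte());
    EditOp op = opInstances.get(opCode);
    op.readFields(in, logVersion);
    return op;
  }
}

class OpReaderDemo {
  // Builds a two-op log in the made-up format used by this sketch.
  static byte[] sampleLog() throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    DataOutputStream out = new DataOutputStream(bytes);
    out.writeByte(1);
    out.writeUTF("/a");
    out.writeByte(2);
    out.writeUTF("/a");
    out.writeLong(42L);
    out.flush();
    return bytes.toByteArray();
  }

  public static void main(String[] args) throws Exception {
    OpReader reader =
        new OpReader(new DataInputStream(new ByteArrayInputStream(sampleLog())), 0);
    AddOp add = (AddOp) reader.readOp();
    System.out.println("add " + add.path);
    DeleteOp del = (DeleteOp) reader.readOp();
    System.out.println("delete " + del.path + " at " + del.timestamp);
  }
}
```

One consequence of this design worth keeping in mind: because each opcode has a single shared instance, a caller has to consume (or copy) the returned op before the next op with the same opcode is read, the same way reused Writables behave.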

> Separate FSEditLog reading logic from editLog memory state building logic
> -------------------------------------------------------------------------
>                 Key: HDFS-2003
>                 URL: https://issues.apache.org/jira/browse/HDFS-2003
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>    Affects Versions: Edit log branch (HDFS-1073)
>            Reporter: Ivan Kelly
>            Assignee: Ivan Kelly
>             Fix For: Edit log branch (HDFS-1073)
>         Attachments: HDFS-2003.diff, HDFS-2003.diff, HDFS-2003.diff
> Currently FSEditLogLoader has code for reading from an InputStream interleaved with code
> which updates the FSNameSystem and FSDirectory. This makes it difficult to read an edit log
> without having a whole load of other objects initialised, which is problematic if you want
> to do things like count how many transactions are in a file etc.
> This patch separates the reading of the stream and the building of the memory state.

This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
