hadoop-hdfs-issues mailing list archives

From "Todd Lipcon (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-2018) 1073: Move all journal stream management code into one place
Date Tue, 23 Aug 2011 23:45:29 GMT

    [ https://issues.apache.org/jira/browse/HDFS-2018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13089875#comment-13089875 ]

Todd Lipcon commented on HDFS-2018:
-----------------------------------

Just wanted to write some thoughts on this before the call tomorrow:

- It's unclear how the "segmentless" design started in this JIRA and continued in HDFS-1580
deals with edits transfer. In several places above, Jitendra and Ivan have referred to some plan
to remove edits transfer and instead use rsync or scp. That's certainly an option, but not
one that's been decided in public. I'd like to understand better how we plan to implement
edits transfer with this and HDFS-1580. I brought this up several times in HDFS-1580's comments,
but most of the answers I've seen have been "BookKeeper doesn't need edits transfer". That's
fine, but we need the existing file-based setup to continue to work.

If the plan is to go entirely segmentless, then I think we should go whole-hog with it and
completely abandon the prior invariant that the storage directories on different nodes will
always roll together. I found this invariant really nice operationally, but if rolling is
meant to be pushed down to an implementation detail of the specific journal, then I don't
see how we can enforce the invariant.

By going whole-hog, I mean that the "edits fetching" in the 2NN/CN/BN would be rewritten to
simply ask for a stream of edits, and it would have no idea of the boundaries between the
different files.
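
To make the "just ask for a stream of edits" idea concrete, here is a minimal sketch. All of
the names in it (EditOp, SegmentlessJournal, SegmentedJournal, streamFrom) are hypothetical and
made up for illustration; none of them are taken from the HDFS-2018 or HDFS-1580 patches. The
point is only that the caller passes a starting transaction ID and never sees where one file
ends and the next begins.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical types for illustration only; these are NOT HDFS classes.
final class EditOp {
    final long txId;
    final String payload;

    EditOp(long txId, String payload) {
        this.txId = txId;
        this.payload = payload;
    }
}

/** What a 2NN/CN/BN-style reader would see: a stream of edits, no segments. */
interface SegmentlessJournal {
    /** Returns every edit with txId >= fromTxId, in the order it was logged. */
    Iterable<EditOp> streamFrom(long fromTxId);
}

/** A file-style journal: it rolls segments internally but never exposes them. */
final class SegmentedJournal implements SegmentlessJournal {
    private final List<List<EditOp>> segments = new ArrayList<>();

    /** Rolling is an implementation detail of this journal only. */
    void addSegment(List<EditOp> segment) {
        segments.add(new ArrayList<>(segment));
    }

    @Override
    public Iterable<EditOp> streamFrom(long fromTxId) {
        List<EditOp> out = new ArrayList<>();
        for (List<EditOp> segment : segments) {
            for (EditOp op : segment) {
                if (op.txId >= fromTxId) {
                    out.add(op);
                }
            }
        }
        return out;
    }
}
```

A reader would call something like journal.streamFrom(lastAppliedTxId + 1) and apply whatever
comes back. Under this assumption, two journals could roll at completely different points and
the reader would not be able to tell.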

In the current patches proposed here, HDFS-1580, and HDFS-2158, the edits fetching (getEditLogManifest)
is left as a kind of strange second code path that bypasses the otherwise nice abstractions.


It seems to me that one of the primary objections raised here is that there are already a
bunch of other patches queued up on top of this one. Do you have a GitHub branch available
that would help others see where it's going? For example, the latest patch on HDFS-1580 is
10 weeks old.

> 1073: Move all journal stream management code into one place
> ------------------------------------------------------------
>
>                 Key: HDFS-2018
>                 URL: https://issues.apache.org/jira/browse/HDFS-2018
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Ivan Kelly
>            Assignee: Ivan Kelly
>             Fix For: 0.23.0
>
>         Attachments: HDFS-2018.diff, HDFS-2018.diff, HDFS-2018.diff, HDFS-2018.diff,
HDFS-2018.diff, HDFS-2018.diff, HDFS-2018.diff, HDFS-2018.diff, HDFS-2018.diff, HDFS-2018.diff,
HDFS-2018.diff, HDFS-2018.diff, HDFS-2018.diff, HDFS-2018.diff, HDFS-2018.diff, HDFS-2018.diff,
hdfs-2018-otherapi.txt, hdfs-2018.txt
>
>
> Currently in the HDFS-1073 branch, the code for creating output streams is in FileJournalManager
> and the code for input streams is in the inspectors. This change does a number of things.
>   - Input and Output streams are now created by the JournalManager.
>   - FSImageStorageInspectors now deal with URIs when referring to edit logs.
>   - Recovery of in-progress logs is performed by counting the number of transactions instead
>     of looking at the length of the file.
> The patch for this applies on top of the HDFS-1073 branch + HDFS-2003 patch.
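
The transaction-counting recovery mentioned in the description above could look roughly like
the sketch below. The record layout used here (8-byte txId, 4-byte payload length, payload) is
an assumption made up for this example and is NOT the real HDFS edit log format; the class and
method names are hypothetical as well. It only shows the idea: scan the in-progress log, count
the complete and consecutively numbered transactions, and ignore whatever trails them, rather
than trusting the file length.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical scanner over a made-up record layout (NOT the HDFS on-disk format):
// [long txId][int payloadLength][payload bytes]
final class InProgressLogScanner {
    private static final int HEADER_BYTES = Long.BYTES + Integer.BYTES;

    /**
     * Counts complete transactions numbered firstTxId, firstTxId + 1, ...
     * Scanning stops at the first truncated, out-of-sequence, or malformed record.
     * Assumes firstTxId > 0 so that zero padding is never mistaken for a record.
     */
    static long countValidTransactions(byte[] log, long firstTxId) {
        ByteBuffer buf = ByteBuffer.wrap(log);
        long count = 0;
        while (buf.remaining() >= HEADER_BYTES) {
            long txId = buf.getLong();
            int len = buf.getInt();
            boolean inSequence = (txId == firstTxId + count);
            if (!inSequence || len < 0 || len > buf.remaining()) {
                break; // padding, garbage, or a record cut off mid-write
            }
            buf.position(buf.position() + len); // skip the payload
            count++;
        }
        return count;
    }

    /** Encodes one record in the made-up layout; used only to build sample input. */
    static byte[] record(long txId, String payload) {
        byte[] body = payload.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(HEADER_BYTES + body.length)
                .putLong(txId)
                .putInt(body.length)
                .put(body)
                .array();
    }
}
```

With a count n from a log that starts at firstTxId, the last recoverable transaction would be
firstTxId + n - 1, regardless of how many bytes of padding or partial writes follow it.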

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
