hadoop-hdfs-issues mailing list archives

From "Ivan Kelly (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-2018) 1073: Move all journal stream management code into one place
Date Wed, 10 Aug 2011 10:26:27 GMT

    [ https://issues.apache.org/jira/browse/HDFS-2018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13082274#comment-13082274
] 

Ivan Kelly commented on HDFS-2018:
----------------------------------

{quote}
getFirstTxnId and getLastTxnId seem a bit redundant in the EditLogInputStream interface. The
first txnid must be the last read plus one. The last txnid can be obtained using getNumberOfTransactions.
Similarly in the constructor of EditLogFileInputStream. It might lead to inconsistent use
of EditLogFileInputStream, for example if the file contents don't match the transaction ids
being passed.
{quote}
getLastTxId is required by FSEditLog#selectInputStreams to make sure a continuous set of streams
is selected. getNumberOfTransactions wouldn't work here, because it counts how many
transactions are available on a journal manager from that point onwards, not how many are in
the next segment.

Take the scenario where you have two logs with txns A[[1,100][101,140][201,300]] & B[[1,100][101,200][201,240]].
A has had an error at txn 140, so that stream is incomplete. B has had an error at txn 240,
so that stream is incomplete.

Now if you call getNumberOfTransactions(101), you get 140 for B and 40 for A. So, the
stream from B is selected. But we can't read all 140; we must read only the next segment, as
we can't start reading halfway through a segment (actually you can since HDFS-2187, but that
was done after this and it's still undesirable). Since we select all the streams before
starting to read them, we can't wait until we've read to the end of a stream to get its last
txid. So getLastTxId() is useful here.
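
To make the scenario above concrete, here is a minimal Java sketch of the selection loop. It is not the actual FSEditLog code: the class SegmentSelectionSketch, the Segment type, and the helpers countFrom, segmentStartingAt and select are all hypothetical names made up for illustration, and each journal is assumed to be a list of segments sorted by first txid. It shows that the transaction count only tells us which journal to pick, while the last txid of the chosen segment tells us where the next stream must start.

```java
import java.util.ArrayList;
import java.util.List;

final class SegmentSelectionSketch {
    /** Hypothetical stand-in for one edit log segment covering [firstTxId, lastTxId]. */
    static final class Segment {
        final long firstTxId;
        final long lastTxId;
        Segment(long firstTxId, long lastTxId) {
            this.firstTxId = firstTxId;
            this.lastTxId = lastTxId;
        }
    }

    // The two logs from the scenario: A stops at 140 and B stops at 240.
    static final List<Segment> JOURNAL_A =
        List.of(new Segment(1, 100), new Segment(101, 140), new Segment(201, 300));
    static final List<Segment> JOURNAL_B =
        List.of(new Segment(1, 100), new Segment(101, 200), new Segment(201, 240));

    /** Counts the contiguous transactions available from fromTxId (assumed semantics). */
    static long countFrom(List<Segment> journal, long fromTxId) {
        long next = fromTxId;
        for (Segment s : journal) {
            if (s.firstTxId == next) {
                next = s.lastTxId + 1; // this segment continues the run
            }
        }
        return next - fromTxId;
    }

    /** Returns the segment that starts exactly at txId, or null if there is none. */
    static Segment segmentStartingAt(List<Segment> journal, long txId) {
        for (Segment s : journal) {
            if (s.firstTxId == txId) {
                return s;
            }
        }
        return null;
    }

    /** Selects a continuous run of whole segments before any of them is read. */
    static List<Segment> select(List<List<Segment>> journals, long fromTxId) {
        List<Segment> selected = new ArrayList<>();
        long next = fromTxId;
        while (true) {
            List<Segment> best = null;
            long bestCount = 0;
            for (List<Segment> journal : journals) {
                long count = countFrom(journal, next);
                if (count > bestCount) {
                    bestCount = count;
                    best = journal;
                }
            }
            if (best == null) {
                return selected; // no journal has a segment starting at next
            }
            Segment segment = segmentStartingAt(best, next);
            selected.add(segment);
            // Advance by the segment's last txid, not by bestCount.
            next = segment.lastTxId + 1;
        }
    }

    public static void main(String[] args) {
        for (Segment s : select(List.of(JOURNAL_A, JOURNAL_B), 101)) {
            System.out.println("[" + s.firstTxId + "," + s.lastTxId + "]");
        }
    }
}
```

If the loop advanced by the count (140) instead, the next stream would have to start at txn 241, which falls in the middle of A's [201,300] segment. Advancing by the last txid of B's [101,200] segment lets the next pass pick A's [201,300] as a whole segment.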

getFirstTxId() is very useful in BackupImage to find the current inprogress stream.

> 1073: Move all journal stream management code into one place
> ------------------------------------------------------------
>
>                 Key: HDFS-2018
>                 URL: https://issues.apache.org/jira/browse/HDFS-2018
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Ivan Kelly
>            Assignee: Ivan Kelly
>             Fix For: 0.23.0
>
>         Attachments: HDFS-2018.diff, HDFS-2018.diff, HDFS-2018.diff, HDFS-2018.diff,
HDFS-2018.diff, HDFS-2018.diff, HDFS-2018.diff, HDFS-2018.diff, HDFS-2018.diff, HDFS-2018.diff,
HDFS-2018.diff, HDFS-2018.diff, HDFS-2018.diff, HDFS-2018.diff
>
>
> Currently in the HDFS-1073 branch, the code for creating output streams is in FileJournalManager
> and the code for input streams is in the inspectors. This change does a number of things.
>   - Input and Output streams are now created by the JournalManager.
>   - FSImageStorageInspectors now deal with URIs when referring to edit logs.
>   - Recovery of inprogress logs is performed by counting the number of transactions instead
>     of looking at the length of the file.
> The patch for this applies on top of the HDFS-1073 branch + HDFS-2003 patch.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
