hadoop-hdfs-issues mailing list archives

From "Jitendra Nath Pandey (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-1580) Add interface for generic Write Ahead Logging mechanisms
Date Wed, 01 Jun 2011 00:36:48 GMT

    [ https://issues.apache.org/jira/browse/HDFS-1580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13041914#comment-13041914 ]

Jitendra Nath Pandey commented on HDFS-1580:

> Jitendra had mentioned to me why he preferred the getNumTransaction(sinceTx) but I forget the reason.
 getNumTransaction(sinceTx) will throw an exception if it sees a gap after sinceTx, that is, a break
in the sequence of transactions caused by an earlier failure of the journal. It will return a number
only if the journal can actually serve that many transactions starting from sinceTx.
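
To make the gap rule concrete, here is a minimal sketch. It is not the HDFS-1580 patch: the class names `TxRange` and `GapCheckingJournal` are hypothetical, segments are modelled as inclusive txid ranges that are assumed to be sorted and non-overlapping, and an unchecked exception stands in for whatever exception type the real interface would declare.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical model of one edit log segment: an inclusive range of txids. */
final class TxRange {
    final long firstTxId;
    final long lastTxId;

    TxRange(long firstTxId, long lastTxId) {
        this.firstTxId = firstTxId;
        this.lastTxId = lastTxId;
    }
}

/** Illustrative journal that refuses to count across a gap. Not an HDFS class. */
final class GapCheckingJournal {
    // Assumption: ranges are sorted by firstTxId and do not overlap.
    private final List<TxRange> segments;

    GapCheckingJournal(List<TxRange> segments) {
        this.segments = Collections.unmodifiableList(new ArrayList<>(segments));
    }

    /**
     * Returns how many transactions can be served contiguously from sinceTxId.
     * Throws if a later segment exists but does not continue the sequence.
     */
    long getNumTransactions(long sinceTxId) {
        long next = sinceTxId; // next txid we expect to be able to serve
        for (TxRange s : segments) {
            if (s.lastTxId < next) {
                continue; // segment lies entirely before the requested start
            }
            if (s.firstTxId > next) {
                // Unchecked only to keep the sketch short.
                throw new IllegalStateException(
                    "Gap in journal: expected txid " + next
                        + " but next segment starts at " + s.firstTxId);
            }
            next = s.lastTxId + 1;
        }
        return next - sinceTxId;
    }
}
```

With segments 1-10 and 11-20 a call with sinceTx = 1 returns 20. With segments 1-10 and 31-40 the same call throws, while sinceTx = 31 returns 10.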

>Finalizing in getNumTransactions is a bit messy.
 getNumTransactions will also be called by readers of edit logs. Finalize or recover should
happen only in the context of the writer. I think finalization might make sense at the creation
of the output stream. For example, finalize the edit logs when the namenode comes back up after a
crash and opens an output stream for writing. A separate recover method in the interface may
also be useful.
  Two distinct cases where getNumTransactions can be used:
  (a) At namenode startup or backup at failover:
       In this case the in_progress file must be read to capture all the transactions. This
is the context of the writer.
  (b) Checkpointer, backup (non-failover case) or any other reader:
       In this case the in_progress file can be ignored, and the checkpoint covers only up to
the last rolled/finalized edit log file. This is the context of a reader.
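
The difference between the two cases can be sketched as a rule for how far a caller reads. Everything named here (`EditLogFileInfo`, `JournalCaller`, `TransactionHorizon`) is hypothetical and only illustrates the distinction; in particular, it assumes that the last readable txid of an in_progress file is already known.

```java
import java.util.List;

/** Hypothetical description of one edit log file. Not an HDFS class. */
final class EditLogFileInfo {
    final long firstTxId;
    final long lastTxId;      // for an in_progress file: last txid known to be readable (assumption)
    final boolean inProgress; // true until the file has been rolled/finalized

    EditLogFileInfo(long firstTxId, long lastTxId, boolean inProgress) {
        this.firstTxId = firstTxId;
        this.lastTxId = lastTxId;
        this.inProgress = inProgress;
    }
}

/** Who is asking: the writer (case a) or a reader (case b). */
enum JournalCaller { WRITER, READER }

final class TransactionHorizon {
    /** Returns the highest txid the caller should consume, or -1 if there is none. */
    static long lastTxIdFor(JournalCaller caller, List<EditLogFileInfo> files) {
        long last = -1;
        for (EditLogFileInfo f : files) {
            if (f.inProgress && caller == JournalCaller.READER) {
                continue; // case (b): stop at the last rolled/finalized file
            }
            last = Math.max(last, f.lastTxId); // case (a) also counts in_progress
        }
        return last;
    }
}
```

Given finalized files 1-100 and 101-200 plus an in_progress file covering 201-250, the writer context yields 250 and the reader context yields 200.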

 I think we have the following options:
  1) getNumTransactions reads in_progress file in both cases up to whatever can be read successfully.
Caveat: Should checkpointer download the in_progress file as well?
  2) Don't read in_progress file, and handle case (a) by first calling a 'recover' method
that finalizes the edit logs, and handle case (b) by rolling the edit logs.
  3) A third option is to have two separate methods: one that counts the in_progress file and another
that does not.

It seems to me option (1) is the simplest. The checkpointer doesn't need to download the in_progress
file; however, with shared NFS storage it can read the in_progress file too.
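
As a rough illustration of what "up to whatever can be read successfully" in option (1) could mean, the sketch below counts records in an in_progress stream until the first truncated or out-of-sequence record. The record layout (8-byte txid, 4-byte payload length, payload), the size limit, and the class name `InProgressCounter` are assumptions made for this example; this is not the actual HDFS edit log format.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Illustrative counter for a possibly truncated in_progress log. Not an HDFS class. */
final class InProgressCounter {
    // Assumed upper bound so a corrupt length field cannot trigger a huge allocation.
    private static final int MAX_PAYLOAD_BYTES = 1 << 20;

    /**
     * Counts the records that can be read in sequence starting at expectedFirstTxId.
     * Assumed layout per record: long txid, int payload length, payload bytes.
     */
    static long countReadable(InputStream in, long expectedFirstTxId) {
        long expected = expectedFirstTxId;
        try (DataInputStream data = new DataInputStream(in)) {
            while (true) {
                long txId = data.readLong();
                int length = data.readInt();
                if (txId != expected || length < 0 || length > MAX_PAYLOAD_BYTES) {
                    break; // out-of-sequence or implausible record: stop counting
                }
                data.readFully(new byte[length]); // throws EOFException on a truncated tail
                expected++; // only count a record once it has been read completely
            }
        } catch (IOException e) {
            // Includes EOFException: treat any read failure as the end of readable data.
        }
        return expected - expectedFirstTxId;
    }
}
```

A stream holding two complete records followed by a partially written third would be counted as two transactions, which is the behaviour a writer recovering after a crash would rely on under this option.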

> Add interface for generic Write Ahead Logging mechanisms
> --------------------------------------------------------
>                 Key: HDFS-1580
>                 URL: https://issues.apache.org/jira/browse/HDFS-1580
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Ivan Kelly
>             Fix For: Edit log branch (HDFS-1073)
>         Attachments: EditlogInterface.1.pdf, EditlogInterface.2.pdf, HDFS-1580+1521.diff,
HDFS-1580.diff, HDFS-1580.diff, HDFS-1580.diff, generic_wal_iface.pdf, generic_wal_iface.pdf,
generic_wal_iface.pdf, generic_wal_iface.txt

This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
