hadoop-hdfs-issues mailing list archives

From "Jitendra Nath Pandey (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-1580) Add interface for generic Write Ahead Logging mechanisms
Date Thu, 28 Apr 2011 22:32:03 GMT

    [ https://issues.apache.org/jira/browse/HDFS-1580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13026742#comment-13026742 ]

Jitendra Nath Pandey commented on HDFS-1580:

- The design doesn't go into any detail regarding snapshots, which is consistent with your view. However,
I mentioned it because it is one of the requirements we will have to address eventually.
- This jira doesn't change any semantics related to the layout version. The version is a piece
of metadata that must be stored with the edit logs so that the namenode can understand and load
them. I am open to making it a byte array instead of just an integer, so that the namenode
can store whatever metadata it needs for interpreting the edit logs.
I agree that version is a little overloaded, but that can be addressed in a different jira.
- I think the retention policy for edit logs should be the namenode's responsibility, because retention
of edit logs will be closely tied to retention of old checkpoint images. Once the namenode has
called purgeTransactions, it should never ask for older transaction ids.
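
To make the contract concrete, here is a minimal in-memory sketch of that purge rule. The class and method names (InMemoryJournal, getInputStream taking a txn id) are illustrative assumptions, not the actual HDFS-1580 interface:

```java
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical in-memory journal illustrating the retention contract:
// once the namenode calls purgeTransactions(txnId), it must never ask
// for transactions older than txnId again.
class InMemoryJournal {
    private final TreeMap<Long, byte[]> txns = new TreeMap<>();

    void write(long txnId, byte[] record) {
        txns.put(txnId, record);
    }

    // Discard all transactions with id strictly less than minTxnIdToKeep,
    // e.g. after an old checkpoint image has been deleted.
    void purgeTransactions(long minTxnIdToKeep) {
        txns.headMap(minTxnIdToKeep).clear();
    }

    // Reading from a purged id is a contract violation by the caller.
    SortedMap<Long, byte[]> getInputStream(long sinceTxnId) {
        if (!txns.isEmpty() && sinceTxnId < txns.firstKey()) {
            throw new IllegalArgumentException(
                "txn " + sinceTxnId + " was purged; oldest retained is " + txns.firstKey());
        }
        return txns.tailMap(sinceTxnId);
    }
}
```

The point is that the journal can enforce the invariant cheaply because the namenode, which also owns checkpoint retention, promises never to cross the purge boundary.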
- "mark" means that the last written transaction is available for reading including all previous
transactions. sinceTxnId in getInputStream can be any transaction Id before the last call
of mark or close of the output stream. Apart from that, sinceTxnId doesn't assume any boundary.
- The motivation for the "mark" method was that BK has the limitation that open ledgers cannot
be read; "mark" gives a cue to a BK implementation that the current ledger should be made
available for reading. An implementation that doesn't have this limitation can just ignore
mark, which is why I didn't call it roll. That also explains how it differs from sync.
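
As a sketch of that cue (names hypothetical, not the committed interface): a file-based journal can leave mark() as a no-op, while a BookKeeper-style implementation, where an open ledger cannot be read, treats mark() as the signal to close the current ledger so everything written so far becomes readable:

```java
// Sketch only. The "ledger" here is a simplified stand-in for a real
// BookKeeper ledger; the interface and class names are assumptions.
interface EditLogOutput {
    void write(long txnId, byte[] record);
    // Cue only: after mark() returns, every transaction written so far
    // must be readable. Implementations without a read-while-open
    // limitation may leave this as a no-op.
    default void mark() {}
    void close();
}

class BookKeeperishOutput implements EditLogOutput {
    int ledgersClosed = 0;      // how many times we rolled the "ledger"
    long lastReadableTxn = -1;  // readable only up to the last mark()
    long lastWrittenTxn = -1;

    public void write(long txnId, byte[] record) {
        lastWrittenTxn = txnId;
    }

    @Override
    public void mark() {
        // Close the current "ledger" so its contents become readable,
        // then writes continue into a fresh one.
        ledgersClosed++;
        lastReadableTxn = lastWrittenTxn;
    }

    public void close() {
        mark();
    }
}
```

This also shows why mark differs from sync: sync is about durability of bytes, while mark is about visibility to readers.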
- I assumed that a write also syncs, because in most operations we sync immediately after
writing the log, and in this design we write the entire transaction as a unit. Management
of buffers and flushing should be the responsibility of the implementation.
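
A minimal sketch of that write-implies-sync behavior (class name and record framing are assumptions; flush() stands in for a real fsync or quorum ack):

```java
import java.io.BufferedOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical output stream where each write() appends one whole
// transaction record and does not return until it is flushed, so the
// buffering strategy stays inside the implementation.
class SyncingEditLogOutput {
    private final DataOutputStream out;

    SyncingEditLogOutput(OutputStream sink) {
        // The implementation may buffer internally...
        this.out = new DataOutputStream(new BufferedOutputStream(sink));
    }

    void write(long txnId, byte[] record) throws IOException {
        out.writeLong(txnId);       // record header: txn id
        out.writeInt(record.length); // record header: payload length
        out.write(record);           // the whole transaction as one unit
        // ...but the namenode never sees the buffer: write() returns
        // only after the record has been pushed down to the sink.
        out.flush();
    }
}
```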
- In EditLogInputStream, I think we can rename next to readNext, so it looks less like an iterator.
One way to avoid an extra array copy would be for readNext() to read the version and txnId and
position the underlying input stream at the beginning of the transaction record; getTxn could
then return the underlying input stream directly for reading the transaction bytes. Does that
make sense?
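
For clarity, here is a sketch of that readNext()/getTxn() split. The record layout (version, txnId, length, payload) and all names are assumptions for illustration, not the actual format:

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical reader: readNext() consumes only the record header,
// leaving the underlying stream positioned at the transaction bytes;
// getTxn() hands that stream out directly, so the payload is never
// copied into an intermediate byte array.
class SketchEditLogInputStream {
    private final DataInputStream in;
    private int version;
    private long txnId;
    private int txnLength;

    SketchEditLogInputStream(InputStream underlying) {
        this.in = new DataInputStream(underlying);
    }

    // Advance to the next record header; false at end of log.
    boolean readNext() throws IOException {
        if (in.available() == 0) {
            return false;
        }
        version = in.readInt();
        txnId = in.readLong();
        txnLength = in.readInt();
        return true;
    }

    long getTxnId() { return txnId; }

    int getTxnLength() { return txnLength; }

    // The underlying stream, positioned at the transaction bytes.
    // The caller must consume exactly getTxnLength() bytes before
    // the next call to readNext().
    InputStream getTxn() { return in; }
}
```

The trade-off is that the caller takes on a positioning contract (consume exactly txnLength bytes), in exchange for zero-copy access to the transaction payload.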

  LogSegments gets rid of the roll method, but it exposes the underlying units of storage to the namenode,
which I don't think is required.

> ... elsewhere we have discussed that we want to keep the property that logs always roll
together across all parts of the system.
  Do we really want this property? Isn't it better that we don't expose any boundaries between
transactions to the namenode?
> We generally want the property that, while saving a namespace or in safe mode, we don't
accept edits.
  This can be achieved by just closing the EditLogOutputStream.
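
A tiny sketch of that point (hypothetical names): entering safe mode or saving the namespace can reject edits simply by closing the output stream, without adding any frozen state to the logging interface itself:

```java
import java.io.IOException;

// Hypothetical: once closed, the stream rejects all further edits,
// which is exactly the behavior wanted during safe mode / saveNamespace.
class ClosableEditLogOutput {
    private boolean open = true;

    void write(long txnId, byte[] record) throws IOException {
        if (!open) {
            throw new IOException(
                "Edits rejected: log is closed (safe mode or saveNamespace)");
        }
        // ... append and sync the record ...
    }

    void close() {
        open = false;
    }
}
```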

> Add interface for generic Write Ahead Logging mechanisms
> --------------------------------------------------------
>                 Key: HDFS-1580
>                 URL: https://issues.apache.org/jira/browse/HDFS-1580
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Ivan Kelly
>             Fix For: Edit log branch (HDFS-1073)
>         Attachments: EditlogInterface.1.pdf, HDFS-1580+1521.diff, HDFS-1580.diff, HDFS-1580.diff,
HDFS-1580.diff, generic_wal_iface.pdf, generic_wal_iface.pdf, generic_wal_iface.pdf, generic_wal_iface.txt

This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
