Mailing-List: contact hdfs-issues-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: hdfs-issues@hadoop.apache.org
Date: Thu, 28 Apr 2011 22:32:03 +0000 (UTC)
From: "Jitendra Nath Pandey (JIRA)" <jira@apache.org>
To: hdfs-issues@hadoop.apache.org
Message-ID: 
 <1085137408.10167.1304029923876.JavaMail.tomcat@hel.zones.apache.org>
In-Reply-To: <31416615.309351294823386532.JavaMail.jira@thor>
Subject: [jira] [Commented] (HDFS-1580) Add interface for generic Write
 Ahead Logging mechanisms
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


    [ https://issues.apache.org/jira/browse/HDFS-1580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13026742#comment-13026742 ] 

Jitendra Nath Pandey commented on HDFS-1580:
--------------------------------------------

- The design doesn't go into any detail regarding snapshots, which concurs with your view. However, I mentioned them because snapshots are one of the requirements we will eventually have to address.
- This jira doesn't change any semantics related to the layout version. The version is a piece of metadata that needs to be stored with the edit logs so that the namenode can understand and load them. I am open to making it a byte array instead of just an integer, so that the namenode can store any metadata it considers relevant for understanding the edit logs. I agree that the version is a little overloaded, but that can be addressed in a different jira.
- I think the retention policy for edit logs should be the namenode's responsibility, because retention of edit logs will be closely tied to retention of old checkpoint images. Once the namenode has called purgeTransactions, it should never ask for older transaction ids.
- "mark" means that the last written transaction, and all transactions before it, are available for reading. sinceTxnId in getInputStream can be any transaction id written before the last call to mark or the close of the output stream. Apart from that, sinceTxnId doesn't assume any boundary.
- The motivation for the "mark" method is that BookKeeper (BK) has the limitation that open ledgers cannot be read. "mark" gives a cue to a BK implementation that the current ledger should be made available for reading. An implementation that doesn't have this limitation can just ignore mark, which is why I didn't call it roll. That also explains how it differs from sync.
- I assumed that a write also syncs, because in most operations we sync immediately after writing the log, and in this design we write the entire transaction as a unit. Management of buffers and flushing should be the responsibility of the implementation.
- In EditLogInputStream, I think we can rename next to readNext so that it looks less like an iterator. One way to avoid the extra array copy would be for readNext() to read the version and txnId and position the underlying input stream at the beginning of the transaction record; getTxn can then directly return the underlying input stream for reading the transaction bytes. Does that make sense?
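
To make the mark, sinceTxnId, purgeTransactions and readNext/getTxn points above concrete, here is a rough in-memory sketch. It is not the proposed interface or any HDFS code: InMemoryJournal, Reader, getTxnId, write and minTxnIdToKeep are hypothetical names, transaction ids are assumed to start at 1, sinceTxnId is assumed to be inclusive, and a write is simply treated as durable once it returns. It is single-threaded and ignores readers that are open while a purge happens.

```java
import java.io.ByteArrayInputStream;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical in-memory journal illustrating the "mark" semantics; not HDFS code. */
class InMemoryJournal {
    private final List<byte[]> txns = new ArrayList<>(); // index i holds txn id firstTxnId + i
    private long firstTxnId = 1;   // oldest retained txn id (assumption: ids start at 1)
    private long readableUpTo = 0; // last txn id made visible by mark() or close()

    /** Appends one whole transaction and returns its id; assumed durable on return. */
    long write(byte[] txn) {
        txns.add(txn.clone());
        return firstTxnId + txns.size() - 1;
    }

    /** Makes everything written so far readable. A backend without BK's limitation could no-op. */
    void mark() {
        readableUpTo = firstTxnId + txns.size() - 1;
    }

    /** Closing the output side also makes everything written readable. */
    void close() {
        mark();
    }

    /** Drops all transactions with id below minTxnIdToKeep; the caller must not ask for them again. */
    void purgeTransactions(long minTxnIdToKeep) {
        while (firstTxnId < minTxnIdToKeep && !txns.isEmpty()) {
            txns.remove(0);
            firstTxnId++;
        }
    }

    /** sinceTxnId may be any retained id up to the last mark; no other boundary is assumed. */
    Reader getInputStream(long sinceTxnId) {
        if (sinceTxnId < firstTxnId || sinceTxnId > readableUpTo) {
            throw new IllegalArgumentException("txn " + sinceTxnId + " is purged or not yet marked");
        }
        return new Reader(sinceTxnId);
    }

    /** Reader in the readNext()/getTxn() style: advance first, then stream the record bytes. */
    class Reader {
        private long nextTxnId;
        private long currentTxnId = -1;

        Reader(long sinceTxnId) {
            nextTxnId = sinceTxnId;
        }

        /** Positions on the next readable transaction; false if nothing further is marked. */
        boolean readNext() {
            if (nextTxnId > readableUpTo) {
                return false;
            }
            currentTxnId = nextTxnId++;
            return true;
        }

        long getTxnId() {
            return currentTxnId;
        }

        /** Returns a stream over the current record without copying it into a new array. */
        ByteArrayInputStream getTxn() {
            return new ByteArrayInputStream(txns.get((int) (currentTxnId - firstTxnId)));
        }
    }
}
```

In this sketch a reader opened at txn 1 stops at the last marked transaction and picks up later ones only after the next mark, which is the behaviour a BK-backed implementation would need; a file-backed one could implement mark as a no-op and expose every written transaction immediately.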

LogSegments:
  LogSegments gets rid of the roll method but exposes the underlying units of storage to the namenode, which I don't think is required.

>.. elsewhere we have discussed that we want to keep the property that logs always roll together across all parts of the system.
  Do we really want this property? Isn't it better that we don't expose any boundaries between transactions to the namenode?
> We generally want the property that, while saving a namespace or in safe mode, we don't accept edits.
  This can be achieved by just closing the EditLogOutputStream.
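
  A minimal sketch of that idea, assuming the stream rejects writes once closed. SketchEditLogOutputStream, SketchEditLog, logEdit and saveNamespace are hypothetical stand-ins written for this comment, not the actual HDFS classes or the proposed interface:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for an edit log output stream; not the HDFS class. */
class SketchEditLogOutputStream {
    private final List<byte[]> written = new ArrayList<>();
    private boolean closed = false;

    /** Writes one transaction, or fails if the stream has been closed. */
    void write(byte[] txn) {
        if (closed) {
            throw new IllegalStateException("edit log is closed; edits are not accepted");
        }
        written.add(txn.clone());
    }

    void close() {
        closed = true;
    }

    int count() {
        return written.size();
    }
}

/** Hypothetical namenode-side holder that closes the stream around a namespace save. */
class SketchEditLog {
    private SketchEditLogOutputStream out = new SketchEditLogOutputStream();

    void logEdit(byte[] txn) {
        out.write(txn);
    }

    int count() {
        return out.count();
    }

    /** Closes the stream, runs the image save, then opens a fresh stream for new edits. */
    void saveNamespace(Runnable writeImage) {
        out.close();
        try {
            writeImage.run();
        } finally {
            out = new SketchEditLogOutputStream();
        }
    }
}
```

  Any edit attempted while the image is being written fails at the closed stream, so no roll boundary has to be visible to the namenode to get this property.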
  

> Add interface for generic Write Ahead Logging mechanisms
> --------------------------------------------------------
>
>                 Key: HDFS-1580
>                 URL: https://issues.apache.org/jira/browse/HDFS-1580
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Ivan Kelly
>             Fix For: Edit log branch (HDFS-1073)
>
>         Attachments: EditlogInterface.1.pdf, HDFS-1580+1521.diff, HDFS-1580.diff, HDFS-1580.diff, HDFS-1580.diff, generic_wal_iface.pdf, generic_wal_iface.pdf, generic_wal_iface.pdf, generic_wal_iface.txt
>
>

