Date: Thu, 28 Apr 2011 23:28:04 +0000 (UTC)
From: "Todd Lipcon (JIRA)"
To: hdfs-issues@hadoop.apache.org
Subject: [jira] [Commented] (HDFS-1580) Add interface for generic Write Ahead Logging mechanisms

[ https://issues.apache.org/jira/browse/HDFS-1580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13026772#comment-13026772 ]

Todd Lipcon commented on HDFS-1580:
-----------------------------------

bq. which is relevant for understanding the edit logs. I agree that version is a little overloaded but that can be addressed in a different jira

Agreed that's a separate JIRA -- I just wanted to clarify that the version you're talking about here is the "edits log serialization format version" rather than something about the actual layout.

bq. If namenode has called purgeTransactions it should never ask for older transaction ids

Fair enough.

bq. Apart from that, sinceTxnId doesn't assume any boundary

I think that will really complicate things like edits transfer in the 2NN. In the file-based storage there's no clean way to seek to a particular transaction ID, so we'd have to add this facility to EditLogInputStream, etc. That's a lot of complexity for little benefit that I can see.

bq. The motivation for "mark" method was that BK has this limitation that open ledgers cannot be read, "mark" will give a cue to a BK implementation that the current ledger should be made available for reading

This seems like a somewhat serious flaw. If we anticipate using BK for HA, I was under the impression that the "hot backup" would be following along on the edits as they're written into BK. What you're saying here implies that the primary NN would have to roll its logs every few seconds if you want the standby to be truly "hot".

bq. If an implementation doesn't have this limitation it can just ignore mark, that is why I didn't call it roll

Another way of doing this is to say that, if an implementation _does_ have this limitation, it can choose to "mark" whenever it likes. No?

bq. I assumed that a write also syncs, because in most operations we sync immediately after writing the log, and in this design we are writing the entire transaction as a unit.

In fact this is not at all how the current design works. Most operations write the edit to the log while holding the FSN lock (to ensure serialized order between ops) and then drop the FSN lock to sync. This allows group commit and is crucial for reasonable throughput.

bq. Management of buffers and flush should be the responsibility of the implementation.

But flush needs to be coordinated as a separate action from writing in order to achieve lock release and group commit.

bq. readNext() reads the version and txnId and synchronizes the underlying inputstream to the beginning of transaction record and then getTxn can directly return the underlying inputstream for reading the transaction bytes

Yep, that makes sense.

bq. LogSegments gets rid of roll method but exposes the underlying units of storage to the namenode which I don't think is required

It's not absolutely required in the theoretical sense, but since we'd like to keep the code as simple as possible, I think it helps toward that goal. For example, edit log transfer right now is based around the concept of discrete files which can be fetched in their entirety, with an associated md5sum. If we have to support fetching arbitrary ranges of transactions, these safety checks become more difficult to implement. We would also need to split the "file transfer" code into two different code paths: one for files (fsimage) and another for edits (arbitrary transaction ranges).

bq. Do we really want this property? Isn't it better that we don't expose any boundaries between transactions to the namenode?

Yes, this property is very useful for operations. Refer to the discussion on HDFS-1073 about this property. The fact that I can run "md5sum /data/{1..4}/dfs/name/current/*" and verify that the files are all identical gives me great peace of mind.
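The "mark" question above (an optional hint versus letting a limited implementation decide for itself) can be sketched as a method with a no-op default, plus one implementation that closes segments on its own schedule. This is a minimal sketch under stated assumptions: `WriteAheadLog`, `ClosedSegmentLog`, and `flushesPerSegment` are hypothetical names, and `ClosedSegmentLog` only models the "open ledgers cannot be read" limitation described for BK; it does not use the BookKeeper API.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical journal interface; names are illustrative only. */
interface WriteAheadLog {
  /** Buffers one serialized transaction. */
  void write(long txId, byte[] record) throws IOException;

  /** Makes all buffered transactions durable. */
  void flush() throws IOException;

  /** Hint that everything flushed so far should become readable by other
   *  readers. Implementations without such a limitation can ignore it. */
  default void mark() throws IOException {
    // no-op by default
  }
}

/** Hypothetical store where only closed segments can be read (loosely modeled
 *  on the BK limitation discussed above; not the BookKeeper API). */
class ClosedSegmentLog implements WriteAheadLog {
  private final int flushesPerSegment;
  private final List<List<Long>> closedSegments = new ArrayList<>();
  private final List<Long> pending = new ArrayList<>();
  private List<Long> openSegment = new ArrayList<>();
  private int flushesInSegment = 0;

  ClosedSegmentLog(int flushesPerSegment) {
    this.flushesPerSegment = flushesPerSegment;
  }

  @Override
  public void write(long txId, byte[] record) {
    pending.add(txId); // payload ignored in this sketch; only txids are tracked
  }

  @Override
  public void flush() {
    openSegment.addAll(pending);
    pending.clear();
    // The implementation closes segments on its own schedule instead of
    // waiting for the NameNode to call mark().
    if (++flushesInSegment >= flushesPerSegment) {
      mark();
    }
  }

  @Override
  public void mark() {
    if (!openSegment.isEmpty()) {
      closedSegments.add(openSegment);
      openSegment = new ArrayList<>();
    }
    flushesInSegment = 0;
  }

  /** Transaction IDs visible to a concurrent reader such as a standby. */
  List<Long> readableTxIds() {
    List<Long> visible = new ArrayList<>();
    for (List<Long> segment : closedSegments) {
      visible.addAll(segment);
    }
    return visible;
  }
}
```

With a small `flushesPerSegment`, a reader that can only see closed segments trails the writer by fewer than that many flushes, which is the "roll every few seconds" trade-off raised above, handled inside the implementation rather than by the NameNode.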
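The write-then-sync split described above (log the edit under the FSN lock, then drop the lock before syncing) can be illustrated with a minimal sketch. It assumes a single in-memory buffer and an in-memory list standing in for disk, and every name in it (`GroupCommitLog`, `mkdir`, `logEdit`, `logSync`) is hypothetical rather than the actual FSEditLog code. Transaction IDs are assigned while the namesystem lock is held, so their order is fixed; the slow flush runs with no lock held, and a thread whose edit was already covered by another thread's flush returns without doing any I/O, which is the group-commit effect.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch of "write under the namesystem lock, sync outside it". */
class GroupCommitLog {
  private final Object fsnLock = new Object();             // stands in for the FSN lock
  private final List<String> buffer = new ArrayList<>();   // written, not yet durable; guarded by this
  private final List<String> durable =
      Collections.synchronizedList(new ArrayList<>());     // stands in for the disk
  private long lastWrittenTxId = 0;                        // guarded by this
  private long syncedTxId = 0;                             // guarded by this
  private boolean syncInProgress = false;                  // guarded by this

  /** A namespace operation: log under the FSN lock, sync after releasing it. */
  void mkdir(String path) throws InterruptedException {
    long txId;
    synchronized (fsnLock) {       // serializes the order of namespace ops
      // ... mutate the in-memory namespace here ...
      txId = logEdit("MKDIR " + path);
    }
    logSync(txId);                 // other ops can take fsnLock meanwhile
  }

  /** Assigns the next txid and buffers the edit; caller holds fsnLock. */
  private synchronized long logEdit(String op) {
    buffer.add(op);
    return ++lastWrittenTxId;
  }

  /** Blocks until txId is durable, flushing at most one batch itself. */
  private void logSync(long txId) throws InterruptedException {
    List<String> batch;
    long batchEnd;
    synchronized (this) {
      while (syncInProgress && syncedTxId < txId) {
        wait();                    // another thread is flushing; it may cover us
      }
      if (syncedTxId >= txId) {
        return;                    // already durable thanks to another thread's flush
      }
      batch = new ArrayList<>(buffer); // take everything written so far
      buffer.clear();
      batchEnd = lastWrittenTxId;
      syncInProgress = true;
    }
    durable.addAll(batch);         // the slow I/O, done without holding any lock
    synchronized (this) {
      syncedTxId = batchEnd;
      syncInProgress = false;
      notifyAll();
    }
  }

  synchronized long getSyncedTxId() { return syncedTxId; }

  int getDurableCount() { return durable.size(); }
}
```

If `logEdit` also flushed (the "write also syncs" assumption), every operation would pay for its own flush in turn; separating the two calls is what lets one flush cover many concurrently written edits.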
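The readNext()/getTxn() approach agreed on above can be sketched as a reader that parses a record header and then hands back the underlying stream, bounded to the current record. This sketch assumes a hypothetical record layout (4-byte version, 8-byte transaction ID, 4-byte payload length, then the payload); the layout and the `TxnReader` name are illustrative only, not the actual edits format or a proposed API.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical reader for records laid out as
 *  [int version][long txId][int payloadLength][payload bytes]. */
class TxnReader {
  private final DataInputStream in;
  private int version;
  private long txId;
  private int remaining; // payload bytes of the current record not yet read

  TxnReader(InputStream raw) {
    this.in = new DataInputStream(raw);
  }

  /** Positions the stream at the next record's payload; false at end of log. */
  boolean readNext() throws IOException {
    // Skip whatever the caller left unread from the previous record.
    while (remaining > 0) {
      int skipped = in.skipBytes(remaining);
      if (skipped <= 0) throw new EOFException("truncated record");
      remaining -= skipped;
    }
    try {
      version = in.readInt();
    } catch (EOFException e) {
      return false; // simplification: a real reader would detect a torn header
    }
    txId = in.readLong();
    remaining = in.readInt();
    if (remaining < 0) throw new IOException("corrupt payload length " + remaining);
    return true;
  }

  int getVersion() { return version; }

  long getTxId() { return txId; }

  /** Reads the underlying stream directly, but only up to the end of the record. */
  InputStream getTxn() {
    return new InputStream() {
      @Override
      public int read() throws IOException {
        if (remaining == 0) return -1;
        int b = in.read();
        if (b < 0) throw new EOFException("truncated record");
        remaining--;
        return b;
      }

      @Override
      public int read(byte[] buf, int off, int len) throws IOException {
        if (len == 0) return 0;
        if (remaining == 0) return -1;
        int n = in.read(buf, off, Math.min(len, remaining));
        if (n < 0) throw new EOFException("truncated record");
        remaining -= n;
        return n;
      }
    };
  }
}
```

Skipping unconsumed payload bytes in `readNext()` is one way to keep the reader aligned to record boundaries; an implementation could instead require callers to consume each record fully.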
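The operational property described above, that finalized segment files are byte-identical across storage directories, can also be checked from code in the same spirit as the `md5sum` command. A minimal sketch using the JDK's `MessageDigest`; the `SegmentChecksums` class and the approach of comparing name-to-digest maps are assumptions for illustration. It reads each file fully into memory, which is acceptable for a sketch but not for very large segments, and it only makes sense for finalized segments.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper comparing segment files across storage directories. */
class SegmentChecksums {
  /** MD5 (lowercase hex) of every regular file in dir, keyed by file name. */
  static Map<String, String> md5ByName(Path dir) throws IOException, NoSuchAlgorithmException {
    Map<String, String> sums = new TreeMap<>();
    try (DirectoryStream<Path> files = Files.newDirectoryStream(dir)) {
      for (Path f : files) {
        if (!Files.isRegularFile(f)) continue;
        MessageDigest md = MessageDigest.getInstance("MD5");
        md.update(Files.readAllBytes(f)); // whole file in memory: fine for a sketch
        sums.put(f.getFileName().toString(), toHex(md.digest()));
      }
    }
    return sums;
  }

  static String toHex(byte[] bytes) {
    StringBuilder sb = new StringBuilder();
    for (byte b : bytes) sb.append(String.format("%02x", b));
    return sb.toString();
  }

  /** True if every directory holds the same file names with identical contents. */
  static boolean allIdentical(List<Path> dirs) throws IOException, NoSuchAlgorithmException {
    Map<String, String> first = null;
    for (Path d : dirs) {
      Map<String, String> sums = md5ByName(d);
      if (first == null) {
        first = sums;
      } else if (!first.equals(sums)) {
        return false;
      }
    }
    return true;
  }
}
```

This check is only meaningful because segment boundaries are fixed: if each storage could cut its logs at arbitrary transaction IDs, file-by-file comparison would no longer work.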
> Add interface for generic Write Ahead Logging mechanisms
> --------------------------------------------------------
>
>                 Key: HDFS-1580
>                 URL: https://issues.apache.org/jira/browse/HDFS-1580
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Ivan Kelly
>             Fix For: Edit log branch (HDFS-1073)
>
>         Attachments: EditlogInterface.1.pdf, HDFS-1580+1521.diff, HDFS-1580.diff, HDFS-1580.diff, HDFS-1580.diff, generic_wal_iface.pdf, generic_wal_iface.pdf, generic_wal_iface.pdf, generic_wal_iface.txt