Message-ID: <9688366.1148512050641.JavaMail.jira@brutus>
Date: Wed, 24 May 2006 23:07:30 +0000 (GMT+00:00)
From: "Doug Cutting (JIRA)"
To: hadoop-dev@lucene.apache.org
Reply-To: hadoop-dev@lucene.apache.org
Subject: [jira] Commented: (HADOOP-224) Allow simplified versioning for namenode and datanode metadata.
In-Reply-To: <3300836.1147816565914.JavaMail.jira@brutus>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit

    [ http://issues.apache.org/jira/browse/HADOOP-224?page=comments#action_12413194 ]

Doug Cutting commented on HADOOP-224:
-------------------------------------

+0

If folks would like, I will commit this, but it seems a bit like a solution in search of a problem. It introduces, for example, some potentially big, temporary data structures which could bite us later, without really providing any new useful functionality. Yes, we will probably alter the file format, and this will probably be able to handle those changes, but it might not. Personally I'd wait until I was changing the file format to add this. But that's a matter of style. I will commit this if others strongly feel it is ready and now is the time.

> Allow simplified versioning for namenode and datanode metadata.
> ---------------------------------------------------------------
>
>          Key: HADOOP-224
>          URL: http://issues.apache.org/jira/browse/HADOOP-224
>      Project: Hadoop
>         Type: Improvement
>   Components: dfs
>  Environment: All
>     Reporter: Milind Bhandarkar
>  Attachments: hadoop-224.patch
>
> Currently the namenode has two types of metadata: the FSImage and FSEdits. FSImage contains information about Inodes, and FSEdits contains a list of operations that were not saved to FSImage. The datanode currently does not have any metadata, but would have it some day.
> The file formats used for storing these metadata will evolve over time. It is important for the file-system to be backward compatible. That is, the metadata readers need to be able to identify which version of the file-format we are using, and need to be able to read information therein. As we add information to these metadata, the complexity of the reader increases dramatically.
> I propose a versioning scheme with a major and minor version number, where a different reader class is associated with each major number, and that class interprets the minor number internally. The readers essentially form a chain starting with the latest version. Each version-reader looks at the file and, if it does not recognize the version number, passes it to the version reader next to it by calling the parse method, returning the results of the parse method up the chain. (In the case of the namenode, the parse result is an array of Inodes.)
> This scheme has the advantage that every time a new major version is added, the new reader only needs to know about the reader for its immediately previous version, and every reader needs to know only which major version numbers it can read.
> The writer is not so versioned, because metadata is always written in the most current version format.
> One more change that is needed for simplified versioning is that the "struct-surping" of dfs.Block needs to be removed. Block's contents will change in later versions, and older versions should still be able to readFields properly. This is more general than Block of course, and in general only basic datatypes should be used as Writables in DFS metadata.
> For edits, the reader should return pairs' array. This will also remove the limitation of two operands for every opcode, and will be more extensible.
> Even with this new versioning scheme, the last Reader in the reader-chain would recognize the current format, thus maintaining full backward compatibility.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira
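
The reader chain proposed in the quoted issue description could look roughly like the Java sketch below. It is not taken from hadoop-224.patch. The class names (ReaderChainSketch, VersionReader, V1Reader, V2Reader), the header of two ints, and both record layouts are assumptions invented for illustration, and the parse result is simplified to a list of strings rather than an array of Inodes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Illustrative only: every name and file layout here is hypothetical.
public class ReaderChainSketch {

    /** Reader for one major version; unknown versions go to the previous reader. */
    abstract static class VersionReader {
        private final int major;
        private final VersionReader previous; // null at the end of the chain

        VersionReader(int major, VersionReader previous) {
            this.major = major;
            this.previous = previous;
        }

        /** Parses the body if the major version matches, else delegates down the chain. */
        final List<String> parse(int fileMajor, int fileMinor, DataInputStream in)
                throws IOException {
            if (fileMajor == major) {
                return parseBody(fileMinor, in);
            }
            if (previous == null) {
                throw new IOException("unsupported version " + fileMajor + "." + fileMinor);
            }
            return previous.parse(fileMajor, fileMinor, in);
        }

        abstract List<String> parseBody(int minor, DataInputStream in) throws IOException;
    }

    /** Assumed major version 1 layout: a count, then one UTF name per inode. */
    static class V1Reader extends VersionReader {
        V1Reader() { super(1, null); }

        List<String> parseBody(int minor, DataInputStream in) throws IOException {
            int count = in.readInt();
            List<String> inodes = new ArrayList<>();
            for (int i = 0; i < count; i++) {
                inodes.add(in.readUTF());
            }
            return inodes;
        }
    }

    /** Assumed major version 2 layout: each name is followed by a long length;
     *  minor versions >= 1 append a short per inode, which this reader skips. */
    static class V2Reader extends VersionReader {
        V2Reader() { super(2, new V1Reader()); } // knows only its predecessor

        List<String> parseBody(int minor, DataInputStream in) throws IOException {
            int count = in.readInt();
            List<String> inodes = new ArrayList<>();
            for (int i = 0; i < count; i++) {
                String name = in.readUTF();
                long length = in.readLong();
                if (minor >= 1) {
                    in.readShort();
                }
                inodes.add(name + ":" + length);
            }
            return inodes;
        }
    }

    /** Reads the assumed (major, minor) header and starts at the newest reader. */
    public static List<String> read(byte[] image) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(image));
        int major = in.readInt();
        int minor = in.readInt();
        return new V2Reader().parse(major, minor, in);
    }

    /** Builds a two-inode test image in the assumed layout for the given version. */
    public static byte[] sampleImage(int major, int minor) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeInt(major);
        out.writeInt(minor);
        out.writeInt(2);
        for (String name : new String[] {"/a", "/b"}) {
            out.writeUTF(name);
            if (major >= 2) {
                out.writeLong(10L);
                if (minor >= 1) {
                    out.writeShort(3);
                }
            }
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(read(sampleImage(1, 0)));
        System.out.println(read(sampleImage(2, 1)));
    }
}
```

Under these assumptions, adding a major version 3 would mean writing a V3Reader whose constructor passes new V2Reader() as its predecessor and pointing read() at it; the older readers stay untouched, which is the property the proposal is after.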