hadoop-common-dev mailing list archives

From "Runping Qi (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-224) Allow simplified versioning for namenode and datanode metadata.
Date Wed, 17 May 2006 05:18:06 GMT
    [ http://issues.apache.org/jira/browse/HADOOP-224?page=comments#action_12412094 ] 

Runping Qi commented on HADOOP-224:
-----------------------------------


Is this issue the same as the general versioning problem of object deserialization? Or did
I miss something?

In my own programs, when I need to write a serializable class, I've been using a convention
like the following:

	public void write(DataOutput out) throws IOException {
		out.writeInt(Link.version);
		out.writeUTF(this.url);
	}

	private void readFields_1(DataInput in) throws IOException {
		this.url = in.readUTF();
		...
	}

	public void readFields(DataInput in) throws IOException {
		int version = in.readInt();
		switch (version) {
		case 1:
			this.readFields_1(in);
			break;
		default:
			throw new IOException("Serialization version number " + version
					+ " of class Link is not recognized\n");
		}
	}

When I make changes to the class representation that affect how the class is serialized, I
bump the version number, extend write, and implement a new read method:

	public void write(DataOutput out) throws IOException {
		out.writeInt(Link.version);
		out.writeUTF(this.url);
		out.writeUTF(this.anchor);
	}

	private void readFields_2(DataInput in) throws IOException {
		this.url = in.readUTF();
		this.anchor = in.readUTF();
		...
	}

	public void readFields(DataInput in) throws IOException {
		int version = in.readInt();
		switch (version) {
		case 1:
			this.readFields_1(in);
			break;
		case 2:
			this.readFields_2(in);
			break;
		default:
			throw new IOException("Serialization version number " + version
					+ " of class Link is not recognized\n");
		}
	}


I have found that this approach gives me great flexibility in versioning while maintaining
backward compatibility, and the code is not hard to maintain.
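
A minimal, self-contained sketch of the convention above, with both versions in place, might
look like the following. The field defaults, the version value of 2, and the helper methods
toBytes, version1Bytes and fromBytes are assumptions added for illustration; they are not
part of the snippets above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

// Sketch only: the field layout and the helper methods are assumptions.
class Link {
	// Version written by the current code; assumed to have been bumped to 2.
	static final int version = 2;

	String url = "";
	String anchor = "";	// added in version 2

	public void write(DataOutput out) throws IOException {
		out.writeInt(Link.version);
		out.writeUTF(this.url);
		out.writeUTF(this.anchor);
	}

	// Version 1 data has no anchor, so fall back to an empty one.
	private void readFields_1(DataInput in) throws IOException {
		this.url = in.readUTF();
		this.anchor = "";
	}

	private void readFields_2(DataInput in) throws IOException {
		this.url = in.readUTF();
		this.anchor = in.readUTF();
	}

	public void readFields(DataInput in) throws IOException {
		int version = in.readInt();
		switch (version) {
		case 1:
			this.readFields_1(in);
			break;
		case 2:
			this.readFields_2(in);
			break;
		default:
			throw new IOException("Serialization version number " + version
					+ " of class Link is not recognized");
		}
	}

	// Hypothetical helper: serialize this Link in the current format.
	byte[] toBytes() throws IOException {
		ByteArrayOutputStream buf = new ByteArrayOutputStream();
		write(new DataOutputStream(buf));
		return buf.toByteArray();
	}

	// Hypothetical helper: the bytes an old version 1 writer would have produced.
	static byte[] version1Bytes(String url) throws IOException {
		ByteArrayOutputStream buf = new ByteArrayOutputStream();
		DataOutputStream out = new DataOutputStream(buf);
		out.writeInt(1);
		out.writeUTF(url);
		return buf.toByteArray();
	}

	// Hypothetical helper: deserialize a Link written by any known version.
	static Link fromBytes(byte[] data) throws IOException {
		Link link = new Link();
		link.readFields(new DataInputStream(new ByteArrayInputStream(data)));
		return link;
	}
}
```

With this in place, Link.fromBytes(Link.version1Bytes(someUrl)) yields a Link with an empty
anchor, while data written by the current code round-trips with both fields intact.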




> Allow simplified versioning for namenode and datanode metadata.
> ---------------------------------------------------------------
>
>          Key: HADOOP-224
>          URL: http://issues.apache.org/jira/browse/HADOOP-224
>      Project: Hadoop
>         Type: Improvement

>   Components: dfs
>  Environment: All
>     Reporter: Milind Bhandarkar

>
> Currently namenode has two types of metadata: the FSImage and FSEdits. FSImage contains
> information about Inodes, and FSEdits contains a list of operations that were not saved to
> FSImage. Datanode currently does not have any metadata, but will have it some day.
> The file formats used for storing these metadata will evolve over time. It is important
> for the file-system to be backward compatible. That is, the metadata readers need to be able
> to identify which version of the file-format we are using, and need to be able to read the
> information therein. As we add information to these metadata, the complexity of the reader
> increases dramatically.
> I propose a versioning scheme with a major and minor version number, where a different
> reader class is associated with each major number, and that class interprets the minor number
> internally. The readers essentially form a chain starting with the latest version. Each
> version-reader looks at the file and, if it does not recognize the version number, passes it
> to the version reader next to it by calling the parse method, returning the results of the
> parse method up the chain. (In case of the namenode, the parse result is an array of Inodes.)
> This scheme has the advantage that every time a new major version is added, the new reader
> only needs to know about the reader for its immediately previous version, and every reader
> needs to know only which major version numbers it can read.
> The writer is not so versioned, because metadata is always written in the most current
> version format.
> One more change that is needed for simplified versioning is that the "struct-surping"
> of dfs.Block needs to be removed. Block's contents will change in later versions, and older
> versions should still be able to readFields properly. This is more general than Block of
> course, and in general only basic datatypes should be used as Writables in DFS metadata.
> For edits, the reader should return an array of <opcode, ArrayWritable> pairs. This
> will also remove the limitation of two operands for every opcode, and will be more extensible.
> Even with this new versioning scheme, the last Reader in the reader-chain would recognize
> the current format, thus maintaining full backward compatibility.
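
The reader chain proposed in the issue description above could be sketched roughly as
follows. This is only an illustration under stated assumptions: the class names
MetadataReader, ReaderV1, ReaderV2 and MetadataFile, the two-int version header, and the
per-entry layouts are all hypothetical, and a list of path strings stands in for the array
of Inodes.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Sketch only: every class name and file layout here is hypothetical.
abstract class MetadataReader {
	private final int major;
	private final MetadataReader previous;	// null for the oldest reader

	MetadataReader(int major, MetadataReader previous) {
		this.major = major;
		this.previous = previous;
	}

	// Parse the file body, or hand it to the next-older reader in the chain.
	final List<String> parse(int fileMajor, int fileMinor, DataInput in) throws IOException {
		if (fileMajor == major) {
			return read(fileMinor, in);
		}
		if (previous == null) {
			throw new IOException("Unsupported metadata version " + fileMajor + "." + fileMinor);
		}
		return previous.parse(fileMajor, fileMinor, in);
	}

	// Each reader interprets the minor number of its own major version.
	protected abstract List<String> read(int minor, DataInput in) throws IOException;
}

// Assumed major version 1 layout: entry count, then one path per entry.
final class ReaderV1 extends MetadataReader {
	ReaderV1() { super(1, null); }

	protected List<String> read(int minor, DataInput in) throws IOException {
		int count = in.readInt();
		List<String> paths = new ArrayList<>();
		for (int i = 0; i < count; i++) {
			paths.add(in.readUTF());
		}
		return paths;
	}
}

// Assumed major version 2 layout: each entry adds a length, and minor >= 1
// also adds a replication count.
final class ReaderV2 extends MetadataReader {
	ReaderV2() { super(2, new ReaderV1()); }

	protected List<String> read(int minor, DataInput in) throws IOException {
		int count = in.readInt();
		List<String> paths = new ArrayList<>();
		for (int i = 0; i < count; i++) {
			paths.add(in.readUTF());
			in.readLong();		// file length, ignored in this sketch
			if (minor >= 1) {
				in.readShort();	// replication, ignored in this sketch
			}
		}
		return paths;
	}
}

final class MetadataFile {
	// Assumed header: major and minor version as two ints. The chain always
	// starts at the newest reader, which only knows its immediate predecessor.
	static List<String> load(byte[] data) throws IOException {
		DataInput in = new DataInputStream(new ByteArrayInputStream(data));
		int major = in.readInt();
		int minor = in.readInt();
		return new ReaderV2().parse(major, minor, in);
	}
}
```

Adding a major version 3 in this sketch would mean writing one new reader class that names
ReaderV2 as its predecessor and changing the starting point in load; the older readers stay
untouched.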

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira

