hadoop-common-dev mailing list archives

From "Runping Qi (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HADOOP-732) SequenceFile's header should allow to store metadata in the form of key/value pairs
Date Wed, 24 Jan 2007 08:03:51 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Runping Qi updated HADOOP-732:

    Attachment: seqFileMetadata.patch

Attached is a patch for this issue.

SequenceFile has a new header field: a TreeMap<Text, Text> object wrapped in a class, Metadata,
that implements the Writable interface. To accommodate this, the version number is bumped up to 6.
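
For illustration only, here is a minimal sketch of how a sorted map of
name/value pairs can be serialized in the style of a Writable. It is not the
code from the patch: the class name MetadataSketch is hypothetical, String
stands in for Text, plain java.io stands in for Hadoop's Writable interface,
and the byte layout (entry count followed by the pairs) is an assumption.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for the patch's Metadata class (assumption: String
// replaces Text, and the on-disk layout is invented for this sketch).
class MetadataSketch {
    private final TreeMap<String, String> entries = new TreeMap<>();

    void set(String name, String value) { entries.put(name, value); }
    String get(String name) { return entries.get(name); }
    int size() { return entries.size(); }

    // Same shape as Writable.write(DataOutput): entry count, then each pair in key order.
    void write(DataOutput out) throws IOException {
        out.writeInt(entries.size());
        for (Map.Entry<String, String> entry : entries.entrySet()) {
            out.writeUTF(entry.getKey());
            out.writeUTF(entry.getValue());
        }
    }

    // Same shape as Writable.readFields(DataInput): reset state, then read the pairs back.
    void readFields(DataInput in) throws IOException {
        entries.clear();
        int count = in.readInt();
        if (count < 0) {
            throw new IOException("negative metadata entry count: " + count);
        }
        for (int i = 0; i < count; i++) {
            String name = in.readUTF();
            String value = in.readUTF();
            entries.put(name, value);
        }
    }

    // Serializes to an in-memory buffer and reads the bytes into a fresh object.
    static MetadataSketch roundTrip(MetadataSketch original) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            original.write(new DataOutputStream(buffer));
            MetadataSketch copy = new MetadataSketch();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(buffer.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because a TreeMap iterates in key order, two objects holding the same pairs
serialize to the same bytes regardless of insertion order.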

The Reader class has a new member variable that holds the metadata, and a new method that
returns the metadata object. The new code can still read files written in older versions.
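
One common way to keep old files readable is to gate the new field on the
version stored in the header. The sketch below shows that idea under loud
assumptions: HeaderReaderSketch and its methods are hypothetical, the header
is reduced to a single version byte plus the optional metadata, and String
stands in for Text. The real header layout is not described in this message.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, simplified header: one version byte, then (from version 6 on)
// an entry count followed by name/value pairs.
class HeaderReaderSketch {
    // The message states that the version number is bumped to 6 for the metadata field.
    static final int FIRST_VERSION_WITH_METADATA = 6;

    // Writes a header in the given version; older versions carry no metadata.
    static byte[] writeHeader(int version, Map<String, String> metadata) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buffer);
            out.writeByte(version);
            if (version >= FIRST_VERSION_WITH_METADATA) {
                out.writeInt(metadata.size());
                for (Map.Entry<String, String> entry : metadata.entrySet()) {
                    out.writeUTF(entry.getKey());
                    out.writeUTF(entry.getValue());
                }
            }
            out.flush();
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads a header; for versions before 6 the metadata is simply left empty.
    static TreeMap<String, String> readHeader(byte[] header) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(header));
            int version = in.readByte();
            TreeMap<String, String> metadata = new TreeMap<>();
            if (version >= FIRST_VERSION_WITH_METADATA) {
                int count = in.readInt();
                for (int i = 0; i < count; i++) {
                    String name = in.readUTF();
                    metadata.put(name, in.readUTF());
                }
            }
            return metadata;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With this approach a caller always gets a metadata object back, so code that
asks the reader for metadata does not need a special case for old files.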

New constructors are added to the various Writer classes; each takes a metadata object as
its last parameter. New static createWriter functions with metadata as the last
parameter are also introduced. The existing constructors and functions remain, so the
changes are backward compatible. A new unit test is added to TestSequenceFile for testing
writing/reading sequence files with metadata.
All unit tests passed.
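
The overload pattern described above can be sketched as follows. This is an
assumption-laden illustration, not the patch's API: WriterSketch, its fields,
and the single String path parameter are hypothetical, and the real
constructors and createWriter functions take other arguments that are not
listed in this message.

```java
import java.util.Collections;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical writer showing how a trailing metadata parameter can be added
// without breaking callers of the older signatures.
class WriterSketch {
    private final String path;
    private final SortedMap<String, String> metadata;

    // Older-style constructor: delegates with empty metadata.
    WriterSketch(String path) {
        this(path, new TreeMap<String, String>());
    }

    // Newer-style constructor: metadata is the last parameter.
    WriterSketch(String path, SortedMap<String, String> metadata) {
        this.path = path;
        // Defensive copy so later changes by the caller do not affect the writer.
        this.metadata = Collections.unmodifiableSortedMap(new TreeMap<String, String>(metadata));
    }

    String path() { return path; }
    SortedMap<String, String> metadata() { return metadata; }

    // Older-style factory, kept unchanged for existing callers.
    static WriterSketch createWriter(String path) {
        return new WriterSketch(path);
    }

    // Newer-style factory with metadata as the last parameter.
    static WriterSketch createWriter(String path, SortedMap<String, String> metadata) {
        return new WriterSketch(path, metadata);
    }
}
```

Existing call sites such as createWriter(path) keep compiling, while new call
sites can pass a populated map as the final argument.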

> SequenceFile's header should allow to store metadata in the form of key/value pairs
> -----------------------------------------------------------------------------------
>                 Key: HADOOP-732
>                 URL: https://issues.apache.org/jira/browse/HADOOP-732
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: io
>            Reporter: Runping Qi
>         Assigned To: Runping Qi
> The sequence file currently stores a fixed list of metadata attributes, such as key/value
> class names, compression method, etc.  To make the sequence file more self-describing, it
> should allow storing a list of key/value pairs.  One particular attribute of interest would
> indicate whether the key/value classes are actually Hadoop record classes and, if so, store
> the DDLs for the records. This way, we may create tools to extract the DDL from a sequence
> file and then generate the necessary classes. It also makes it possible to provide an
> interpretive version of Hadoop record.
> This way, even in the situation where Hadoop or the application does not have the necessary
> classes, a sequence file of Hadoop records can be read and deserialized "interpretively".

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
