hadoop-common-dev mailing list archives

From "Owen O'Malley (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-3315) New binary file format
Date Tue, 03 Jun 2008 23:22:45 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-3315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12602108#action_12602108 ]

Owen O'Malley commented on HADOOP-3315:

I would suggest splitting the API into two parts: a cooked part and a raw part.

public class TFile {
  static public class Writer {
    public void append(byte[] keyData, int keyOffset, int keyLength,
                       byte[] valueData, int valueOffset, int valueLength) throws IOException;
    /**
     * Get an output stream to write the key and value to. Only valid until the next call to append.
     * The number of bytes written to the stream must equal keyLength + valueLength.
     */
    public OutputStream append(long keyLength, long valueLength) throws IOException;
  }
  public static Writer create(Path p, RawComparator comparator) throws IOException;
  static public class Reader {
    /**
     * Get an input stream to read the key from. available() will return the number of bytes
     * Only valid until the next next call on this stream.
     */
    public InputStream nextKeyInputStream() throws IOException;
    /**
     * Get an input stream to read the value from. available() will return the number of bytes
     * Only valid until the next next call.
     */
    public InputStream nextValueInputStream() throws IOException;
  }
  public static Reader open(Path p, RawComparator comparator) throws IOException;
}
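
To make the cooked/raw split concrete, here is a minimal in-memory sketch of how the two append calls might relate. It is not TFile code: the class name RawCookedSketch, the toByteArray() helper, and the 8-byte length prefixes are assumptions made only for illustration, and nothing here reflects the on-disk format under discussion. The cooked append is simply layered on the raw one, and the raw stream enforces the rule that exactly keyLength + valueLength bytes get written.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical in-memory stand-in for the proposed Writer; not Hadoop code. */
final class RawCookedSketch {

  static final class Writer {
    private final ByteArrayOutputStream sink = new ByteArrayOutputStream();
    private final DataOutputStream out = new DataOutputStream(sink);
    private long remaining = 0; // bytes still owed by the current record

    /** Cooked form: key and value are already in byte arrays. */
    void append(byte[] keyData, int keyOffset, int keyLength,
                byte[] valueData, int valueOffset, int valueLength) throws IOException {
      OutputStream raw = append(keyLength, valueLength);
      raw.write(keyData, keyOffset, keyLength);
      raw.write(valueData, valueOffset, valueLength);
    }

    /** Raw form: declare both lengths up front, then stream the bytes. */
    OutputStream append(long keyLength, long valueLength) throws IOException {
      if (remaining != 0) {
        throw new IOException("previous record is short by " + remaining + " bytes");
      }
      out.writeLong(keyLength);   // assumed framing: two 8-byte length prefixes
      out.writeLong(valueLength);
      remaining = keyLength + valueLength;
      return new FilterOutputStream(out) {
        // FilterOutputStream routes array writes through write(int),
        // so counting here covers every write path.
        @Override public void write(int b) throws IOException {
          if (remaining == 0) {
            throw new IOException("wrote more than keyLength + valueLength bytes");
          }
          remaining--;
          out.write(b);
        }
        @Override public void close() {
          // Closing a record stream must not close the underlying sink.
        }
      };
    }

    /** Illustration-only helper: returns everything written so far. */
    byte[] toByteArray() throws IOException {
      if (remaining != 0) {
        throw new IOException("last record is short by " + remaining + " bytes");
      }
      out.flush();
      return sink.toByteArray();
    }
  }

  public static void main(String[] args) throws IOException {
    Writer w = new Writer();
    byte[] k = "k1".getBytes(StandardCharsets.UTF_8);
    byte[] v = "value-1".getBytes(StandardCharsets.UTF_8);
    w.append(k, 0, k.length, v, 0, v.length);   // cooked: 16 + 2 + 7 bytes

    OutputStream raw = w.append(2, 3);          // raw: 16 + 2 + 3 bytes
    raw.write("k2".getBytes(StandardCharsets.UTF_8));
    raw.write("abc".getBytes(StandardCharsets.UTF_8));

    System.out.println(w.toByteArray().length); // prints 46
  }
}
```

The sketch checks only the length contract. It does not invalidate an old stream once the next append starts, which a real implementation of "only valid until the next call to append" would presumably need to do.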

> New binary file format
> ----------------------
>                 Key: HADOOP-3315
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3315
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: io
>            Reporter: Owen O'Malley
>            Assignee: Srikanth Kakani
>         Attachments: Tfile-1.pdf, TFile-2.pdf
> SequenceFile's block compression format is too complex and requires 4 codecs to compress
> or decompress. It would be good to have a file format that only needs 

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
