Message-ID: <1859715739.1212536445197.JavaMail.jira@brutus>
Date: Tue, 3 Jun 2008 16:40:45 -0700 (PDT)
From: "Srikanth Kakani (JIRA)"
To: core-dev@hadoop.apache.org
Subject: [jira] Commented: (HADOOP-3315) New binary file format
In-Reply-To: <774412089.1209158875801.JavaMail.jira@brutus>

    [ https://issues.apache.org/jira/browse/HADOOP-3315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12602114#action_12602114 ]

Srikanth Kakani commented on HADOOP-3315:
-----------------------------------------

Owen,

There
would be one complication in exposing append(long keyLength, long valueLength) that we did not discuss earlier, although it can be handled. If the key/value pair is at the beginning of a block, we need to copy the key to a byte array during key.serialize(outputstream). We can do this with a keyValueOutputStream(keybytes, valuebytes, outputstream) that captures the first keybytes bytes of data written into a buffer. This is needed to generate the index, but it starts getting ugly.

I would also suggest that ObjectFile extend TFile, so that it can do all this more neatly without exposing append(keyLength, valueLength).

Additionally, to make any of this feasible (you mentioned this earlier; I just want to record it), serializers should also have getSerializedLength().

> New binary file format
> ----------------------
>
>                 Key: HADOOP-3315
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3315
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: io
>            Reporter: Owen O'Malley
>            Assignee: Srikanth Kakani
>         Attachments: Tfile-1.pdf, TFile-2.pdf
>
>
> SequenceFile's block compression format is too complex and requires 4 codecs to compress or decompress. It would be good to have a file format that only needs
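
For reference, the keyValueOutputStream idea from the comment above could look roughly like the sketch below. This is only an illustration, not the TFile implementation: the class name KeyCapturingOutputStream and its methods are hypothetical, the valuebytes argument is left out, and the key length is an int rather than a long because it sizes a Java array. The wrapper passes every byte through to the underlying stream and keeps a copy of the first keyLength bytes, which is what an index entry for the first key of a block would need.

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical wrapper: forwards all bytes and copies the first keyLength of them. */
class KeyCapturingOutputStream extends FilterOutputStream {
    private final byte[] keyBuffer;
    private int captured = 0;

    KeyCapturingOutputStream(OutputStream out, int keyLength) {
        super(out);
        this.keyBuffer = new byte[keyLength];
    }

    @Override
    public void write(int b) throws IOException {
        if (captured < keyBuffer.length) {
            keyBuffer[captured++] = (byte) b;
        }
        out.write(b);
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        // Copy only as many bytes as are still missing from the key.
        int n = Math.min(len, keyBuffer.length - captured);
        if (n > 0) {
            System.arraycopy(b, off, keyBuffer, captured, n);
            captured += n;
        }
        out.write(b, off, len);
    }

    /** True once keyLength bytes have been seen. */
    boolean keyComplete() {
        return captured == keyBuffer.length;
    }

    /** Returns a copy of the key bytes captured so far. */
    byte[] capturedKey() {
        return Arrays.copyOf(keyBuffer, captured);
    }
}

class KeyCaptureDemo {
    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        KeyCapturingOutputStream out = new KeyCapturingOutputStream(sink, 3);
        out.write("key".getBytes(StandardCharsets.US_ASCII));   // stand-in for key.serialize(out)
        out.write("value".getBytes(StandardCharsets.US_ASCII)); // stand-in for value.serialize(out)
        System.out.println(new String(out.capturedKey(), StandardCharsets.US_ASCII));
        System.out.println(sink.size());
    }
}
```

The copy only has to happen for the first key of each block, so a writer could wrap the stream only in that case and write directly otherwise.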