Message-ID: <2118660667.1212539565171.JavaMail.jira@brutus>
Date: Tue, 3 Jun 2008 17:32:45 -0700 (PDT)
From: "Owen O'Malley (JIRA)"
To: core-dev@hadoop.apache.org
Reply-To: core-dev@hadoop.apache.org
Subject: [jira] Commented: (HADOOP-3315) New binary file format
In-Reply-To: <774412089.1209158875801.JavaMail.jira@brutus>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit

    [ https://issues.apache.org/jira/browse/HADOOP-3315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12602132#action_12602132 ]

Owen O'Malley commented on HADOOP-3315:
---------------------------------------

Srikanth,
  I don't understand your concern. When the user calls append(long, long), the writer can decide whether to start a new block or not based on the lengths. So as the client calls write(byte[], int, int) on the output stream, it can be written directly to the file stream or codec's ByteBuffer. For codecs like lzo, the write may be broken into multiple calls to handle the required chunking.

And yes, to make this efficient, you need to be able to get the serialized length of the objects.

> New binary file format
> ----------------------
>
>                 Key: HADOOP-3315
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3315
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: io
>            Reporter: Owen O'Malley
>            Assignee: Srikanth Kakani
>         Attachments: Tfile-1.pdf, TFile-2.pdf
>
>
> SequenceFile's block compression format is too complex and requires 4 codecs to compress or decompress. It would be good to have a file format that only needs

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
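
The append(long, long) behaviour described in the comment can be sketched roughly as follows. This is only an illustration under stated assumptions, not the TFile design from the attached PDFs: the class name SketchBlockWriter, the blockSizeLimit parameter, and the rule "start a new block when the declared record would overflow the current one" are all hypothetical. Codec chunking (the lzo case) is left out; writes go straight to the underlying stream.

```java
import java.io.IOException;
import java.io.OutputStream;

/** Hypothetical sketch of a writer that picks a block from declared lengths. */
class SketchBlockWriter {
    private final OutputStream file;    // underlying file stream
    private final long blockSizeLimit;  // assumed target block size, in bytes
    private long bytesInBlock = 0;
    private int blocksStarted = 0;

    SketchBlockWriter(OutputStream file, long blockSizeLimit) {
        this.file = file;
        this.blockSizeLimit = blockSizeLimit;
    }

    /**
     * The caller declares the serialized key and value lengths up front,
     * so the block decision is made before any record bytes arrive.
     */
    OutputStream append(long keyLength, long valueLength) {
        long recordLength = keyLength + valueLength;
        boolean overflow = bytesInBlock > 0
                && bytesInBlock + recordLength > blockSizeLimit;
        if (blocksStarted == 0 || overflow) {
            startNewBlock();
        }
        bytesInBlock += recordLength;
        return new RecordStream(recordLength);
    }

    int blocksStarted() {
        return blocksStarted;
    }

    private void startNewBlock() {
        // A real format would also finish the previous block here
        // (flush the codec, record an index entry); this sketch only counts.
        bytesInBlock = 0;
        blocksStarted++;
    }

    /** Passes record bytes directly to the file stream, with no extra copy. */
    private final class RecordStream extends OutputStream {
        private long remaining;

        RecordStream(long declaredLength) {
            this.remaining = declaredLength;
        }

        @Override
        public void write(int b) throws IOException {
            write(new byte[] {(byte) b}, 0, 1);
        }

        @Override
        public void write(byte[] buf, int off, int len) throws IOException {
            if (len > remaining) {
                throw new IOException("record exceeds declared length");
            }
            file.write(buf, off, len);
            remaining -= len;
        }
    }
}
```

Because the lengths are known before the first write, the sketch never has to buffer a record to find out whether it fits, which is why the serialized length of the objects needs to be available.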