Return-Path: Delivered-To: apmail-hadoop-core-dev-archive@www.apache.org Received: (qmail 93189 invoked from network); 29 Jan 2009 20:28:29 -0000 Received: from hermes.apache.org (HELO mail.apache.org) (140.211.11.2) by minotaur.apache.org with SMTP; 29 Jan 2009 20:28:29 -0000 Received: (qmail 29970 invoked by uid 500); 29 Jan 2009 20:28:23 -0000 Delivered-To: apmail-hadoop-core-dev-archive@hadoop.apache.org Received: (qmail 29947 invoked by uid 500); 29 Jan 2009 20:28:23 -0000 Mailing-List: contact core-dev-help@hadoop.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: core-dev@hadoop.apache.org Delivered-To: mailing list core-dev@hadoop.apache.org Received: (qmail 29936 invoked by uid 99); 29 Jan 2009 20:28:23 -0000 Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 29 Jan 2009 12:28:23 -0800 X-ASF-Spam-Status: No, hits=-2000.0 required=10.0 tests=ALL_TRUSTED X-Spam-Check-By: apache.org Received: from [140.211.11.140] (HELO brutus.apache.org) (140.211.11.140) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 29 Jan 2009 20:28:21 +0000 Received: from brutus (localhost [127.0.0.1]) by brutus.apache.org (Postfix) with ESMTP id 3640E234C4CE for ; Thu, 29 Jan 2009 12:28:00 -0800 (PST) Message-ID: <1770703034.1233260880221.JavaMail.jira@brutus> Date: Thu, 29 Jan 2009 12:28:00 -0800 (PST) From: "stack (JIRA)" To: core-dev@hadoop.apache.org Subject: [jira] Commented: (HADOOP-3315) New binary file format In-Reply-To: <774412089.1209158875801.JavaMail.jira@brutus> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Virus-Checked: Checked by ClamAV on apache.org [ https://issues.apache.org/jira/browse/HADOOP-3315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12668588#action_12668588 ] stack commented on HADOOP-3315: ------------------------------- Hmm. Looking at doing random accesses and it seems like a bunch of time is spent in inBlockAdvance advancing sequentially through blocks rather than do something like a binary search to find desired block location. Also, as we advance, we create and destroy a bunch of objects such as the stream to hold the value. Can you comment on why this is (compression should be on tfile block boundaries, right so nothing to stop hopping into the midst of a tfile)? Thanks. > New binary file format > ---------------------- > > Key: HADOOP-3315 > URL: https://issues.apache.org/jira/browse/HADOOP-3315 > Project: Hadoop Core > Issue Type: New Feature > Components: io > Reporter: Owen O'Malley > Assignee: Amir Youssefi > Fix For: 0.21.0 > > Attachments: HADOOP-3315_20080908_TFILE_PREVIEW_WITH_LZO_TESTS.patch, HADOOP-3315_20080915_TFILE.patch, hadoop-trunk-tfile.patch, hadoop-trunk-tfile.patch, TFile Specification 20081217.pdf > > > SequenceFile's block compression format is too complex and requires 4 codecs to compress or decompress. It would be good to have a file format that only needs -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.