Message-ID: <15522744.1156763543500.JavaMail.jira@brutus>
Date: Mon, 28 Aug 2006 04:12:23 -0700 (PDT)
From: "Arun C Murthy (JIRA)"
To: hadoop-dev@lucene.apache.org
Subject: [jira] Commented: (HADOOP-441) SequenceFile should support 'custom compressors'
In-Reply-To: <29672656.1155187633922.JavaMail.jira@brutus>

    [ http://issues.apache.org/jira/browse/HADOOP-441?page=comments#action_12430966 ]

Arun C Murthy commented on HADOOP-441:
--------------------------------------

Doug,

If I'm not missing something here... we can reuse the compressor by having one 'Deflater/Inflater' object per Writer/Reader instead of the pool, right? Even then, won't the cost of creating the stream on top of that compressor be quite significant for RecordCompressWriter, where we would need to do this per record? I'd appreciate any further details on this one.

Also, I'm concerned this solution will be quite gzip-specific, which would be unsuitable for a generic 'custom' compressor... thoughts?
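(For concreteness, below is a rough sketch of the 'one Deflater per Writer' idea using plain java.util.zip; the class and method names are made up for illustration and are not SequenceFile code. The single Deflater is reset before each record, so only the thin stream wrapper gets re-created per record, which is exactly the per-record cost in question.)

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;

// Hypothetical illustration: one long-lived Deflater per writer, reset()
// before each record, instead of drawing Deflaters from a pool.
public class PerWriterDeflaterExample {

    private final Deflater deflater = new Deflater(Deflater.BEST_SPEED);
    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();

    /** Compress one record, reusing the writer's single Deflater. */
    public byte[] compressRecord(byte[] record) throws IOException {
        buffer.reset();
        deflater.reset(); // reuse the native zlib state across records
        // Note: a new DeflaterOutputStream object is still created per record.
        DeflaterOutputStream out = new DeflaterOutputStream(buffer, deflater);
        out.write(record);
        out.finish(); // complete the compressed data without closing the shared Deflater
        return buffer.toByteArray();
    }

    public void close() {
        deflater.end(); // release native resources once, when the writer is done
    }

    public static void main(String[] args) throws IOException {
        PerWriterDeflaterExample writer = new PerWriterDeflaterExample();
        byte[] compressed = writer.compressRecord("some record bytes".getBytes("UTF-8"));
        System.out.println("compressed length: " + compressed.length);
        writer.close();
    }
}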
> SequenceFile should support 'custom compressors'
> ------------------------------------------------
>
>                 Key: HADOOP-441
>                 URL: http://issues.apache.org/jira/browse/HADOOP-441
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: io
>            Reporter: Arun C Murthy
>         Assigned To: Arun C Murthy
>             Fix For: 0.6.0
>
>
> SequenceFiles should support 'custom compressors' which can be specified by the user on creation of the file.
> Readily available packages for gzip and zip (java.util.zip) are among the obvious choices to support. Of course there will be hooks so that other compressors can be added in the future, as long as there is a way to construct (input/output) streams on top of the compressor/decompressor.
> The 'classname' of the 'custom compressor/decompressor' could be stored in the header of the SequenceFile, which can then be used by SequenceFile.Reader to figure out the appropriate 'decompressor'. Thus I propose we add constructors to SequenceFile.Writer which take in the 'classname' of the compressor's input/output stream classes (e.g. DeflaterOutputStream/InflaterInputStream or GZIPOutputStream/GZIPInputStream), which act as the hook for future compressors/decompressors.
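(As an illustration of the 'classname in the header' hook proposed above, here is a minimal reflective sketch; the class and helper names are hypothetical and are not the proposed SequenceFile.Writer/Reader API. The assumption is that the writer records the stream class names in the file header and the reader uses the recorded name to construct the matching decompressor stream.)

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical illustration of the 'classname in the header' idea:
// the writer records the stream class name, and the reader instantiates
// it reflectively over the raw stream. Not actual SequenceFile code.
public class ClassnameCompressorHook {

    /** Wrap a raw output stream with the compressor stream named in the header. */
    static OutputStream wrapOutput(String className, OutputStream raw) throws Exception {
        return Class.forName(className)
                .asSubclass(OutputStream.class)
                .getConstructor(OutputStream.class)
                .newInstance(raw);
    }

    /** Wrap a raw input stream with the matching decompressor stream. */
    static InputStream wrapInput(String className, InputStream raw) throws Exception {
        return Class.forName(className)
                .asSubclass(InputStream.class)
                .getConstructor(InputStream.class)
                .newInstance(raw);
    }

    public static void main(String[] args) throws Exception {
        // Writer side: the chosen class name would be written into the file header.
        ByteArrayOutputStream data = new ByteArrayOutputStream();
        OutputStream out = wrapOutput(GZIPOutputStream.class.getName(), data);
        out.write("hello".getBytes("UTF-8"));
        out.close();

        // Reader side: read the class name back from the header and build the decompressor.
        InputStream in = wrapInput(GZIPInputStream.class.getName(),
                new ByteArrayInputStream(data.toByteArray()));
        byte[] buf = new byte[16];
        int n = in.read(buf);
        System.out.println(new String(buf, 0, n, "UTF-8"));
        in.close();
    }
}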