From: "Doug Cutting (JIRA)"
To: hadoop-dev@lucene.apache.org
Date: Thu, 17 Aug 2006 10:40:15 -0700 (PDT)
Subject: [jira] Commented: (HADOOP-441) SequenceFile should support 'custom compressors'

    [ http://issues.apache.org/jira/browse/HADOOP-441?page=comments#action_12428716 ]

Doug Cutting commented on HADOOP-441:
-------------------------------------

The constructors should
probably take class instances rather than class names. Codecs should be based on DeflaterOutputStream and InflaterInputStream, but it would be best to write just one name to the file. So we might add a compressor factory interface like:

public interface CompressionCodec extends Configurable {
  DeflaterOutputStream createDeflaterOutputStream(OutputStream out);
  InflaterInputStream createInflaterInputStream(InputStream in);
}

Then the constructors would take an instance of this interface and write the name of that class into the file. Implementations would be required to provide a public default constructor.

We might also add methods like the following to this interface:

  void writeVersion(DataOutputStream out);
  void readVersion(DataInputStream in) throws VersionMismatchException;

That would permit folks to safely revise a codec without having to use a new class name.

> SequenceFile should support 'custom compressors'
> ------------------------------------------------
>
>                 Key: HADOOP-441
>                 URL: http://issues.apache.org/jira/browse/HADOOP-441
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: io
>            Reporter: Arun C Murthy
>         Assigned To: Arun C Murthy
>             Fix For: 0.6.0
>
>
> SequenceFiles should support 'custom compressors' which can be specified by the user on creation of the file.
> Readily available packages for gzip and zip (java.util.zip) are among obvious choices to support. Of course there will be hooks so that other compressors can be added in future as long as there is a way to construct (input/output) streams on top of the compressor/decompressor.
> The 'classname' of the 'custom compressor/decompressor' could be stored in the header of the SequenceFile which can then be used by SequenceFile.Reader to figure out the appropriate 'decompressor'. Thus I propose we add constructors to SequenceFile.Writer which take in the 'classname' of the compressor's input/output stream classes (e.g.
> DeflaterOutputStream/InflaterInputStream or GZIPOutputStream/GZIPInputStream), which acts as the hook for future compressors/decompressors.
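
To make the codec proposal in the comment concrete, here is a minimal sketch of how it might fit together: the writer records the codec's class name and version, and the reader instantiates that class through its public default constructor. Everything beyond the method names quoted in the comment is an assumption for illustration: CodecSketch, ZlibCodec, the write/read helpers, and the byte layout are hypothetical and are not the SequenceFile format; the interface here omits "extends Configurable" so that the example compiles without Hadoop; VersionMismatchException is a local stand-in; and writeVersion is declared to throw IOException, which the comment does not specify.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

final class CodecSketch {

    /** Hypothetical stand-in for the exception named in the comment. */
    static final class VersionMismatchException extends IOException {
        VersionMismatchException(int expected, int found) {
            super("codec version mismatch: expected " + expected + ", found " + found);
        }
    }

    /** The proposed interface, minus "extends Configurable" (assumption: keeps the sketch Hadoop-free). */
    interface CompressionCodec {
        DeflaterOutputStream createDeflaterOutputStream(OutputStream out);
        InflaterInputStream createInflaterInputStream(InputStream in);
        // Assumption: both version methods may throw IOException, since they touch streams.
        void writeVersion(DataOutputStream out) throws IOException;
        void readVersion(DataInputStream in) throws IOException;
    }

    /** Hypothetical codec backed by java.util.zip; note the public default constructor. */
    static final class ZlibCodec implements CompressionCodec {
        private static final int VERSION = 1;

        public ZlibCodec() {}

        @Override public DeflaterOutputStream createDeflaterOutputStream(OutputStream out) {
            return new DeflaterOutputStream(out);
        }
        @Override public InflaterInputStream createInflaterInputStream(InputStream in) {
            return new InflaterInputStream(in);
        }
        @Override public void writeVersion(DataOutputStream out) throws IOException {
            out.writeByte(VERSION);
        }
        @Override public void readVersion(DataInputStream in) throws IOException {
            int found = in.readByte();
            if (found != VERSION) {
                throw new VersionMismatchException(VERSION, found);
            }
        }
    }

    /** Writer side: one class name and a version go in the header, then the compressed payload. */
    static byte[] write(CompressionCodec codec, byte[] payload) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeUTF(codec.getClass().getName());
        codec.writeVersion(out);
        DeflaterOutputStream deflater = codec.createDeflaterOutputStream(out);
        deflater.write(payload);
        deflater.finish(); // emit the remaining compressed bytes
        out.flush();
        return bytes.toByteArray();
    }

    /** Reader side: look up the codec by the stored name, check its version, then decompress. */
    static byte[] read(byte[] file) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(file));
        String className = in.readUTF();
        CompressionCodec codec;
        try {
            codec = Class.forName(className).asSubclass(CompressionCodec.class)
                    .getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IOException("cannot instantiate codec " + className, e);
        }
        codec.readVersion(in);
        InflaterInputStream inflater = codec.createInflaterInputStream(in);
        ByteArrayOutputStream result = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        for (int n; (n = inflater.read(buffer)) != -1; ) {
            result.write(buffer, 0, n);
        }
        return result.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] file = write(new ZlibCodec(), "hello hello hello".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(read(file), StandardCharsets.UTF_8));
    }
}
```

Because only the class name is stored, the reader needs nothing but the header to pick a decompressor, and the version byte lets a codec change its on-disk details while keeping the same class name, which is the point made in the comment.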