Date: Mon, 30 Jan 2012 18:32:11 +0000 (UTC)
From: "Tim Broberg (Commented) (JIRA)"
To: common-issues@hadoop.apache.org
Reply-To: common-issues@hadoop.apache.org
Message-ID: <889100393.8347.1327948331159.JavaMail.tomcat@hel.zones.apache.org>
In-Reply-To: <1356399178.5333.1327821370205.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Commented] (HADOOP-8003) Make SplitCompressionInputStream an interface instead of an abstract class

    [ https://issues.apache.org/jira/browse/HADOOP-8003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13196283#comment-13196283 ]

Tim Broberg commented on HADOOP-8003:
-------------------------------------

Agreed: staying as compatible as possible while minimizing any added complexity in the interface is best.

My simplest and least invasive idea is this:

1 - Have SplittableCompressionCodec's createInputStream() return a CompressionInputStream instead of a SplitCompressionInputStream.
2 - Redefine SplitCompressionInputStream to be an interface instead of an abstract class.
3 - Require that all CompressionInputStreams returned by this createInputStream() method implement SplitCompressionInputStream.
4 - Modify the BZip2 codec to conform to the above.
5 - (Optional) Applications may check that #3 is obeyed.

Benefits:

1 - The application doesn't have to change at all. If a codec is an instance of SplittableCompressionCodec, call the appropriate createInputStream function and use the resulting stream as before.
2 - No duplicate classes or interfaces are introduced to confuse hapless developers.
3 - New splittable codecs can extend any CompressionInputStream they like.

Can anybody describe an approach (or an improvement to this one) that is less disruptive and/or simpler?

> Make SplitCompressionInputStream an interface instead of an abstract class
> --------------------------------------------------------------------------
>
>                 Key: HADOOP-8003
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8003
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: io
>    Affects Versions: 0.21.0, 0.22.0, 0.23.0, 1.0.0
>            Reporter: Tim Broberg
>
> To be splittable, a codec must implement SplittableCompressionCodec, which has a function returning a SplitCompressionInputStream.
> SplitCompressionInputStream is an abstract class which extends CompressionInputStream, the lowest-level compression stream class.
> So, no codec that wants to be splittable can reuse any code from DecompressorStream or BlockDecompressorStream.
> You either have to duplicate that code, or not be splittable.
> SplitCompressionInputStream adds just a few very thin functions. Can we make this an interface rather than an abstract class to allow splittable decompression streams to extend DecompressorStream, BlockDecompressorStream, or whatever else we come up with in the future?
> To my knowledge, this would impact only the BZip2 codec. None of the others implement this form of splittability yet.
> LineRecordReader looks only at whether the codec is an instance of SplittableCompressionCodec, and then calls the appropriate version of createInputStream. This would not change, so the application code should not have to change, just the BZip2 codec and SplitCompressionInputStream.
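
For illustration, here is a minimal Java sketch of steps 1 to 3 and 5 of the proposal in the comment above. The type names mirror those used in the thread, but every type is a simplified stand-in defined inside the sketch, not the real Hadoop class. The three-argument createInputStream signature, ToyCodec, SplittableDecompressorStream, and requireSplit are assumptions made up for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class SplitInterfaceSketch {

    // Simplified stand-in for the base compression stream (NOT the Hadoop class).
    static abstract class CompressionInputStream extends InputStream {
        protected final InputStream in;
        CompressionInputStream(InputStream in) { this.in = in; }
    }

    // Step 2: the split contract becomes an interface, so it no longer
    // dictates which stream class an implementation has to extend.
    interface SplitCompressionInputStream {
        long getAdjustedStart();
        long getAdjustedEnd();
    }

    // Stand-in for DecompressorStream: the shared code a codec wants to reuse.
    // The "decompression" here is a plain pass-through, purely for illustration.
    static class DecompressorStream extends CompressionInputStream {
        DecompressorStream(InputStream in) { super(in); }
        @Override public int read() throws IOException { return in.read(); }
    }

    // Benefit 3: a hypothetical splittable stream reuses DecompressorStream
    // and adds the split contract by implementing the interface.
    static class SplittableDecompressorStream extends DecompressorStream
            implements SplitCompressionInputStream {
        private final long start;
        private final long end;
        SplittableDecompressorStream(InputStream in, long start, long end) {
            super(in);
            this.start = start;
            this.end = end;
        }
        @Override public long getAdjustedStart() { return start; }
        @Override public long getAdjustedEnd() { return end; }
    }

    // Step 1: the codec method returns the base stream type.
    // The signature is simplified and assumed for this sketch.
    interface SplittableCompressionCodec {
        CompressionInputStream createInputStream(InputStream raw, long start, long end);
    }

    // Hypothetical codec that obeys step 3 by returning a stream
    // that also implements SplitCompressionInputStream.
    static class ToyCodec implements SplittableCompressionCodec {
        @Override
        public CompressionInputStream createInputStream(InputStream raw, long start, long end) {
            return new SplittableDecompressorStream(raw, start, end);
        }
    }

    // Step 5 (optional): an application-side check that step 3 is obeyed.
    static SplitCompressionInputStream requireSplit(CompressionInputStream s) {
        if (!(s instanceof SplitCompressionInputStream)) {
            throw new IllegalStateException(
                "stream does not implement SplitCompressionInputStream");
        }
        return (SplitCompressionInputStream) s;
    }

    public static void main(String[] args) throws IOException {
        InputStream raw = new ByteArrayInputStream(new byte[] {10, 20, 30});
        CompressionInputStream stream = new ToyCodec().createInputStream(raw, 0L, 3L);
        SplitCompressionInputStream split = requireSplit(stream);
        System.out.println("adjusted range: "
            + split.getAdjustedStart() + ".." + split.getAdjustedEnd());
        System.out.println("first byte: " + stream.read());
    }
}
```

The trade-off visible in the sketch is that the compiler no longer guarantees that createInputStream returns a split-aware stream. Step 3 becomes a convention, which is why the optional runtime check in step 5 exists.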