From: "Doug Cutting (JIRA)"
To: hadoop-dev@lucene.apache.org
Reply-To: hadoop-dev@lucene.apache.org
Subject: [jira] Updated: (HADOOP-474) support compressed text files as input and output
Date: Fri, 8 Sep 2006 14:17:25 -0700 (PDT)
Message-ID: <5360358.1157750245064.JavaMail.jira@brutus>
In-Reply-To: <19704285.1156402132980.JavaMail.jira@brutus>

     [ http://issues.apache.org/jira/browse/HADOOP-474?page=all ]

Doug Cutting updated HADOOP-474:
--------------------------------
        Status: Resolved  (was: Patch Available)
    Resolution: Fixed

I just
committed this. Thanks, Owen!

> support compressed text files as input and output
> -------------------------------------------------
>
>                 Key: HADOOP-474
>                 URL: http://issues.apache.org/jira/browse/HADOOP-474
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.5.0
>            Reporter: Owen O'Malley
>         Assigned To: Owen O'Malley
>             Fix For: 0.6.0
>
>         Attachments: text-gz-2.patch, text-gz-3.patch, text-gz.patch
>
>
> I'd like TextInputFormat and TextOutputFormat to automatically compress and uncompress text files when they are read and written. Furthermore, I'd like to be able to use custom compressors as defined in HADOOP-441. Therefore, I propose:
> Adding a map of compression codecs in the server config files:
> io.compression.codecs = "<suffix>=<codec class>,..."
> so the default would be something like:
> <property>
>   <name>io.compression.codecs</name>
>   <value>.gz=org.apache.hadoop.io.GZipCodec,.Z=org.apache.hadoop.io.ZipCodec</value>
>   <description>A list of file suffixes and the codecs for them.</description>
> </property>
> Note that the suffix can include multiple "." so you could support suffixes like ".tar.gz", but they are just treated as literals against the end of the filename.
> If the TextInputFormat is dealing with such a file, it:
>   1. makes a single split
>   2. decompresses automatically
> On the output side, if mapred.output.compress is true, then TextOutputFormat would use a new property mapred.output.compression.codec that would define the codec to use to compress the outputs, defaulting to gzip.
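
For illustration, the suffix lookup proposed in the issue description could be sketched roughly as below. This is not the committed patch: the class CodecSuffixMap and its methods are hypothetical names, the sketch returns codec class names as plain strings rather than loading Hadoop classes, and the "longest matching suffix wins" rule is an assumption added here so that ".tar.gz" is preferred over ".gz" when both are configured. The description itself only says suffixes are matched literally against the end of the filename.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper (not part of Hadoop): maps file suffixes to codec class names. */
final class CodecSuffixMap {

    /** Parses a "<suffix>=<codec class>,..." string into a suffix -> class-name map. */
    static Map<String, String> parse(String spec) {
        Map<String, String> codecs = new LinkedHashMap<>();
        for (String entry : spec.split(",")) {
            String trimmed = entry.trim();
            if (trimmed.isEmpty()) {
                continue; // tolerate empty entries such as a trailing comma
            }
            int eq = trimmed.indexOf('=');
            if (eq <= 0 || eq == trimmed.length() - 1) {
                throw new IllegalArgumentException(
                    "Expected <suffix>=<codec class> but got: " + trimmed);
            }
            codecs.put(trimmed.substring(0, eq).trim(),
                       trimmed.substring(eq + 1).trim());
        }
        return codecs;
    }

    /**
     * Returns the codec class name whose suffix literally matches the end of
     * the filename, or null if none matches. When several suffixes match,
     * the longest one wins (an assumption of this sketch).
     */
    static String codecFor(Map<String, String> codecs, String filename) {
        String bestSuffix = null;
        for (String suffix : codecs.keySet()) {
            if (filename.endsWith(suffix)
                    && (bestSuffix == null || suffix.length() > bestSuffix.length())) {
                bestSuffix = suffix;
            }
        }
        return bestSuffix == null ? null : codecs.get(bestSuffix);
    }

    public static void main(String[] args) {
        // Default value quoted from the issue description.
        Map<String, String> codecs = parse(
            ".gz=org.apache.hadoop.io.GZipCodec,.Z=org.apache.hadoop.io.ZipCodec");
        System.out.println(codecFor(codecs, "part-00000.gz"));
        System.out.println(codecFor(codecs, "part-00000.txt"));
    }
}
```

A file with no matching suffix yields null here, which a caller would presumably treat as plain, uncompressed text that can still be split normally.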