From: Zac Shepherd
To: Hadoop Users
Date: Wed, 21 Aug 2013 14:00:52 -0400
Subject: bz2 decompress in place
Message-ID: <52150054.5020707@about.com>

Hello,

I'm using an ancient version of Hadoop (0.20.2+228) and trying to run a MapReduce job over a bz2-compressed file (18 GB). Since splitting support for bz2 wasn't added until 0.21.0, only a single mapper is allocated, and it will take far too long to complete.

Is there a way to decompress the file in place, or will I have to copy it down, decompress it locally, and then copy it back up to the cluster?

Thanks for any help,
Zac Shepherd
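
One possible middle ground between the two options in the question, sketched here as an assumption and not tested against 0.20.2+228, is to stream the file out of HDFS, decompress it on the fly, and stream the result straight back in. The bytes still pass through the client machine, so this is not truly in place, but no local copy of the file is written. The function name `stream_decompress` and the chunk size below are illustrative choices, not part of Hadoop.

```python
import bz2

CHUNK_SIZE = 1 << 20  # read 1 MiB at a time (arbitrary illustrative value)


def stream_decompress(src, dst, chunk_size=CHUNK_SIZE):
    """Copy bz2-compressed bytes from src to dst, decompressing on the fly.

    src and dst are binary file-like objects. Returns the number of
    decompressed bytes written. Concatenated bz2 streams are handled by
    starting a fresh decompressor whenever the current stream ends.
    """
    decompressor = bz2.BZ2Decompressor()
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        while chunk:
            if decompressor.eof:
                # The previous bz2 stream ended; the remaining bytes
                # belong to a new stream.
                decompressor = bz2.BZ2Decompressor()
            data = decompressor.decompress(chunk)
            dst.write(data)
            total += len(data)
            # Bytes left over after the end of a stream must be fed
            # to the next decompressor.
            chunk = decompressor.unused_data if decompressor.eof else b""
    return total
```

Saved as a hypothetical script `bunzip_stream.py` that ends with `import sys; stream_decompress(sys.stdin.buffer, sys.stdout.buffer)`, it could sit in a pipeline such as `hadoop fs -cat /data/input.bz2 | python bunzip_stream.py | hadoop fs -put - /data/input.txt`, where both paths are placeholders. The standard `bunzip2 -c` tool would fill the same role in the middle of the pipe; the Python version only makes the streaming step explicit.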