Date: Tue, 17 Sep 2013 22:37:04 +0200
Subject: Bzip2 vs Gzip
From: Amit Sela
To: user@hadoop.apache.org

Hi all,
I'm using Hadoop 1.0.4 and gzip to store the logs that Hadoop processes (the logs are gzipped into block-size files).
I read that bzip2 is splittable. Is that the case in Hadoop 1.0.4? Does it mean that any input file bigger than the block size will be split between maps?
What are the tradeoffs between the two?

Thanks.
