From: Harsh J
Date: Wed, 7 Nov 2012 18:11:05 +0530
Subject: Re: Spill file compression
To: user@hadoop.apache.org

Yes, we do compress each spill output, using the same codec that is specified for map (intermediate) output compression. However, the byte counters may be counting the uncompressed size of the records written, not their post-compression size.

On Wed, Nov 7, 2012 at 6:02 PM, Sigurd Spieckermann wrote:
> Hi guys,
>
> I've encountered a situation where the ratio between "Map output bytes" and
> "Map output materialized bytes" is quite huge and during the map-phase data
> is spilled to disk quite a lot. This is something I'll try to optimize, but
> I'm wondering if the spill files are compressed at all. I set
> mapred.compress.map.output=true and
> mapred.map.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec
> and everything else seems to be working correctly. Does Hadoop actually
> compress spills or just the final spill after finishing the entire map-task?
>
> Thanks,
> Sigurd

--
Harsh J
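
To illustrate why a counter taken before compression can differ a lot from the bytes that actually reach disk, here is a minimal sketch. It is not Hadoop code: the `spill` function and the sample records are hypothetical, and `zlib` stands in for Snappy only because it ships with Python. It assumes the "raw" count is taken before the codec runs and the "materialized" count after.

```python
import zlib


def spill(records, compress=True):
    """Simulate one spill; return (raw_bytes, materialized_bytes).

    Hypothetical helper: raw_bytes is counted before the codec runs,
    materialized_bytes is the size of what would be written to disk.
    """
    raw = b"".join(records)
    raw_bytes = len(raw)
    # zlib is a stand-in for the configured map output codec (assumption).
    on_disk = zlib.compress(raw) if compress else raw
    return raw_bytes, len(on_disk)


# Highly repetitive sample records, so the codec has a lot to work with.
records = [b"key-%04d\tvalue-aaaaaaaaaaaaaaaa\n" % i for i in range(1000)]

raw_bytes, materialized_bytes = spill(records)
print("raw bytes:", raw_bytes)
print("materialized bytes:", materialized_bytes)
print("ratio: %.1f" % (raw_bytes / materialized_bytes))
```

With compression on, the two numbers diverge even though every spill is compressed; with `compress=False` they are equal. A large ratio between the two counters is therefore consistent with compressed spills and does not by itself show that compression is being skipped.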