Subject: Re: Reg LZO compression
From: lohit <lohit.vijayarenu@gmail.com>
To: user@hadoop.apache.org, rdyer@iastate.edu
Date: Tue, 16 Oct 2012 21:31:40 -0700

As Robert said, if your job is mainly IO intensive and the CPUs are idle, then
using LZO would improve your overall job performance. In your case it looks
like the job is not IO bound and is instead spending its CPU on compressing
and decompressing the data. It also depends on the kind of data: some datasets
are barely compressible (e.g. random data), and there you only waste CPU
cycles, so it is better to turn compression off for such jobs. (Robert's 128MB
vs. 60MB example is worked through in a small sketch below the quoted mails.)
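If you want to keep the job as it is but drop the compression of intermediate
data, a per-job toggle along the lines below is what I have in mind. This is
only a rough sketch: the property names are the Hadoop 1.x ones
(mapred.compress.map.output / mapred.map.output.compression.codec), the class
com.hadoop.compression.lzo.LzoCodec comes from the separately installed
hadoop-lzo package, and the job name is made up, so adjust all of them for
your setup.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class MapOutputCompressionToggle {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();

    // Compress intermediate map output only when the job is IO bound and the
    // data actually compresses well; for random-ish data leave this false.
    boolean compressMapOutput = true; // flip to false for incompressible input
    conf.setBoolean("mapred.compress.map.output", compressMapOutput);
    conf.set("mapred.map.output.compression.codec",
             "com.hadoop.compression.lzo.LzoCodec");

    Job job = new Job(conf, "lzo-toggle-example"); // hypothetical job name
    // ... set mapper, reducer, input and output paths as usual, then
    // job.waitForCompletion(true).
  }
}

Note that compressing the final job output is a separate knob
(FileOutputFormat.setCompressOutput / setOutputCompressorClass), so you can
keep one and drop the other.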
2012/10/16 Robert Dyer <psybers@gmail.com>

> Hi Manoj,
>
> If the data is the same for both tests and the number of mappers is
> fewer, then each mapper has more (uncompressed) data to process. Thus
> each mapper should take longer and overall execution time should
> increase.
>
> As a simple example: if your data is 128MB uncompressed it may use 2
> mappers, each processing 64MB of data (1 HDFS block per map task).
> However, if you compress the data and it is now, say, 60MB, then one map
> task will get the entire input file, decompress the data (to 128MB),
> and process it.
>
> On Tue, Oct 16, 2012 at 9:27 PM, Manoj Babu <manoj444@gmail.com> wrote:
> > Hi All,
> >
> > When using LZO compression the file size is drastically reduced and the
> > number of mappers is reduced, but the overall execution time increases.
> > I assume that is because the mappers still deal with the same total
> > amount of data.
> >
> > Is this the expected behavior?
> >
> > Cheers!
> > Manoj.
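To make Robert's numbers concrete, here is a toy sketch of the split
arithmetic (plain Java, not Hadoop code; it assumes a 64MB split size and an
.lzo file without an index, which is not splittable):

public class SplitMath {
  static final long SPLIT_SIZE = 64L * 1024 * 1024; // assume 64MB block/split

  // A splittable file is cut into ceil(size / splitSize) map tasks;
  // a non-splittable compressed file goes to a single map task.
  static long numMapTasks(long fileBytes, boolean splittable) {
    return splittable ? (fileBytes + SPLIT_SIZE - 1) / SPLIT_SIZE : 1;
  }

  public static void main(String[] args) {
    long uncompressed = 128L * 1024 * 1024; // 128MB plain text -> 2 map tasks
    long compressedLzo = 60L * 1024 * 1024; // ~60MB .lzo file  -> 1 map task
    System.out.println("uncompressed: " + numMapTasks(uncompressed, true));
    System.out.println("lzo, no index: " + numMapTasks(compressedLzo, false));
  }
}

The real FileInputFormat also looks at the min/max split size settings, and if
I remember right the hadoop-lzo package ships an indexer that makes .lzo files
splittable again, but the basic point stands: a small or non-splittable
compressed input lands on fewer mappers.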
--
Have a Nice Day!
Lohit