Subject: Re: Reg LZO compression
From: Manoj Babu <manoj444@gmail.com>
To: user@hadoop.apache.org
Cc: rdyer@iastate.edu
Date: Thu, 18 Oct 2012 23:03:02 +0530

Thank you Robert and Lohit for providing the info.

In my case, using the Text input format, I am reading a line but emitting it two times.

On 17 Oct 2012 10:02, "lohit" <lohit.vijayarenu@gmail.com> wrote:
>
> As Robert said, if your job is mainly IO intensive and the CPUs are idle, then using LZO would improve your overall job performance.
> In your case it looks like the job you are running is not IO bound, and instead spends CPU time compressing/decompressing the data.
> It also depends on the kind of data. Some datasets are not compressible (e.g. random data); in those cases you would just waste CPU cycles, and it is better to turn off compression for such jobs.
>
>
> 2012/10/16 Robert Dyer <psybers@gmail.com>
>>
>> Hi Manoj,
>>
>> If the data is the same for both tests and the number of mappers is
>> fewer, then each mapper has more (uncompressed) data to process. Thus
>> each mapper should take longer and overall execution time should
>> increase.
>>
>> As a simple example: if your data is 128MB uncompressed, it may use 2
>> mappers, each processing 64MB of data (1 HDFS block per map task).
>> However, if you compress the data and it is now, say, 60MB, then one map
>> task will get the entire input file, decompress the data (to 128MB),
>> and process it.
>>
>> On Tue, Oct 16, 2012 at 9:27 PM, Manoj Babu <manoj444@gmail.com> wrote:
>> > Hi All,
>> >
>> > When using LZO compression the file size is drastically reduced and the
>> > number of mappers is reduced, but the overall execution time is increased.
>> > I assume that is because each mapper deals with the same amount of data.
>> >
>> > Is this the expected behavior?
>> >
>> > Cheers!
>> > Manoj.
>> >
>
>
> --
> Have a Nice Day!
> Lohit
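[Editor's note] Robert's arithmetic generalizes: a splittable input gets roughly one map task per HDFS block, while a non-splittable compressed file (e.g. an .lzo file without an index; LZO inputs become splittable only after indexing) collapses into a single split. The sketch below is a toy model of that arithmetic, not the Hadoop API; the 64MB block size (the common default of that era) and all class/method names here are illustrative assumptions:

```java
// Toy model (NOT the Hadoop API): how split counts fall out of file size,
// block size, and whether the input format can split the file.
public class SplitCountSketch {
    static final long BLOCK = 64L * 1024 * 1024; // 64MB HDFS block (2012-era default)

    // Splittable input (plain text, or LZO with an index): ~one map per block.
    static long splitsSplittable(long fileBytes) {
        return Math.max(1, (fileBytes + BLOCK - 1) / BLOCK); // ceiling division
    }

    // Non-splittable input (e.g. un-indexed .lzo): the whole file is one split.
    static long splitsNonSplittable(long fileBytes) {
        return 1;
    }

    public static void main(String[] args) {
        long uncompressed = 128L * 1024 * 1024; // 128MB plain text
        long compressed   = 60L * 1024 * 1024;  // ~60MB after LZO

        System.out.println("uncompressed maps: " + splitsSplittable(uncompressed));   // 2
        System.out.println("un-indexed lzo maps: " + splitsNonSplittable(compressed)); // 1
    }
}
```

This is why compressing the input can *reduce* parallelism even though it reduces IO: fewer splits means fewer mappers, and each surviving mapper decompresses and processes the full logical data volume.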
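[Editor's note] Lohit's advice to turn compression off for incompressible data can be applied per job rather than cluster-wide. A sketch of the relevant job-configuration overrides — the key names below are from the Hadoop 1.x/0.20 property set of the era (`mapred.*`); treat the exact keys as assumptions and check your release's mapred-default.xml:

```xml
<!-- Per-job overrides (hypothetical job config; 1.x-era key names assumed) -->
<property>
  <name>mapred.compress.map.output</name>
  <value>false</value> <!-- skip compressing intermediate map output -->
</property>
<property>
  <name>mapred.output.compress</name>
  <value>false</value> <!-- write final job output uncompressed -->
</property>
```

Measuring the same job with these on and off (as Manoj did) is the most direct way to tell whether a workload is IO bound (compression helps) or CPU bound (compression hurts).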