Subject: Re: Reg LZO compression
From: Manoj Babu <manoj444@gmail.com>
To: user@hadoop.apache.org
Cc: rdyer@iastate.edu
Date: Thu, 18 Oct 2012 23:03:02 +0530

Thank you Robert and Lohit for providing the info.

In my case, using the Text input format, I am reading a line but emitting it two times.

On 17 Oct 2012 10:02, "lohit" <lohit.vijayarenu@gmail.com> wrote:
>
> As Robert said, if your job is mainly IO intensive and the CPUs are idle, then using LZO would improve your overall job performance.
> In your case it looks like the job you are running is not IO bound, and instead spends CPU time compressing/decompressing the data.
> It also depends on the kind of data. Some datasets are not compressible (e.g. random data); in those cases you would just waste CPU cycles, and it is better to turn off compression for such jobs.
>
>
> 2012/10/16 Robert Dyer <psybers@gmail.com>
>>
>> Hi Manoj,
>>
>> If the data is the same for both tests and the number of mappers is
>> fewer, then each mapper has more (uncompressed) data to process. Thus
>> each mapper should take longer and overall execution time should
>> increase.
>>
>> As a simple example: if your data is 128MB uncompressed, it may use 2
>> mappers, each processing 64MB of data (1 HDFS block per map task).
>> However, if you compress the data and it is now, say, 60MB, then one map
>> task will get the entire input file, decompress the data (to 128MB),
>> and process it.
>>
>> On Tue, Oct 16, 2012 at 9:27 PM, Manoj Babu <manoj444@gmail.com> wrote:
>> > Hi All,
>> >
>> > When using LZO compression the file size is drastically reduced and the
>> > number of mappers is reduced, but the overall execution time is increased.
>> > I assume that is because each mapper deals with the same amount of data.
>> >
>> > Is this the expected behavior?
>> >
>> > Cheers!
>> > Manoj.
>> >
>
>
> --
> Have a Nice Day!
> Lohit
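[Editor's note] Robert's arithmetic generalizes: a splittable input gets roughly one map task per HDFS block, while a non-splittable compressed file (e.g. an .lzo file without an index; LZO inputs become splittable only after indexing) collapses into a single split. The sketch below is a toy model of that arithmetic, not the Hadoop API; the 64MB block size (the common default of that era) and all class/method names here are illustrative assumptions:

```java
// Toy model (NOT the Hadoop API): how split counts fall out of file size,
// block size, and whether the input format can split the file.
public class SplitCountSketch {
    static final long BLOCK = 64L * 1024 * 1024; // 64MB HDFS block (2012-era default)

    // Splittable input (plain text, or LZO with an index): ~one map per block.
    static long splitsSplittable(long fileBytes) {
        return Math.max(1, (fileBytes + BLOCK - 1) / BLOCK); // ceiling division
    }

    // Non-splittable input (e.g. un-indexed .lzo): the whole file is one split.
    static long splitsNonSplittable(long fileBytes) {
        return 1;
    }

    public static void main(String[] args) {
        long uncompressed = 128L * 1024 * 1024; // 128MB plain text
        long compressed   = 60L * 1024 * 1024;  // ~60MB after LZO

        System.out.println("uncompressed maps: " + splitsSplittable(uncompressed));   // 2
        System.out.println("un-indexed lzo maps: " + splitsNonSplittable(compressed)); // 1
    }
}
```

This is why compressing the input can *reduce* parallelism even though it reduces IO: fewer splits means fewer mappers, and each surviving mapper decompresses and processes the full logical data volume.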
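[Editor's note] Lohit's advice to turn compression off for incompressible data can be applied per job rather than cluster-wide. A sketch of the relevant job-configuration overrides — the key names below are from the Hadoop 1.x/0.20 property set of the era (`mapred.*`); treat the exact keys as assumptions and check your release's mapred-default.xml:

```xml
<!-- Per-job overrides (hypothetical job config; 1.x-era key names assumed) -->
<property>
  <name>mapred.compress.map.output</name>
  <value>false</value> <!-- skip compressing intermediate map output -->
</property>
<property>
  <name>mapred.output.compress</name>
  <value>false</value> <!-- write final job output uncompressed -->
</property>
```

Measuring the same job with these on and off (as Manoj did) is the most direct way to tell whether a workload is IO bound (compression helps) or CPU bound (compression hurts).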