hadoop-user mailing list archives

From Manoj Babu <manoj...@gmail.com>
Subject Re: Reg LZO compression
Date Thu, 18 Oct 2012 17:33:02 GMT
Thank you Robert and Lohit for providing the info.

In my case, using TextInputFormat, I am reading a line but emitting it two
times.
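
Roughly, that map logic looks like the following minimal sketch (the class
and names are illustrative only, not the actual job code):

    import java.io.IOException;

    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Illustrative sketch: a mapper that reads each line supplied by
    // TextInputFormat and emits it two times.
    public class DuplicatingLineMapper
            extends Mapper<LongWritable, Text, Text, NullWritable> {

        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            // Emit the same input line twice.
            context.write(line, NullWritable.get());
            context.write(line, NullWritable.get());
        }
    }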
On 17 Oct 2012 10:02, "lohit" <lohit.vijayarenu@gmail.com> wrote:
>
> As Robert said, if your job is mainly IO intensive and the CPUs are idle, then
> having LZO would improve your overall job performance.
> In your case it looks like the job you are running is not IO bound and
> seems to spend its CPU compressing/decompressing the data.
> It also depends on the kind of data. Some datasets might not be
> compressible (e.g. random data); in those cases you would end up wasting CPU
> cycles, and it is better to turn off compression for such jobs.
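
For reference, map-output and job-output compression can be switched on or off
per job; a minimal sketch, assuming the newer mapreduce.* property names (older
releases use mapred.compress.map.output and mapred.output.compress), with an
illustrative job name:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    Configuration conf = new Configuration();

    // Compress intermediate (map) output to cut shuffle IO.
    conf.setBoolean("mapreduce.map.output.compress", true);

    // Leave the final job output uncompressed for a CPU-bound job.
    conf.setBoolean("mapreduce.output.fileoutputformat.compress", false);

    Job job = Job.getInstance(conf, "compression-toggle-example");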
>
>
> 2012/10/16 Robert Dyer <psybers@gmail.com>
>>
>> Hi Manoj,
>>
>> If the data is the same for both tests and there are fewer mappers,
>> then each mapper has more (uncompressed) data to process. Thus each
>> mapper should take longer and overall execution time should increase.
>>
>> As a simple example: if your data is 128MB uncompressed it may use 2
>> mappers, each processing 64MB of data (1 HDFS block per map task).
>> However, if you compress the data and it is now, say, 60MB, then one map
>> task will get the entire input file, decompress the data (back to 128MB),
>> and process it.
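
One common way around the single-mapper problem is to index the .lzo files so
the input format can split them; a minimal sketch, assuming the third-party
hadoop-lzo package is on the classpath (the class names come from that project,
and the input path is for illustration only):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

    // From the hadoop-lzo project (assumed installed on the cluster).
    import com.hadoop.mapreduce.LzoTextInputFormat;

    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "lzo-input-example");

    // When a matching .lzo.index file exists next to the input,
    // LzoTextInputFormat can generate multiple splits per compressed file,
    // so the whole file no longer has to go to a single map task.
    job.setInputFormatClass(LzoTextInputFormat.class);
    FileInputFormat.addInputPath(job, new Path("/data/input.lzo"));

The index itself is typically built beforehand by running the LzoIndexer tool
from the same project over the compressed files.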
>>
>> On Tue, Oct 16, 2012 at 9:27 PM, Manoj Babu <manoj444@gmail.com> wrote:
>> > Hi All,
>> >
>> > When using LZO compression the file size is drastically reduced and the
>> > number of mappers is reduced, but the overall execution time is increased.
>> > I assume that is because the mappers still deal with the same total amount
>> > of data.
>> >
>> > Is this the expected behavior?
>> >
>> > Cheers!
>> > Manoj.
>> >
>
>
>
>
> --
> Have a Nice Day!
> Lohit
