hadoop-pig-dev mailing list archives

From "Yan Zhou (JIRA)" <j...@apache.org>
Subject [jira] Updated: (PIG-1501) need to investigate the impact of compression on pig performance
Date Mon, 09 Aug 2010 16:55:16 GMT

     [ https://issues.apache.org/jira/browse/PIG-1501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yan Zhou updated PIG-1501:
--------------------------

    Attachment: compress_perf_data_2.txt

The data sets in the previous tests were small enough that the performance difference was lost in
background noise. This test case generates more temporary data.

In summary, lzo achieves a compression ratio (compressed size relative to uncompressed size) of
about 3% and sees a 4x speed improvement over uncompressed; gzip achieves a ratio of less than 1%,
but runs 1%-2% slower than uncompressed. This is in line with the general observation that gzip
compresses better but performs worse.
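To make the two figures above concrete, here is a minimal sketch, assuming "compression ratio" means compressed size divided by original size (so a smaller value is better) and "speedup" means baseline time divided by candidate time. It only uses Python's standard `gzip` module; lzo is not in the standard library, so it is not shown. The helper names `compression_stats` and `speedup`, the synthetic sample, and the job times are hypothetical illustrations, not the PIG-1501 test data or harness.

```python
import gzip
import time


def compression_stats(data: bytes, level: int = 6) -> tuple[float, float]:
    """Hypothetical helper: gzip-compress data, return (ratio, seconds).

    ratio = compressed size / original size, so 0.03 corresponds to "3%".
    """
    start = time.perf_counter()
    compressed = gzip.compress(data, compresslevel=level)
    elapsed = time.perf_counter() - start
    return len(compressed) / len(data), elapsed


def speedup(baseline_seconds: float, candidate_seconds: float) -> float:
    """Hypothetical helper: how many times faster the candidate run is."""
    return baseline_seconds / candidate_seconds


if __name__ == "__main__":
    # Synthetic, highly repetitive tab-separated rows standing in for
    # intermediate job output (illustrative only, not real Pig data).
    sample = b"".join(
        b"user%d\tpage%d\t1\n" % (i % 100, i % 50) for i in range(200_000)
    )
    ratio, secs = compression_stats(sample)
    print(f"gzip ratio: {ratio:.2%} of original size, compressed in {secs:.3f}s")
    # Hypothetical job times: a run taking a quarter of the baseline time is 4x faster.
    print(f"speedup: {speedup(400.0, 100.0):.1f}x")
```

Note that compression speed alone does not decide end-to-end job time: the overall effect also depends on how much less data is written to and read back from disk and the network, which is what the measurements above capture.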

> need to investigate the impact of compression on pig performance
> ----------------------------------------------------------------
>
>                 Key: PIG-1501
>                 URL: https://issues.apache.org/jira/browse/PIG-1501
>             Project: Pig
>          Issue Type: Test
>            Reporter: Olga Natkovich
>            Assignee: Yan Zhou
>             Fix For: 0.8.0
>
>         Attachments: compress_perf_data.txt, compress_perf_data_2.txt
>
>
> We would like to understand how compressing map results as well as reducer output
> in a chain of MR jobs impacts performance. We can use PigMix queries for this investigation.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

