hive-dev mailing list archives

From "Vaibhav Aggarwal (JIRA)" <>
Subject [jira] [Commented] (HIVE-2266) Fix compression parameters
Date Fri, 02 Sep 2011 03:32:09 GMT


Vaibhav Aggarwal commented on HIVE-2266:

This patch attempts to fix a bug in the existing functionality in two ways:

1. In, the wrong jobconf is being passed, which is clear from the context.

2. In other cases, the compression parameters are not being set.

The only difference this patch produces from the current behavior is smaller file sizes on
the file system. I am not sure how to write a Hive query that can verify a difference in file sizes.
Do you have any ideas that could help me add some quick tests for this? The current test runs
through the code, checking that it does not result in any Exception or Error. It does not compare
file sizes.
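
One possible shape for such a check, sketched under assumptions: rather than expecting an
exact file size, a test could assert only that the written output is smaller than the raw
data. The class and method names below (CompressionSizeCheck, sampleRows, gzipSize,
looksCompressed) are hypothetical and are not part of Hive or of this patch, and
java.util.zip is used purely as a stand-in for a Hadoop codec. In a real test the two sizes
would have to come from the files the query writes, which this sketch does not show.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper, not part of Hive or of the HIVE-2266 patch.
public class CompressionSizeCheck {

    // Builds repetitive tab-separated rows, similar in shape to text table output.
    static byte[] sampleRows(int rows) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < rows; i++) {
            sb.append("key_").append(i).append('\t').append("value_value_value").append('\n');
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    // Returns the number of bytes produced by gzip-compressing the input.
    // java.util.zip is only a stand-in here for whatever codec Hadoop loads.
    static int gzipSize(byte[] data) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(sink)) {
            gz.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.size();
    }

    // Relative check: true when the written size is strictly smaller than the raw size.
    // No fixed compression ratio is assumed.
    static boolean looksCompressed(long rawSize, long writtenSize) {
        return writtenSize < rawSize;
    }

    public static void main(String[] args) {
        byte[] raw = sampleRows(1000);
        int compressed = gzipSize(raw);
        System.out.println("raw bytes: " + raw.length + ", gzip bytes: " + compressed);
        System.out.println("looks compressed: " + looksCompressed(raw.length, compressed));
    }
}
```

Because the comparison is relative, it would not depend on a particular compression ratio.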

> Really? Which platforms are you talking about? Can you tell me how to reproduce this
interesting behavior?

Hadoop loads native compression libraries. I believe they are platform dependent, so
I do not assume that they always produce the same compression ratio. Please correct me if I am wrong.

In any case, I think this is broken existing functionality in Hive that we should fix.

> Fix compression parameters
> --------------------------
>                 Key: HIVE-2266
>                 URL:
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Vaibhav Aggarwal
>            Assignee: Vaibhav Aggarwal
>         Attachments: HIVE-2266-2.patch, HIVE-2266.patch
> There are a number of places where compression values are not set correctly in FileSinkOperator.
> This results in uncompressed files.
