hive-dev mailing list archives

From "Zheng Shao (JIRA)" <>
Subject [jira] Resolved: (HIVE-461) Optimize RCFile reading by using column pruning results
Date Mon, 06 Jul 2009 01:07:14 GMT


Zheng Shao resolved HIVE-461.

       Resolution: Fixed
    Fix Version/s: 0.4.0
     Release Note: HIVE-461. Optimize RCFile reading by using column pruning results. (Yongqiang He via zshao)
     Hadoop Flags: [Reviewed]

Committed. Thanks Yongqiang.

The next time you generate a patch, please use "ant test ... -Doverwrite=true" to generate
the corrected test case results. This should avoid changing some of the test results unnecessarily.

> Optimize RCFile reading by using column pruning results
> -------------------------------------------------------
>                 Key: HIVE-461
>                 URL:
>             Project: Hadoop Hive
>          Issue Type: Improvement
>          Components: Serializers/Deserializers
>    Affects Versions: 0.4.0
>            Reporter: Zheng Shao
>            Assignee: Yongqiang He
>             Fix For: 0.4.0
>         Attachments: hive-461-2009-05-26.patch, hive-461-2009-06-26.patch, hive-461-2009-06-27.patch,
> RCFile is a column-based file format introduced in HIVE-352. Column-based storage has
> shown a better compression ratio. On our internal data set (30 columns, most of them short
> integer strings), we are seeing gzip-compressed RCFile to be 20%+ smaller than gzip-compressed
> RCFile also has the potential to improve reading efficiency a lot, since it compresses
> each column separately.
> We should integrate RCFile with the column pruning results from Hive to make the reading
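
As a rough illustration of why compressing each column separately helps a pruned read, here is a minimal sketch. It is not RCFile's on-disk layout or Hive's API: the functions write_columns and read_columns and the sample rows are hypothetical, and the only assumption carried over from the issue is that each column is compressed on its own, so a reader can skip the columns a query does not need.

```python
import gzip

def write_columns(rows, column_names):
    """Toy writer (hypothetical): gzip each column independently."""
    blobs = {}
    for i, name in enumerate(column_names):
        # Join one column's values with newlines; values must not contain "\n".
        values = "\n".join(str(row[i]) for row in rows)
        blobs[name] = gzip.compress(values.encode("utf-8"))
    return blobs

def read_columns(blobs, needed):
    """Toy reader (hypothetical): decompress only the requested columns."""
    decoded = {}
    for name in needed:
        decoded[name] = gzip.decompress(blobs[name]).decode("utf-8").split("\n")
    return decoded

rows = [(1, "alice", "us"), (2, "bob", "de"), (3, "carol", "us")]
blobs = write_columns(rows, ["id", "name", "country"])

# A query that touches only "id" and "country" never decompresses "name".
pruned = read_columns(blobs, ["id", "country"])
```

With a row-oriented layout, the reader would have to decompress every row in full before discarding the unused fields. Here the cost of a read scales with the columns actually requested.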

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
