hive-dev mailing list archives

From "Yongqiang He (JIRA)" <>
Subject [jira] Commented: (HIVE-461) Optimize RCFile reading by using column pruning results
Date Fri, 26 Jun 2009 05:53:07 GMT


Yongqiang He commented on HIVE-461:

Please remove input20.q.out from the patch when testing and committing.
input20.q.out always has wrong results in my local environment.

> Optimize RCFile reading by using column pruning results
> -------------------------------------------------------
>                 Key: HIVE-461
>                 URL:
>             Project: Hadoop Hive
>          Issue Type: Improvement
>          Components: Serializers/Deserializers
>    Affects Versions: 0.4.0
>            Reporter: Zheng Shao
>            Assignee: Yongqiang He
>         Attachments: hive-461-2009-05-26.patch, hive-461-2009-06-26.patch
> RCFile is a column-based file format introduced in HIVE-352. Column-based storage has
> shown better compression ratio. On our internal data set (30 columns, most of them are short
> integer strings), we are seeing gzip-compressed RCFile to be 20%+ smaller than gzip-compressed
> RCFile also has the potential to improve the reading efficiency a lot since it compresses
> each column separately.
> We should integrate RCFile with the column pruning results from Hive to make the reading
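
The idea in the quoted description (each column compressed separately, so a reader can skip columns the query does not need) can be illustrated with a minimal Python sketch. This is not RCFile's actual on-disk layout or Hive's reader API; the function names `write_columnar` and `read_pruned` are hypothetical, and values are assumed to contain no newline characters.

```python
import gzip

def write_columnar(rows, num_columns):
    """Store each column as its own gzip-compressed byte string (toy layout)."""
    columns = []
    for c in range(num_columns):
        # Assumption: values never contain "\n", so it can act as a separator.
        payload = "\n".join(row[c] for row in rows).encode("utf-8")
        columns.append(gzip.compress(payload))
    return columns

def read_pruned(columns, needed):
    """Decompress only the column indexes in `needed`; other columns are never touched."""
    result = {}
    for c in needed:
        text = gzip.decompress(columns[c]).decode("utf-8")
        result[c] = text.split("\n")
    return result

rows = [["1", "alice", "10"], ["2", "bob", "20"], ["3", "carol", "30"]]
stored = write_columnar(rows, 3)

# A query that references only columns 0 and 2 skips decompressing column 1.
pruned = read_pruned(stored, [0, 2])
```

Because the columns are compressed independently, the cost of a read in this sketch scales with the number of columns requested rather than with the width of the table, which is the saving that column pruning would pass on to the reader.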

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
