hive-dev mailing list archives

From "Zheng Shao (JIRA)" <>
Subject [jira] Created: (HIVE-461) Optimize RCFile reading by using column pruning results
Date Thu, 30 Apr 2009 01:50:30 GMT
Optimize RCFile reading by using column pruning results

                 Key: HIVE-461
             Project: Hadoop Hive
          Issue Type: Improvement
          Components: Serializers/Deserializers
    Affects Versions: 0.4.0
            Reporter: Zheng Shao
            Assignee: He Yongqiang

RCFile is a column-based file format introduced in HIVE-352. Column-based storage has shown
a better compression ratio: on our internal data set (30 columns, most of them short integer
strings), gzip-compressed RCFile is 20%+ smaller than gzip-compressed SequenceFile.

RCFile also has the potential to improve reading efficiency considerably: since it compresses
each column separately, a reader can skip decompressing the columns a query does not need.

We should integrate RCFile with the column pruning results from Hive to make the reading faster.
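
A minimal sketch of the idea, assuming a toy layout rather than RCFile's actual on-disk format or any Hive API: each column of a row group is compressed on its own, and the reader is handed the set of column indices that survived pruning, so it only decompresses those. The names `write_row_group`, `read_row_group` and `needed` are hypothetical, and values are assumed to contain no newline characters.

```python
import zlib


def write_row_group(rows):
    """Store equal-length rows column by column, compressing each column separately."""
    columns = zip(*rows)  # transpose rows into columns
    return [zlib.compress("\n".join(col).encode("utf-8")) for col in columns]


def read_row_group(compressed_columns, needed):
    """Decompress only the columns whose index is in `needed`.

    Columns that were pruned are never decompressed and are absent from the result.
    """
    values_by_column = {}
    for idx, blob in enumerate(compressed_columns):
        if idx in needed:
            values_by_column[idx] = zlib.decompress(blob).decode("utf-8").split("\n")
    return values_by_column


# Toy data: three columns (id, name, age); the query only touches id and age.
rows = [("1", "alice", "30"), ("2", "bob", "25"), ("3", "carol", "41")]
group = write_row_group(rows)
pruned = read_row_group(group, needed={0, 2})
```

In this sketch the cost of decompression scales with the number of requested columns rather than the table width, which is the saving the integration is meant to capture.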

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
