hadoop-hive-dev mailing list archives

From "Zheng Shao (JIRA)" <j...@apache.org>
Subject [jira] Created: (HIVE-461) Optimize RCFile reading by using column pruning results
Date Thu, 30 Apr 2009 01:50:30 GMT
Optimize RCFile reading by using column pruning results
-------------------------------------------------------

                 Key: HIVE-461
                 URL: https://issues.apache.org/jira/browse/HIVE-461
             Project: Hadoop Hive
          Issue Type: Improvement
          Components: Serializers/Deserializers
    Affects Versions: 0.4.0
            Reporter: Zheng Shao
            Assignee: He Yongqiang


RCFile is a column-based file format introduced in HIVE-352. Column-based storage has shown a
better compression ratio. On our internal data set (30 columns, most of them short integer
strings), gzip-compressed RCFile is 20%+ smaller than gzip-compressed SequenceFile.

RCFile also has the potential to improve reading efficiency considerably: since it compresses
each column separately, a reader does not have to decompress columns that a query never uses.

We should integrate RCFile with the column pruning results from Hive to make the reading faster.
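
As a rough illustration of the idea only (this is not RCFile's on-disk layout or any Hive API;
the function names and the toy row group below are hypothetical), a reader that is told which
columns a query needs can leave every other column compressed:

```python
import zlib


def write_row_group(rows):
    """Store equal-length rows column by column, compressing each column on its own.

    Toy encoding: values in a column are joined with newlines, so values
    must not contain newlines themselves.
    """
    columns = zip(*rows)
    return [zlib.compress("\n".join(col).encode("utf-8")) for col in columns]


def read_row_group(row_group, needed_columns):
    """Decompress only the columns listed in needed_columns.

    Columns that were pruned are returned as None and are never decompressed.
    """
    needed = set(needed_columns)
    decoded = []
    for index, blob in enumerate(row_group):
        if index in needed:
            decoded.append(zlib.decompress(blob).decode("utf-8").split("\n"))
        else:
            decoded.append(None)
    return decoded


rows = [("1", "alice", "30"), ("2", "bob", "25"), ("3", "carol", "41")]
row_group = write_row_group(rows)

# Assume column pruning determined that only columns 0 and 2 are referenced.
pruned = read_row_group(row_group, needed_columns=[0, 2])
```

Here `needed_columns` stands in for the column pruning results; how Hive would actually pass
them to the RCFile reader is left open by this issue.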


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

