hadoop-hive-dev mailing list archives

From "Zheng Shao (JIRA)" <j...@apache.org>
Subject [jira] Resolved: (HIVE-461) Optimize RCFile reading by using column pruning results
Date Mon, 06 Jul 2009 01:07:14 GMT

     [ https://issues.apache.org/jira/browse/HIVE-461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zheng Shao resolved HIVE-461.
-----------------------------

       Resolution: Fixed
    Fix Version/s: 0.4.0
     Release Note: HIVE-461. Optimize RCFile reading by using column pruning results. (Yongqiang He via zshao)
     Hadoop Flags: [Reviewed]

Committed. Thanks Yongqiang.

The next time you generate a patch, please use "ant test ... -Doverwrite=true" to generate
the corrected test case results. This should avoid changing some of the test results unnecessarily.


> Optimize RCFile reading by using column pruning results
> -------------------------------------------------------
>
>                 Key: HIVE-461
>                 URL: https://issues.apache.org/jira/browse/HIVE-461
>             Project: Hadoop Hive
>          Issue Type: Improvement
>          Components: Serializers/Deserializers
>    Affects Versions: 0.4.0
>            Reporter: Zheng Shao
>            Assignee: Yongqiang He
>             Fix For: 0.4.0
>
>         Attachments: hive-461-2009-05-26.patch, hive-461-2009-06-26.patch,
>                      hive-461-2009-06-27.patch, hive-461-2009-07-04.patch
>
>
> RCFile is a column-based file format introduced in HIVE-352. Column-based storage has
> shown a better compression ratio: on our internal data set (30 columns, most of them short
> integer strings), gzip-compressed RCFile is 20%+ smaller than gzip-compressed
> SequenceFile.
> RCFile also has the potential to make reads much more efficient, since it compresses
> each column separately.
> We should integrate RCFile with Hive's column pruning results to make reads
> faster.
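
To illustrate why per-column compression makes column pruning pay off, here is a minimal
sketch in Python. It is not Hive's RCFile code or API: write_columnar and read_columns are
hypothetical helpers, zlib stands in for the codec, and a list of byte strings stands in for
one on-disk row group. The point is only that a reader which knows the needed column
indexes can skip decompressing everything else.

```python
import zlib


def write_columnar(rows, num_cols):
    """Compress each column separately (toy stand-in for one row group).

    Assumes every value is a string that contains no newline.
    """
    columns = []
    for c in range(num_cols):
        payload = "\n".join(row[c] for row in rows).encode("utf-8")
        columns.append(zlib.compress(payload))
    return columns


def read_columns(columns, needed):
    """Decompress only the column indexes in `needed`.

    Columns that were pruned are never decompressed and come back as None.
    """
    out = []
    for idx, blob in enumerate(columns):
        if idx in needed:
            out.append(zlib.decompress(blob).decode("utf-8").split("\n"))
        else:
            out.append(None)
    return out


# Hypothetical three-column table; the query only touches columns 0 and 2.
rows = [("1", "alice", "10"), ("2", "bob", "20"), ("3", "carol", "30")]
group = write_columnar(rows, 3)
pruned = read_columns(group, needed={0, 2})
```

With a row-oriented layout the reader would have to decompress whole records to extract
two fields; here the work is proportional to the columns the query actually uses.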

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

