hadoop-hive-dev mailing list archives

From "Yongqiang He (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HIVE-461) Optimize RCFile reading by using column pruning results
Date Sat, 27 Jun 2009 02:56:47 GMT

     [ https://issues.apache.org/jira/browse/HIVE-461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yongqiang He updated HIVE-461:
------------------------------

    Attachment: hive-461-2009-06-27.patch

Uploaded a new patch that integrates Zheng's suggestions.
However, List<Integer> neededColumnIDs has not been moved from the table scan operator to its desc:
tableScanDesc cannot be instantiated,
and even when I forcibly added it, some queries failed.

> Optimize RCFile reading by using column pruning results
> -------------------------------------------------------
>
>                 Key: HIVE-461
>                 URL: https://issues.apache.org/jira/browse/HIVE-461
>             Project: Hadoop Hive
>          Issue Type: Improvement
>          Components: Serializers/Deserializers
>    Affects Versions: 0.4.0
>            Reporter: Zheng Shao
>            Assignee: Yongqiang He
>         Attachments: hive-461-2009-05-26.patch, hive-461-2009-06-26.patch, hive-461-2009-06-27.patch
>
>
> RCFile is a column-based file format introduced in HIVE-352. Column-based storage has
> shown a better compression ratio. On our internal data set (30 columns, most of them short
> integer strings), gzip-compressed RCFile is 20%+ smaller than gzip-compressed
> SequenceFile.
> RCFile also has the potential to improve reading efficiency considerably, since it compresses
> each column separately.
> We should integrate RCFile with the column pruning results from Hive to make reading
> faster.
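
To illustrate why per-column compression lets column pruning speed up reads, here is a toy sketch in Python. It is not RCFile's on-disk layout and not Hive's API: the functions write_row_group and read_row_group are hypothetical, and needed_column_ids only mirrors the idea behind the neededColumnIDs list mentioned above. It assumes values contain no newline characters.

```python
import zlib


def write_row_group(rows):
    """Store equal-length rows column by column, compressing each column separately.

    Toy layout (an assumption for illustration): each column's values are
    joined with newlines and compressed with zlib.
    """
    num_columns = len(rows[0])
    columns = []
    for col in range(num_columns):
        payload = "\n".join(str(row[col]) for row in rows).encode("utf-8")
        columns.append(zlib.compress(payload))
    return columns


def read_row_group(columns, needed_column_ids):
    """Decompress only the needed columns; pruned columns are returned as None."""
    needed = set(needed_column_ids)
    decoded = []
    for col_id, blob in enumerate(columns):
        if col_id in needed:
            # Only columns the query actually uses pay the decompression cost.
            decoded.append(zlib.decompress(blob).decode("utf-8").split("\n"))
        else:
            # Pruned column: its compressed bytes are never touched.
            decoded.append(None)
    return decoded


rows = [(1, "a", "x"), (2, "b", "y")]
columns = write_row_group(rows)
result = read_row_group(columns, needed_column_ids=[0, 2])
```

Here a query that needs only columns 0 and 2 never decompresses column 1. With a row-oriented format that compresses whole records together, every column would have to be decompressed even if only a few were used.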

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

