hadoop-hive-dev mailing list archives

From "He Yongqiang (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HIVE-819) Add lazy decompress ability to RCFile
Date Sat, 12 Sep 2009 20:16:57 GMT

     [ https://issues.apache.org/jira/browse/HIVE-819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

He Yongqiang updated HIVE-819:
------------------------------

    Attachment: hive-819-2009-9-12.patch

A draft version that adds a callback for lazy decompression. It still needs more profiling.
In one experiment on a 115M compressed input file, uservisits, the query
"SELECT sourceip, desturl, visitdate, useragent, countrycode, duration FROM uservisits_rc
WHERE duration > 9;" dropped from 20+ seconds to 14 seconds.
However, after changing the filter condition from 9 to 8, the execution time increased by 4 seconds.
That is a bad result, and more profiling is needed to find the cause.

> Add lazy decompress ability to RCFile
> -------------------------------------
>
>                 Key: HIVE-819
>                 URL: https://issues.apache.org/jira/browse/HIVE-819
>             Project: Hadoop Hive
>          Issue Type: Improvement
>          Components: Query Processor, Serializers/Deserializers
>            Reporter: He Yongqiang
>             Fix For: 0.5.0
>
>         Attachments: hive-819-2009-9-12.patch
>
>
> This is especially useful for filter scans.
> For example, for the query 'select a, b, c from table_rc_lazydecompress where a>1;' we
> only need to decompress the block data of columns b and c when at least one row's column 'a'
> in that block satisfies the filter condition.
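
To make the idea in the description concrete, here is a minimal Python sketch of lazy column decompression during a filter scan. It is not the code in the attached patch and does not use the RCFile API: the names `LazyColumn`, `make_row_group` and `scan` are hypothetical, zlib stands in for whatever codec the file uses, and each "block" is modelled as a dict of independently compressed columns.

```python
import zlib


class LazyColumn:
    """One column's compressed bytes for a block; decompressed on first access only."""

    def __init__(self, compressed):
        self._compressed = compressed
        self._values = None
        self.decompress_count = 0  # illustration only: counts real decompressions

    def values(self):
        # The "callback": decompression happens here, the first time a value is needed.
        if self._values is None:
            raw = zlib.decompress(self._compressed)
            self._values = raw.decode("utf-8").split("\n")
            self.decompress_count += 1
        return self._values


def make_row_group(columns):
    """Compress each column separately (hypothetical stand-in for one columnar block)."""
    return {
        name: LazyColumn(zlib.compress("\n".join(str(v) for v in vals).encode("utf-8")))
        for name, vals in columns.items()
    }


def scan(row_groups, filter_col, predicate, projected):
    """Evaluate the filter column first; touch the projected columns only for matching rows."""
    out = []
    for group in row_groups:
        filter_vals = group[filter_col].values()  # always decompressed
        for i, v in enumerate(filter_vals):
            if predicate(int(v)):
                # Other columns are decompressed here, and only if some row matched.
                out.append(tuple(group[c].values()[i] for c in projected))
    return out
```

With `scan(groups, "a", lambda a: a > 1, ["a", "b", "c"])`, a block in which no row has a > 1 never decompresses b or c. A less selective filter makes more blocks qualify, so more blocks pay the decompression cost.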

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

