hive-dev mailing list archives

From "He Yongqiang (JIRA)" <>
Subject [jira] Updated: (HIVE-819) Add lazy decompress ability to RCFile
Date Sat, 12 Sep 2009 20:16:57 GMT


He Yongqiang updated HIVE-819:

    Attachment: hive-819-2009-9-12.patch

A draft version that adds a callback to do lazy decompression. It needs more profiling.
In one experiment on a 115M compressed input file, uservisits, the query
"SELECT sourceip, desturl, visitdate, useragent, countrycode, duration FROM uservisits_rc
where duration >9;" went from 20+ seconds down to 14 seconds.
However, after changing the filter condition from 9 to 8, the execution time increased by 4 seconds.
That is a poor result, and more profiling is needed to find the cause.
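
To illustrate the callback idea, here is a minimal Python sketch. It is not the code in the attached patch and does not use RCFile's real format or API: the LazyColumn class, the make_block and scan helpers, the newline-joined column encoding, and the use of zlib are all assumptions made for the example. Each column of a block is compressed separately, and a projected column is only decompressed the first time a row in that block passes the filter.

```python
import zlib


class LazyColumn:
    """Hypothetical holder for one column's compressed bytes within a block."""

    def __init__(self, compressed):
        self._compressed = compressed
        self._values = None
        self.decompress_count = 0  # instrumentation for the example only

    def values(self):
        # The "callback": decompression runs only on first access.
        if self._values is None:
            raw = zlib.decompress(self._compressed)
            self._values = raw.decode("utf-8").split("\n")
            self.decompress_count += 1
        return self._values


def make_block(columns):
    """Compress each column separately (assumed layout, not RCFile's)."""
    return {
        name: LazyColumn(zlib.compress("\n".join(map(str, vals)).encode("utf-8")))
        for name, vals in columns.items()
    }


def scan(blocks, filter_col, predicate, project_cols):
    """Yield projected values for rows whose filter column passes predicate."""
    for block in blocks:
        # The filter column is always decompressed.
        for i, v in enumerate(block[filter_col].values()):
            if predicate(int(v)):
                # Projected columns are decompressed only once a row matches.
                yield tuple(block[c].values()[i] for c in project_cols)


blocks = [
    make_block({"a": [0, 1, 1], "b": ["x", "y", "z"], "c": ["p", "q", "r"]}),
    make_block({"a": [1, 5, 0], "b": ["u", "v", "w"], "c": ["s", "t", "k"]}),
]
rows = list(scan(blocks, "a", lambda a: a > 1, ["b", "c"]))
```

In this sketch the first block has no row with a > 1, so its b and c columns are never decompressed. The saving exists only for blocks in which no row passes the filter; a block with even one matching row pays the full decompression cost for the projected columns.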

> Add lazy decompress ability to RCFile
> -------------------------------------
>                 Key: HIVE-819
>                 URL:
>             Project: Hadoop Hive
>          Issue Type: Improvement
>          Components: Query Processor, Serializers/Deserializers
>            Reporter: He Yongqiang
>             Fix For: 0.5.0
>         Attachments: hive-819-2009-9-12.patch
> This is especially useful for filter scans.
> For example, for the query 'select a, b, c from table_rc_lazydecompress where a>1;' we
> only need to decompress the block data of columns b and c when at least one row's column 'a'
> in that block satisfies the filter condition.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
