hadoop-hive-dev mailing list archives

From "Ning Zhang (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HIVE-819) Add lazy decompress ability to RCFile
Date Thu, 17 Sep 2009 18:21:57 GMT

    [ https://issues.apache.org/jira/browse/HIVE-819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12756674#action_12756674 ]

Ning Zhang commented on HIVE-819:
---------------------------------

A few general comments: 

 1) Can you briefly summarize how decompression is currently done and how your proposal
makes it lazy? More comments in the code would also be very helpful.

 2) Is the 4-second performance regression with the query predicate duration > 8 consistent
or intermittent? If it is consistent, are there any additional changes that cause this regression?
(I thought the worst case would be decompressing all columns, as you mentioned, which is equivalent
to the previous behavior.) If it is intermittent, what method of timing are you using? If you have
YourKit, can you also do CPU profiling?

> Add lazy decompress ability to RCFile
> -------------------------------------
>
>                 Key: HIVE-819
>                 URL: https://issues.apache.org/jira/browse/HIVE-819
>             Project: Hadoop Hive
>          Issue Type: Improvement
>          Components: Query Processor, Serializers/Deserializers
>            Reporter: He Yongqiang
>            Assignee: He Yongqiang
>             Fix For: 0.5.0
>
>         Attachments: hive-819-2009-9-12.patch
>
>
> This is especially useful for filter scans.
> For example, for the query 'select a, b, c from table_rc_lazydecompress where a>1;' we
> only need to decompress the block data of columns b and c when at least one row's column 'a'
> in that block satisfies the filter condition.
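
The idea in the issue description can be sketched roughly as follows. This is only an
illustration under stated assumptions, not the code in the attached patch and not RCFile's
actual API: LazyDecompressSketch, LazyColumn, isDecompressed and scanBlock are hypothetical
names, and java.util.zip stands in for whatever codec the file really uses. Each column of a
block stays compressed until its values are first read, so b and c are only inflated once
some row in the block passes the filter on a.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical sketch; none of these names come from Hive or the patch.
final class LazyDecompressSketch {

    /** One column's values for one block, kept compressed until first read. */
    static final class LazyColumn {
        private final byte[] compressed;
        private String[] values; // stays null until get() is first called

        LazyColumn(String... rows) {
            // Compress eagerly here only to simulate data as stored on disk.
            byte[] raw = String.join("\n", rows).getBytes(StandardCharsets.UTF_8);
            Deflater deflater = new Deflater();
            deflater.setInput(raw);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[256];
            while (!deflater.finished()) {
                out.write(buf, 0, deflater.deflate(buf));
            }
            deflater.end();
            compressed = out.toByteArray();
        }

        boolean isDecompressed() {
            return values != null;
        }

        /** Decompresses on first access and caches the result. */
        String[] get() {
            if (values == null) {
                Inflater inflater = new Inflater();
                inflater.setInput(compressed);
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] buf = new byte[256];
                try {
                    while (!inflater.finished()) {
                        out.write(buf, 0, inflater.inflate(buf));
                    }
                } catch (DataFormatException e) {
                    throw new IllegalStateException("corrupt column data", e);
                } finally {
                    inflater.end();
                }
                values = new String(out.toByteArray(), StandardCharsets.UTF_8).split("\n");
            }
            return values;
        }
    }

    /** Evaluates 'select a, b, c ... where a > 1' over one block. */
    static List<String> scanBlock(LazyColumn a, LazyColumn b, LazyColumn c) {
        List<String> result = new ArrayList<>();
        String[] aValues = a.get(); // the filter column is always needed
        for (int i = 0; i < aValues.length; i++) {
            if (Integer.parseInt(aValues[i]) > 1) {
                // b and c are inflated only when the first matching row is found.
                result.add(aValues[i] + "," + b.get()[i] + "," + c.get()[i]);
            }
        }
        return result;
    }
}
```

In this sketch, a block in which no row satisfies a > 1 never pays for decompressing b and
c; a block with at least one matching row decompresses everything, which corresponds to the
worst case discussed in comment 2) above.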

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
