hive-dev mailing list archives

From "Carl Steinbach (JIRA)" <j...@apache.org>
Subject [jira] [Reopened] (HIVE-2350) Improve RCFile Read Speed
Date Mon, 22 Aug 2011 20:02:29 GMT

     [ https://issues.apache.org/jira/browse/HIVE-2350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Carl Steinbach reopened HIVE-2350:
----------------------------------


@Tim: Yes, looks like closing this was a mistake on my part. Your latest patch looks good,
but you forgot to click the box that gives license rights to the ASF. Can you please attach
the patch again and this time click the box? Thanks.

> Improve RCFile Read Speed
> -------------------------
>
>                 Key: HIVE-2350
>                 URL: https://issues.apache.org/jira/browse/HIVE-2350
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Tim Armstrong
>            Assignee: Tim Armstrong
>            Priority: Minor
>             Fix For: 0.8.0
>
>         Attachments: rcfile-2011-08-04.diff, rcfile_opt_2011-08-05.diff, rcfile_opt_2011-08-05b.diff, rcfile_opt_2011-08-11.patch
>
>   Original Estimate: 0h
>  Remaining Estimate: 0h
>
> By tweaking the RCFile$Reader implementation to allow more efficient memory access, I
> was able to reduce CPU usage. I measured the time required to scan a gzipped RCFile, decompress
> it, and assemble it into records. CPU time was reduced by about 7% for a full table scan. An improvement
> of about 2% was realised when a smaller subset of columns (3-5 out of tens) was selected.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
