hive-dev mailing list archives

From "Ashutosh Chauhan" <hashut...@apache.org>
Subject Re: Review Request: Improve RCFile::sync(long) by 10x
Date Fri, 26 Apr 2013 15:13:49 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/10795/#review19770
-----------------------------------------------------------

Ship it!



- Ashutosh Chauhan


On April 26, 2013, 11:25 a.m., Gopal V wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/10795/
> -----------------------------------------------------------
> 
> (Updated April 26, 2013, 11:25 a.m.)
> 
> 
> Review request for hive, Ashutosh Chauhan and Gunther Hagleitner.
> 
> 
> Description
> -------
> 
> Speed up RCFile::sync() by reading large blocks of data from HDFS rather than using readByte() on the input stream.
> 
> This improves the loop behaviour and reduces the number of calls on the synchronized read() methods within HDFS, resulting in a 10x performance boost to this function.
> 
> In wall-clock terms, a call that takes up to a second drops below 100ms, because data is read in 512-byte chunks instead of 1 byte at a time.
> 
> 
> This addresses bug HIVE-4423.
>     https://issues.apache.org/jira/browse/HIVE-4423
> 
> 
> Diffs
> -----
> 
>   ql/src/java/org/apache/hadoop/hive/ql/io/RCFile.java d3d98d0 
> 
> Diff: https://reviews.apache.org/r/10795/diff/
> 
> 
> Testing
> -------
> 
> ant test -Dtestcase=TestRCFile -Dmodule=ql
> ant test -Dtestcase=TestCliDriver -Dqfile_regex=.*rcfile.* -Dmodule=ql
> 
> And benchmarking with count(1) on the store_sales rcfile table at scale=10
> 
> before: 43.8, after: 39.5 
> 
> 
> Thanks,
> 
> Gopal V
> 
>
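
For readers without the diff at hand, the chunked scan described in the quoted request can be sketched roughly as below. This is not the patch from HIVE-4423: the class name ChunkedSyncScan, the method findMarker, and the carry-over handling are hypothetical and chosen only for illustration. The only detail taken from the description is the 512-byte read size. The sketch reads a chunk per read() call and keeps the last (marker length - 1) bytes between reads so that a marker straddling a chunk boundary is still found.

```java
import java.io.IOException;
import java.io.InputStream;

// Illustrative sketch only; not the code in RCFile.java.
class ChunkedSyncScan {
    // Chunk size taken from the review description (512-byte reads).
    static final int CHUNK_SIZE = 512;

    /**
     * Returns the stream offset of the first occurrence of marker,
     * or -1 if the stream ends without a match.
     */
    static long findMarker(InputStream in, byte[] marker) throws IOException {
        if (marker.length == 0) {
            return 0;
        }
        int keep = marker.length - 1;          // bytes carried across reads
        byte[] buf = new byte[keep + CHUNK_SIZE];
        int filled = 0;                        // number of valid bytes in buf
        long base = 0;                         // stream offset of buf[0]

        while (true) {
            // One bulk read replaces up to CHUNK_SIZE single-byte reads.
            int n = in.read(buf, filled, buf.length - filled);
            if (n < 0) {
                return -1;                     // end of stream, no marker found
            }
            filled += n;

            // Check every position where a full marker fits in the buffer.
            for (int i = 0; i + marker.length <= filled; i++) {
                if (matchesAt(buf, i, marker)) {
                    return base + i;
                }
            }

            // Keep the unchecked tail so a marker split across two reads
            // is still seen once the next chunk arrives.
            int carry = Math.min(keep, filled);
            System.arraycopy(buf, filled - carry, buf, 0, carry);
            base += filled - carry;
            filled = carry;
        }
    }

    private static boolean matchesAt(byte[] buf, int pos, byte[] marker) {
        for (int j = 0; j < marker.length; j++) {
            if (buf[pos + j] != marker[j]) {
                return false;
            }
        }
        return true;
    }
}
```

Calling ChunkedSyncScan.findMarker(stream, marker) on any InputStream returns the offset of the marker relative to where the stream was positioned when the call started. The saving comes from issuing one read() per chunk instead of one per byte, which is the effect the description attributes to the change.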

