hive-dev mailing list archives

From "Brock Noland" <>
Subject Re: Review Request 11770: HIVE-4113: Optimize select count(1) with RCFile and Orc
Date Mon, 15 Jul 2013 19:47:43 GMT

This is an automatically generated e-mail. To reply, visit:

(Updated July 15, 2013, 7:47 p.m.)

Review request for hive.


Rebased patch, no real changes.

Bugs: HIVE-4113

Repository: hive-git


Modifies ColumnProjectionUtils so that there are two flags: one for the column ids and one indicating
whether all columns should be read. Additionally, the patch updates all locations that use
the old convention of an empty string indicating that all columns should be read.
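
The two-flag idea described above can be sketched roughly as follows. This is only an illustration, not the code in the patch: `java.util.Properties` stands in for the Hadoop configuration object, and the class name, method names, and key names are hypothetical. The point it shows is that once "read all columns" has its own flag, an empty id list can mean "read no columns", which is all that select count(1) needs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Illustrative sketch only. Properties stands in for a Hadoop Configuration,
// and every name and key below is hypothetical, not taken from the patch.
class ProjectionSketch {
    static final String READ_COLUMN_IDS = "sketch.read.column.ids";   // hypothetical key
    static final String READ_ALL_COLUMNS = "sketch.read.all.columns"; // hypothetical key

    // Mark that every column should be read; the id list is cleared.
    static void setReadAllColumns(Properties conf) {
        conf.setProperty(READ_ALL_COLUMNS, "true");
        conf.setProperty(READ_COLUMN_IDS, "");
    }

    // Record an explicit projection. An empty list now means "no columns",
    // because "all columns" is carried by the separate flag.
    static void setReadColumnIds(Properties conf, List<Integer> ids) {
        StringBuilder sb = new StringBuilder();
        for (int id : ids) {
            if (sb.length() > 0) {
                sb.append(',');
            }
            sb.append(id);
        }
        conf.setProperty(READ_ALL_COLUMNS, "false");
        conf.setProperty(READ_COLUMN_IDS, sb.toString());
    }

    // Assumption of this sketch: default to true so that an unconfigured
    // reader still returns every column.
    static boolean isReadAllColumns(Properties conf) {
        return Boolean.parseBoolean(conf.getProperty(READ_ALL_COLUMNS, "true"));
    }

    // Parse the comma-separated id list; an empty string yields an empty list.
    static List<Integer> getReadColumnIds(Properties conf) {
        List<Integer> ids = new ArrayList<>();
        String raw = conf.getProperty(READ_COLUMN_IDS, "");
        for (String part : raw.split(",")) {
            if (!part.isEmpty()) {
                ids.add(Integer.parseInt(part));
            }
        }
        return ids;
    }
}
```

Under the old single-value convention, a reader could not tell "no projection was set" apart from "project zero columns"; with the separate flag, a columnar reader can skip every column's data when the id list is empty and the flag is false.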

The automatic formatter generated by ant eclipse-files is fairly aggressive, so there is some
unrelated import/whitespace cleanup.

Diffs (updated)

  hbase-handler/src/java/org/apache/hadoop/hive/hbase/ da85501
  hcatalog/core/src/main/java/org/apache/hcatalog/mapreduce/ bc0e04c
  hcatalog/core/src/main/java/org/apache/hcatalog/mapreduce/ ac3753f
  hcatalog/core/src/main/java/org/apache/hcatalog/mapreduce/ 02ec37f
  hcatalog/core/src/main/java/org/apache/hcatalog/mapreduce/ 4167afa
  hcatalog/core/src/test/java/org/apache/hcatalog/mapreduce/ dd2ac10
  ql/src/java/org/apache/hadoop/hive/ql/exec/ 1a784b2
  ql/src/java/org/apache/hadoop/hive/ql/exec/mr/ f72ecfb
  ql/src/java/org/apache/hadoop/hive/ql/io/ 49145b7
  ql/src/java/org/apache/hadoop/hive/ql/io/ adf4923
  ql/src/java/org/apache/hadoop/hive/ql/io/ d18d403
  ql/src/java/org/apache/hadoop/hive/ql/io/ 9521060
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/ 96ac584
  ql/src/test/org/apache/hadoop/hive/ql/ 400abf3
  ql/src/test/org/apache/hadoop/hive/ql/io/ fb9fca1
  ql/src/test/org/apache/hadoop/hive/ql/io/ ae6a5ee
  ql/src/test/org/apache/hadoop/hive/ql/io/orc/ 785f0b1
  serde/src/java/org/apache/hadoop/hive/serde2/ 23180cf
  serde/src/java/org/apache/hadoop/hive/serde2/columnar/ 11f5f07
  serde/src/java/org/apache/hadoop/hive/serde2/columnar/ 1335446
  serde/src/java/org/apache/hadoop/hive/serde2/columnar/ e1270cc
  serde/src/java/org/apache/hadoop/hive/serde2/columnar/ b717278
  serde/src/java/org/apache/hadoop/hive/serde2/columnar/ 0317024
  serde/src/test/org/apache/hadoop/hive/serde2/ 3ba2699
  serde/src/test/org/apache/hadoop/hive/serde2/columnar/ 99420ca



All unit tests pass with the patch. ColumnProjectionUtils has new unit tests covering its
functionality. Additionally, I verified manually that select count(1) from RCFile/Orc tables results
in less I/O after the change.


Before the patch:

hive> select count(1) from users_orc;
Job 0: Map: 1  Reduce: 1   Cumulative CPU: 17.75 sec   HDFS Read: 28782851 HDFS Write: 9 SUCCESS

hive> select count(1) from users_rc;
Job 0: Map: 3  Reduce: 1   Cumulative CPU: 23.72 sec   HDFS Read: 825865962 HDFS Write: 9


After the patch:

hive> select count(1) from users_orc;
Job 0: Map: 1  Reduce: 1   Cumulative CPU: 9.9 sec   HDFS Read: 67325 HDFS Write: 9 SUCCESS

hive> select count(1) from users_rc;
Job 0: Map: 3  Reduce: 1   Cumulative CPU: 16.96 sec   HDFS Read: 96045618 HDFS Write: 9 SUCCESS
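
For a sense of scale, the HDFS Read counters quoted above can be turned into reduction factors. The snippet below is just arithmetic on those four numbers, assuming the first pair of runs was taken before the patch and the second pair after it; the class and method names are made up for the illustration. It works out to roughly 427x fewer bytes read for the ORC table and roughly 8.6x fewer for the RCFile table.

```java
// Arithmetic on the HDFS Read counters quoted in this message; nothing is measured anew.
// Assumes the first pair of runs is "before the patch" and the second pair is "after".
class IoReduction {
    // Factor by which the bytes read shrank (before / after).
    static double factor(long bytesBefore, long bytesAfter) {
        return (double) bytesBefore / bytesAfter;
    }

    public static void main(String[] args) {
        System.out.printf("ORC:    %.1fx fewer bytes read%n", factor(28782851L, 67325L));
        System.out.printf("RCFile: %.1fx fewer bytes read%n", factor(825865962L, 96045618L));
    }
}
```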


Brock Noland
