hive-dev mailing list archives

From Illya Yalovyy <yalov...@amazon.com>
Subject Review Request 50816: HIVE-7239 Fix bug in HiveIndexedInputFormat implementation that causes incorrect query result when input backed by Sequence/RC files
Date Thu, 04 Aug 2016 19:40:58 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50816/
-----------------------------------------------------------

Review request for hive, Ashutosh Chauhan and Gopal V.


Repository: hive-git


Description
-------

HIVE-7239 Fix bug in HiveIndexedInputFormat implementation that causes incorrect query result
when input backed by Sequence/RC files

For sequence files, it is crucial that splits are calculated around the boundaries enforced
by the input sequence file. By default, however, Hadoop creates input splits based on
configuration parameters, which may not match the boundaries of the input sequence file. Hive
provides HiveIndexedInputFormat, which adds extra logic to recalculate the boundaries of each
split according to the sequence file's boundaries.

However, we observed "over"-reporting of results for data backed by sequence files. We
reproduced the problem on sample data, fixed the bug, and verified the fix by comparing the
query output for input in sequence file format, RC file format and regular format.
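
To illustrate why split boundaries matter here, below is a minimal, self-contained sketch.
It is not the code from this patch and does not use Hive or Hadoop classes; the class name,
method names, and offsets are hypothetical. It assumes a simplified model in which a reader
processes every sync-delimited block whose sync point falls inside its split's byte range.
Under that assumption, disjoint splits read each block once, while overlapping splits read
some blocks more than once, which would show up as "over"-reported rows.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of Hive or of this patch.
public class SplitBoundarySketch {

    /** Sync-point offsets that fall inside the byte range [start, start + length). */
    public static List<Long> ownedSyncPoints(long[] syncOffsets, long start, long length) {
        List<Long> owned = new ArrayList<>();
        long end = start + length;
        for (long offset : syncOffsets) {
            if (offset >= start && offset < end) {
                owned.add(offset);
            }
        }
        return owned;
    }

    /** Total number of block reads across all splits; each split is {start, length}. */
    public static int totalReads(long[] syncOffsets, long[][] splits) {
        int reads = 0;
        for (long[] split : splits) {
            reads += ownedSyncPoints(syncOffsets, split[0], split[1]).size();
        }
        return reads;
    }

    public static void main(String[] args) {
        // Assumed sync-point offsets of a four-block file.
        long[] syncOffsets = {0, 2000, 4000, 6000};

        // Splits that do not overlap: every block is read exactly once.
        long[][] disjoint = {{0, 3000}, {3000, 5000}};
        // The second split overlaps the first: the block at offset 2000 is read twice.
        long[][] overlapping = {{0, 3000}, {1000, 7000}};

        System.out.println(totalReads(syncOffsets, disjoint));    // prints 4
        System.out.println(totalReads(syncOffsets, overlapping)); // prints 5
    }
}
```

With four blocks in the file, a total of 5 reads means one block was counted twice. Any
recalculation of split boundaries therefore has to keep the resulting ranges from covering
the same sync point more than once.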

https://issues.apache.org/jira/browse/HIVE-7239


Diffs
-----

  ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndexResult.java 33cc5c3 
  ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndexedInputFormat.java 5247ece 
  ql/src/java/org/apache/hadoop/hive/ql/index/IndexResult.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/index/SplitFilter.java PRE-CREATION 
  ql/src/test/org/apache/hadoop/hive/ql/index/MockHiveInputSplits.java PRE-CREATION 
  ql/src/test/org/apache/hadoop/hive/ql/index/MockIndexResult.java PRE-CREATION 
  ql/src/test/org/apache/hadoop/hive/ql/index/MockInputFile.java PRE-CREATION 
  ql/src/test/org/apache/hadoop/hive/ql/index/SplitFilterTestCase.java PRE-CREATION 
  ql/src/test/org/apache/hadoop/hive/ql/index/TestHiveInputSplitComparator.java PRE-CREATION
  ql/src/test/org/apache/hadoop/hive/ql/index/TestSplitFilter.java PRE-CREATION

Diff: https://reviews.apache.org/r/50816/diff/


Testing
-------

Manually tested on a cluster.

HiveQA:

Test results: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/674/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/674/console
Test logs: http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-674/


Thanks,

Illya Yalovyy

