Mailing-List: contact dev-help@hive.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@hive.apache.org
Content-Type: multipart/alternative;
	boundary="===============1954507063854806603=="
MIME-Version: 1.0
Subject: Re: Review Request: Use sorted nature of compact indexes
From: "Kevin Wilfong" <kevinwilfong@fb.com>
To: "Ning Zhang" <nzhang@fb.com>, "Yongqiang He" <heyongqiangict@gmail.com>,
	"namit jain" <njain@fb.com>
Date: Tue, 01 Nov 2011 23:56:58 -0000
Message-ID: <20111101235658.956.61909@reviews.apache.org>
Cc: "Kevin Wilfong" <kevinwilfong@fb.com>,"hive" <dev@hive.apache.org>
In-Reply-To: <20111029013950.31049.53946@reviews.apache.org>
References: <20111029013950.31049.53946@reviews.apache.org>

--===============1954507063854806603==
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2605/
-----------------------------------------------------------

(Updated 2011-11-01 23:56:58.062678)


Review request for hive, Yongqiang He, Ning Zhang, and namit jain.


Changes
-------

Thanks for the feedback Namit.  This diff should address all of your commen=
ts that I haven't already addressed with my own comments.

The most notable change in this diff is that the BinarySearchRecordReader c=
an now be used for both HiveInputFormat and CombineHiveInputFormat.  Their =
respective record readers now inherit from BinarySearchRecordReader, and th=
e CompactIndexHandler sets a variable representing whether or not the Binar=
ySearchRecordReader should attempt to run its optimized search algorithm on=
 the data.  If that variable is not set, the record readers behave exactly =
as they did before.

I updated the tests I added and verified they passed.  I also ran the tests=
 most likely to be affected, notably any tests related to indexes, and veri=
fied they still passed.


Summary
-------

The CompactIndexHandler determines if the reentrant query it creates is a c=
andidate for using the fact the index is sorted (it has an appropriate numb=
er of non-partition conditions, and the query plan is of the form expected)=
.  It sets the input format to HiveSortedInputFormat, and marks the FilterO=
perator for the non-partition condition.

The HiveSortedInputFormat is extends HiveInputFormat, so its splits consist=
 of data from a single file, and its record reader is HiveBinarySearchRecor=
dReader.  HiveBinarySearchRecordReader starts by assuming it is performing =
a binary search.  It sets the appropriate flags in IOContext, which acts as=
 the means of communication between the FilterOperators and the record read=
er.  The non-partition FilterOperator is responsible for executing a compar=
ison between the value in the row and column of interest and the constant. =
 It also provides the type of the generic UDF.  It sets this data in the IO=
Context.  As long as the binary search continues the FilterOperators do not=
 forward rows to the operators below them.  The record reader uses the comp=
arison and the type of the generic UDF to execute a binary search on the un=
derlying RCFile until it finds the block of interest, or determines that if=
 any block is of interest it is the last one.  The search then proceeds lin=
early from the beginning of the identified block.  If ever in the binary se=
arch a problem occurs, like the comparison fails for some reason, a linear =
search begins from the beginning of the data which has yet to be eliminated.

Regardless of whether or not a binary search is performed, the record reade=
r attempts to end the linear search as soon as it can based on the comparis=
on and the type of the generic UDF.


This addresses bug HIVE-2535.
    https://issues.apache.org/jira/browse/HIVE-2535


Diffs (updated)
-----

  trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 1183507 =

  trunk/conf/hive-default.xml 1183507 =

  trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/ExecDriver.java 1183507 =

  trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/ExprNodeGenericFuncEvalu=
ator.java 1183507 =

  trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/FilterOperator.java 1183=
507 =

  trunk/ql/src/java/org/apache/hadoop/hive/ql/index/compact/CompactIndexHan=
dler.java 1183507 =

  trunk/ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveRecordReader.ja=
va 1183507 =

  trunk/ql/src/java/org/apache/hadoop/hive/ql/io/HiveBinarySearchRecordRead=
er.java PRE-CREATION =

  trunk/ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java 11835=
07 =

  trunk/ql/src/java/org/apache/hadoop/hive/ql/io/HiveRecordReader.java 1183=
507 =

  trunk/ql/src/java/org/apache/hadoop/hive/ql/io/IOContext.java 1183507 =

  trunk/ql/src/java/org/apache/hadoop/hive/ql/io/RCFile.java 1183507 =

  trunk/ql/src/java/org/apache/hadoop/hive/ql/io/RCFileRecordReader.java 11=
83507 =

  trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/FilterDesc.java 1183507 =

  trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/MapredWork.java 1183507 =

  trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFBaseCom=
pare.java 1183507 =

  trunk/ql/src/test/org/apache/hadoop/hive/ql/hooks/VerifyHiveSortedInputFo=
rmatUsedHook.java PRE-CREATION =

  trunk/ql/src/test/org/apache/hadoop/hive/ql/io/TestHiveBinarySearchRecord=
Reader.java PRE-CREATION =

  trunk/ql/src/test/queries/clientpositive/index_compact_binary_search.q PR=
E-CREATION =

  trunk/ql/src/test/results/clientpositive/index_compact_binary_search.q.ou=
t PRE-CREATION =


Diff: https://reviews.apache.org/r/2605/diff


Testing
-------

I added a test to verify the functionality of the HiveBinarySearchRecordRea=
der.

I also added a .q file to test that this returns the correct results when t=
he underlying index is stored in an RCFile and when it is stored in as a te=
xt file, with all of the supported operators.

I ran the .q files to verify they still pass.

I ran some queries to verify there was a CPU benefit to doing this.  I saw =
as much as a 45% reduction in the total CPU used by the map reduce job to s=
can the index, for a large data set. =


Thanks,

Kevin


--===============1954507063854806603==--