Return-Path: Delivered-To: apmail-hive-dev-archive@www.apache.org Received: (qmail 61901 invoked from network); 7 Apr 2011 21:40:34 -0000 Received: from hermes.apache.org (HELO mail.apache.org) (140.211.11.3) by minotaur.apache.org with SMTP; 7 Apr 2011 21:40:34 -0000 Received: (qmail 36295 invoked by uid 500); 7 Apr 2011 21:40:34 -0000 Delivered-To: apmail-hive-dev-archive@hive.apache.org Received: (qmail 36272 invoked by uid 500); 7 Apr 2011 21:40:34 -0000 Mailing-List: contact dev-help@hive.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@hive.apache.org Delivered-To: mailing list dev@hive.apache.org Received: (qmail 36262 invoked by uid 99); 7 Apr 2011 21:40:34 -0000 Received: from reviews.apache.org (HELO reviews.apache.org) (140.211.11.40) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 07 Apr 2011 21:40:34 +0000 Received: from reviews.apache.org (localhost [127.0.0.1]) by reviews.apache.org (Postfix) with ESMTP id 91ECF1C0023; Thu, 7 Apr 2011 21:40:39 +0000 (UTC) Content-Type: multipart/alternative; boundary="===============5668110117774034355==" MIME-Version: 1.0 Subject: Re: Review Request: HIVE-1644 Use filter pushdown for automatically accessing indexes From: "John Sichi" To: "Russell Melick" , "John Sichi" , "hive" Date: Thu, 07 Apr 2011 21:40:39 -0000 Message-ID: <20110407214039.18496.74647@reviews.apache.org> X-ReviewBoard-URL: https://reviews.apache.org X-ReviewRequest-URL: https://reviews.apache.org/r/558/ In-Reply-To: <20110407052722.28324.40886@reviews.apache.org> References: <20110407052722.28324.40886@reviews.apache.org> --===============5668110117774034355== Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/558/#review399 ----------------------------------------------------------- common/src/java/org/apache/hadoop/hive/conf/HiveConf.java For consistency with my review in HIVE-1694, I suggest hive.optimize.in= dex.filter as the name for this configuration parameter. = (In HIVE-1694 I suggested hive.optimize.index.groupby, and we want it t= o be possible to enable/disable them independently) = common/src/java/org/apache/hadoop/hive/conf/HiveConf.java In line with the previous comment, suggest hive.optimize.index.filter.c= ompact.minSize/maxSize. = Namit's suggestion for minSize was 5G. = I think the default for maxSize should be infinity (I can't think of a = case where we want it in effect by default). ql/src/java/org/apache/hadoop/hive/ql/exec/ExecDriver.java HIVE-1803 is changing this to hive.index.blockfilter.file. Assuming th= at gets committed first, we should use that, since it's generic rather than= tied to the index type. ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndexHandler.java What are the units here? Also, don't use colon after parameter name. ql/src/java/org/apache/hadoop/hive/ql/index/compact/CompactIndexHandler.java The non-functional changes in this file are gonna conflict with HIVE-18= 03, so get rid of them. ql/src/java/org/apache/hadoop/hive/ql/index/compact/CompactIndexHandler.java Use HiveUtils.unparseIdentifier for quoting table names in generated SQ= L. ql/src/java/org/apache/hadoop/hive/ql/index/compact/CompactIndexHandler.java Isn't it incorrect to set properties on the original table scan here si= nce this is only tentative? ql/src/java/org/apache/hadoop/hive/ql/index/compact/CompactIndexHandler.java Likewise, modifying inputs is incorrect before we have a definite plan. = Some more work on the new HiveIndexHandler interface method is required= for resolving this plus the residuals. ql/src/java/org/apache/hadoop/hive/ql/index/compact/CompactIndexHandler.java If searchConditions.size() =3D=3D 0, it means we didn't find anything w= hich could be handled by the index. In that case, we should bail out immed= iately and not try to do anything more with this index. = ql/src/java/org/apache/hadoop/hive/ql/index/compact/CompactIndexHandler.java We collect the residual here, but we don't do anything with it. Don't = we need to pass it back so that Hive can decide what to leave in the Filter= operator? = ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java The list actually contains index objects, not index table names. Also = typo: "is exists" ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRTableScan1.java Only cast once. ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/index/IndexWherePr= ocessor.java Indentation is wrong here. ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/index/IndexWherePr= ocessor.java In my review for HIVE-1694, I noted that we should not be swallowing ex= ceptions. I think some of this code was copied from there. If we can't ac= cess the metastore during optimization, it should be treated as a fatal err= or. ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/index/IndexWherePr= ocessor.java The plan still looks wrong (there are two Stage-0's, one for the index = scan, one for the final fetch), so the relabeling is still not quite workin= g correctly. = ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/index/IndexWherePr= ocessor.java no space after ! = ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/index/IndexWherePr= ocessor.java Suggested rename for method: arePartitionsCoveredByIndex ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/index/IndexWherePr= ocessor.java This checks that the metadata matches. But it does not actually check = that the index partitions exist. ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/index/IndexWhereTa= skDispatcher.java Is that true? Why couldn't index be used to optimize a map-only task? ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/index/IndexWhereTa= skDispatcher.java As noted above, we DO want to fail here. ql/src/test/queries/clientpositive/index_opt_where.q Could you add more tests for cases where automatic indexing should deci= de not to kick in: = * index can't be used because of min/max size config * index can't be used because predicate isn't supported * index can't be used columns aren't covered = Also: = * case where multiple indexes apply and we pick one (currently arbitrar= ily, but make sure it's at least deterministic so that test doesn't become = flaky) = ql/src/test/queries/clientpositive/index_opt_where_partitioned.q Could you add a test which shows the index NOT being used in the case w= here required partitions haven't been built yet? = ql/src/test/results/clientpositive/index_opt_where_partitioned.q.out Here's the duplicate Stage-0 I referred to in the code. = - John On 2011-04-07 05:27:22, Russell Melick wrote: > = > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/558/ > ----------------------------------------------------------- > = > (Updated 2011-04-07 05:27:22) > = > = > Review request for hive. > = > = > Summary > ------- > = > Review request for HIVE-1644.12.patch > = > = > This addresses bug HIVE-1644. > https://issues.apache.org/jira/browse/HIVE-1644 > = > = > Diffs > ----- > = > ql/src/test/results/clientpositive/index_opt_where_simple.q.out PRE-CRE= ATION = > ql/src/test/results/clientpositive/index_opt_where_partitioned.q.out PR= E-CREATION = > ql/src/test/results/clientpositive/index_opt_where.q.out PRE-CREATION = > ql/src/java/org/apache/hadoop/hive/ql/plan/MapredWork.java 73391e9 = > ql/src/test/queries/clientpositive/index_opt_where.q PRE-CREATION = > ql/src/test/queries/clientpositive/index_opt_where_partitioned.q PRE-CR= EATION = > ql/src/test/queries/clientpositive/index_opt_where_simple.q PRE-CREATIO= N = > ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java f0aca= 84 = > ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/index/IndexWhe= reTaskDispatcher.java PRE-CREATION = > ql/src/java/org/apache/hadoop/hive/ql/parse/ParseContext.java 937a7b3 = > ql/src/java/org/apache/hadoop/hive/ql/exec/ExecDriver.java 6437385 = > ql/src/java/org/apache/hadoop/hive/ql/exec/TableScanOperator.java c02d9= 0b = > ql/src/java/org/apache/hadoop/hive/ql/index/AbstractIndexHandler.java d= d0186d = > ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndexHandler.java 411b7= 8f = > ql/src/java/org/apache/hadoop/hive/ql/index/compact/CompactIndexHandler= .java 1f01446 = > ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java 50db44c = > ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRTableScan1.java 61= 62676 = > ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/IndexWhereReso= lver.java PRE-CREATION = > ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/PhysicalOptimi= zer.java 0ae9fa2 = > ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/index/IndexWhe= reProcCtx.java PRE-CREATION = > ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/index/IndexWhe= reProcessor.java PRE-CREATION = > common/src/java/org/apache/hadoop/hive/conf/HiveConf.java a21f589 = > conf/hive-default.xml c42197f = > = > Diff: https://reviews.apache.org/r/558/diff > = > = > Testing > ------- > = > = > Thanks, > = > Russell > = > --===============5668110117774034355==--