hive-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "John Sichi" <jsi...@fb.com>
Subject Re: Review Request: HIVE-1644 Use filter pushdown for automatically accessing indexes
Date Thu, 07 Apr 2011 21:40:39 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/558/#review399
-----------------------------------------------------------



common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
<https://reviews.apache.org/r/558/#comment748>

    For consistency with my review in HIVE-1694, I suggest hive.optimize.index.filter as the
name for this configuration parameter.
    
    (In HIVE-1694 I suggested hive.optimize.index.groupby, and we want it to be possible to
enable/disable them independently)
    



common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
<https://reviews.apache.org/r/558/#comment749>

    In line with the previous comment, suggest hive.optimize.index.filter.compact.minSize/maxSize.
    
    Namit's suggestion for minSize was 5G.
    
    I think the default for maxSize should be infinity (I can't think of a case where we want
it in effect by default).



ql/src/java/org/apache/hadoop/hive/ql/exec/ExecDriver.java
<https://reviews.apache.org/r/558/#comment750>

    HIVE-1803 is changing this to hive.index.blockfilter.file.  Assuming that gets committed
first, we should use that, since it's generic rather than tied to the index type.



ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndexHandler.java
<https://reviews.apache.org/r/558/#comment751>

    What are the units here?  Also, don't use colon after parameter name.



ql/src/java/org/apache/hadoop/hive/ql/index/compact/CompactIndexHandler.java
<https://reviews.apache.org/r/558/#comment752>

    The non-functional changes in this file are gonna conflict with HIVE-1803, so get rid
of them.



ql/src/java/org/apache/hadoop/hive/ql/index/compact/CompactIndexHandler.java
<https://reviews.apache.org/r/558/#comment755>

    Use HiveUtils.unparseIdentifier for quoting table names in generated SQL.



ql/src/java/org/apache/hadoop/hive/ql/index/compact/CompactIndexHandler.java
<https://reviews.apache.org/r/558/#comment756>

    Isn't it incorrect to set properties on the original table scan here since this is only
tentative?



ql/src/java/org/apache/hadoop/hive/ql/index/compact/CompactIndexHandler.java
<https://reviews.apache.org/r/558/#comment757>

    Likewise, modifying inputs is incorrect before we have a definite plan.
    
    Some more work on the new HiveIndexHandler interface method is required for resolving
this plus the residuals.



ql/src/java/org/apache/hadoop/hive/ql/index/compact/CompactIndexHandler.java
<https://reviews.apache.org/r/558/#comment753>

    If searchConditions.size() == 0, it means we didn't find anything which could be handled
by the index.  In that case, we should bail out immediately and not try to do anything more
with this index.
    



ql/src/java/org/apache/hadoop/hive/ql/index/compact/CompactIndexHandler.java
<https://reviews.apache.org/r/558/#comment759>

    We collect the residual here, but we don't do anything with it.  Don't we need to pass
it back so that Hive can decide what to leave in the Filter operator?
    



ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java
<https://reviews.apache.org/r/558/#comment760>

    The list actually contains index objects, not index table names.  Also typo: "is exists"



ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRTableScan1.java
<https://reviews.apache.org/r/558/#comment761>

    Only cast once.



ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/index/IndexWhereProcessor.java
<https://reviews.apache.org/r/558/#comment764>

    Indentation is wrong here.



ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/index/IndexWhereProcessor.java
<https://reviews.apache.org/r/558/#comment763>

    In my review for HIVE-1694, I noted that we should not be swallowing exceptions.  I think
some of this code was copied from there.  If we can't access the metastore during optimization,
it should be treated as a fatal error.



ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/index/IndexWhereProcessor.java
<https://reviews.apache.org/r/558/#comment765>

    The plan still looks wrong (there are two Stage-0's, one for the index scan, one for the
final fetch), so the relabeling is still not quite working correctly.
    



ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/index/IndexWhereProcessor.java
<https://reviews.apache.org/r/558/#comment766>

    no space after !
    



ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/index/IndexWhereProcessor.java
<https://reviews.apache.org/r/558/#comment767>

    Suggested rename for method:  arePartitionsCoveredByIndex



ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/index/IndexWhereProcessor.java
<https://reviews.apache.org/r/558/#comment768>

    This checks that the metadata matches.  But it does not actually check that the index
partitions exist.



ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/index/IndexWhereTaskDispatcher.java
<https://reviews.apache.org/r/558/#comment769>

    Is that true?  Why couldn't index be used to optimize a map-only task?



ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/index/IndexWhereTaskDispatcher.java
<https://reviews.apache.org/r/558/#comment770>

    As noted above, we DO want to fail here.



ql/src/test/queries/clientpositive/index_opt_where.q
<https://reviews.apache.org/r/558/#comment772>

    Could you add more tests for cases where automatic indexing should decide not to kick
in:
    
    * index can't be used because of min/max size config
    * index can't be used because predicate isn't supported
    * index can't be used columns aren't covered
    
    Also:
    
    * case where multiple indexes apply and we pick one (currently arbitrarily, but make sure
it's at least deterministic so that test doesn't become flaky)
    



ql/src/test/queries/clientpositive/index_opt_where_partitioned.q
<https://reviews.apache.org/r/558/#comment771>

    Could you add a test which shows the index NOT being used in the case where required partitions
haven't been built yet?
    



ql/src/test/results/clientpositive/index_opt_where_partitioned.q.out
<https://reviews.apache.org/r/558/#comment773>

    Here's the duplicate Stage-0 I referred to in the code.
    


- John


On 2011-04-07 05:27:22, Russell Melick wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/558/
> -----------------------------------------------------------
> 
> (Updated 2011-04-07 05:27:22)
> 
> 
> Review request for hive.
> 
> 
> Summary
> -------
> 
> Review request for HIVE-1644.12.patch
> 
> 
> This addresses bug HIVE-1644.
>     https://issues.apache.org/jira/browse/HIVE-1644
> 
> 
> Diffs
> -----
> 
>   ql/src/test/results/clientpositive/index_opt_where_simple.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/index_opt_where_partitioned.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/index_opt_where.q.out PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/MapredWork.java 73391e9 
>   ql/src/test/queries/clientpositive/index_opt_where.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/index_opt_where_partitioned.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/index_opt_where_simple.q PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java f0aca84 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/index/IndexWhereTaskDispatcher.java
PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/ParseContext.java 937a7b3 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/ExecDriver.java 6437385 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/TableScanOperator.java c02d90b 
>   ql/src/java/org/apache/hadoop/hive/ql/index/AbstractIndexHandler.java dd0186d 
>   ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndexHandler.java 411b78f 
>   ql/src/java/org/apache/hadoop/hive/ql/index/compact/CompactIndexHandler.java 1f01446

>   ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java 50db44c 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRTableScan1.java 6162676 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/IndexWhereResolver.java PRE-CREATION

>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/PhysicalOptimizer.java 0ae9fa2

>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/index/IndexWhereProcCtx.java
PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/index/IndexWhereProcessor.java
PRE-CREATION 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java a21f589 
>   conf/hive-default.xml c42197f 
> 
> Diff: https://reviews.apache.org/r/558/diff
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Russell
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message