hive-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Prajakta Kalmegh" <pkalm...@gmail.com>
Subject Re: Review Request: HIVE-1694: Accelerate GROUP BY execution using indexes
Date Mon, 01 Aug 2011 16:42:36 GMT


> On 2011-07-28 21:40:30, John Sichi wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/index/AggregateIndexHandler.java, line 61
> > <https://reviews.apache.org/r/1194/diff/1/?file=27052#file27052line61>
> >
> >     Please run ant checkstyle and fix all the formatting discrepancies it reports
for your new files.
> >

Done! The code is still having checkstyle formatting errors only for places where we have
used LinkedHashMap, HashMap and ArrayList. The error states "Declaring variables, return values
or parameters of type 'HashMap' is not allowed".


> On 2011-07-28 21:40:30, John Sichi wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/index/AggregateIndexHandler.java, line 184
> > <https://reviews.apache.org/r/1194/diff/1/?file=27052#file27052line184>
> >
> >     Don't you need to reuse the compact implementation here so that the index can
be used for WHERE (not just GROUP BY)?
> >

The AggregateIndexHandler now extends from CompactIndexHandler instead of TableBasedIndexHandler.
We override only analyzeIndexDefinition(...) and getIndexBuilderMapRedTask(...) methods from
CompactIndexHandler.


> On 2011-07-28 21:40:30, John Sichi wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java, line 700
> > <https://reviews.apache.org/r/1194/diff/1/?file=27054#file27054line700>
> >
> >     This method is redundant now.

Removed. Sorry to have missed that.


> On 2011-07-28 21:40:30, John Sichi wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyCtx.java, line
252
> > <https://reviews.apache.org/r/1194/diff/1/?file=27056#file27056line252>
> >
> >     I can't think of a case where it would be worse.

Ok.


> On 2011-07-28 21:40:30, John Sichi wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyProcFactory.java,
line 164
> > <https://reviews.apache.org/r/1194/diff/1/?file=27057#file27057line164>
> >
> >     Actually group-by is now preserved in all cases.

Forgot to change a few comments after fixing the bug. Done!


> On 2011-07-28 21:40:30, John Sichi wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteGBUsingIndex.java,
line 66
> > <https://reviews.apache.org/r/1194/diff/1/?file=27058#file27058line66>
> >
> >     Please use HTML bullet syntax for Javadoc (otherwise it all gets run together
into one line when rendered).
> >

Done!


> On 2011-07-28 21:40:30, John Sichi wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteQueryUsingAggregateIndex.java,
line 93
> > <https://reviews.apache.org/r/1194/diff/1/?file=27060#file27060line93>
> >
> >     Shouldn't this be BIGINT?
> >     
> >     Also, I think you're supposed to use a TypeInfoFactory for this purpose.

Yes. Changed it to bigint. Also changed it in AggregateIndexHandler where I had declared the
type to be "int".


> On 2011-07-28 21:40:30, John Sichi wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java, line 603
> > <https://reviews.apache.org/r/1194/diff/1/?file=27062#file27062line603>
> >
> >     Not sure why this new constructor is needed...after using it, all you do is
get the table out of it.

The only other constructor option for tableSpec needs the ASTNode as one of its parameters.
Since we need to construct a new tableSpec using only the index table name, and we do not
have a ASTNode for this, I need this constructor. If you have any other way in mind, please
let me know. That would be helpful.


> On 2011-07-28 21:40:30, John Sichi wrote:
> > ql/src/test/queries/clientpositive/ql_rewrite_gbtoidx.q, line 27
> > <https://reviews.apache.org/r/1194/diff/1/?file=27063#file27063line27>
> >
> >     This should *not* be using the index, since the index is built on count(l_shipdate),
and l_shipdate may contain nulls, whereas the query is referencing count(1), which is insensitive
to nulls.

Yes true. :(
I have now changed the count(1) queries with count(l_shipdate) in ql_rewrite_gbtoidx.q file.
Also, verified that the count(1) queries are not using the index.


> On 2011-07-28 21:40:30, John Sichi wrote:
> > ql/src/test/queries/clientpositive/ql_rewrite_gbtoidx.q, line 61
> > <https://reviews.apache.org/r/1194/diff/1/?file=27063#file27063line61>
> >
> >     Need additional tests to verify all the cases where the optimization should
*not* be used:
> >     
> >     * when configuration disables it
> >     * when index partitions do not cover table partitions (I still don't see the
code for this case)
> >     * ... all the other conditions checked for in the code ...
> >

Added new queries to verify that optimization is not used in case of:
* when configuration disables it
* ... all the other conditions checked for in the code ...

About "* when index partitions do not cover table partitions ", still pending (also the code
for it). I will upload the new patch once this is done.


- Prajakta


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1194/#review1212
-----------------------------------------------------------


On 2011-07-26 14:44:01, Prajakta Kalmegh wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/1194/
> -----------------------------------------------------------
> 
> (Updated 2011-07-26 14:44:01)
> 
> 
> Review request for hive and John Sichi.
> 
> 
> Summary
> -------
> 
> This patch has defined a new AggregateIndexHandler which is used to optimize the query
plan for groupby queries. 
> 
> 
> This addresses bug HIVE-1694.
>     https://issues.apache.org/jira/browse/HIVE-1694
> 
> 
> Diffs
> -----
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java b46976f 
>   ql/src/java/org/apache/hadoop/hive/ql/index/AggregateIndexHandler.java PRE-CREATION

>   ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndex.java 591c9ff 
>   ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 2ca63b3 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java 590d69a 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyCtx.java PRE-CREATION

>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteCanApplyProcFactory.java
PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteGBUsingIndex.java PRE-CREATION

>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteParseContextGenerator.java
PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteQueryUsingAggregateIndex.java
PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteQueryUsingAggregateIndexCtx.java
PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java 77a6dc6 
>   ql/src/test/queries/clientpositive/ql_rewrite_gbtoidx.q PRE-CREATION 
>   ql/src/test/results/clientpositive/ql_rewrite_gbtoidx.q.out PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/1194/diff
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Prajakta
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message