hive-dev mailing list archives

From "Chaoyu Tang" <ctang...@gmail.com>
Subject Re: Review Request 42081: HIVE-12788: Setting hive.optimize.union.remove to TRUE will break UNION ALL with aggregate functions
Date Fri, 08 Jan 2016 22:28:27 GMT


> On Jan. 8, 2016, 9:43 p.m., pengcheng xiong wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/optimizer/StatsOptimizer.java, line 219
> > <https://reviews.apache.org/r/42081/diff/2/?file=1188760#file1188760line219>
> >
> >     Could you please be more specific about why "if TS[0] branch for src1
> >     is not optimized, then there is no need to continue processing TS[6] branch"? Thanks.

The operator tree of the query (select max(value) from src1 union all select max(value) from src2)
passed to StatsOptimizer after the UnionRemove optimization is:
      TS[0]->SEL[1]->GBY[2]->RS[3]->GBY[4]->FS[17]    --- for subquery src1
      TS[6]->SEL[7]->GBY[8]->RS[9]->GBY[10]->FS[18]   --- for subquery src2
It has two top operators, TS[0] for table src1 and TS[6] for table src2. If the TS[0] branch
(for subquery src1) is not optimized but the TS[6] branch (for subquery src2) is, then in the
existing code the TS[6] branch result is set as the FetchTask in ParseContext and the entire
query is not compiled any further into MR tasks (in SemanticAnalyzer.analyzeInternal, step 9).
The union query then returns only the row from TS[6] (the subquery src2), which is obviously
wrong. So for a union query, if any one of its subqueries cannot be optimized by StatsOptimizer,
the whole query should not be optimized and should fall back to the regular plan. I hope that
is a little clearer. Thanks.
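
To illustrate the all-or-nothing rule described above, here is a minimal, self-contained Java
sketch. It is not the code from the patch and does not use any Hive classes: UnionStatsSketch,
answerFromStats and optimizeUnion are hypothetical names, and the map of per-branch answers
stands in for whatever StatsOptimizer can derive from table statistics.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical illustration only; none of these names exist in Hive.
class UnionStatsSketch {

    // Stand-in for "can this branch be answered from stats?".
    // Returns the branch's single result row, or empty if the branch cannot be optimized.
    static Optional<String> answerFromStats(String topOperator, Map<String, String> statsAnswers) {
        return Optional.ofNullable(statsAnswers.get(topOperator));
    }

    // All-or-nothing rule: return one row per branch only if every branch is answerable.
    // An empty result means the whole union query falls back to the regular plan.
    static Optional<List<String>> optimizeUnion(List<String> topOperators,
                                                Map<String, String> statsAnswers) {
        List<String> rows = new ArrayList<>();
        for (String top : topOperators) {
            Optional<String> row = answerFromStats(top, statsAnswers);
            if (!row.isPresent()) {
                // One branch failed, so there is no need to process the remaining branches.
                return Optional.empty();
            }
            rows.add(row.get());
        }
        return Optional.of(rows);
    }

    public static void main(String[] args) {
        List<String> tops = Arrays.asList("TS[0]", "TS[6]");

        // Only the src2 branch is answerable: the union must not be optimized at all.
        Map<String, String> partial = new HashMap<>();
        partial.put("TS[6]", "max_from_src2");
        System.out.println(optimizeUnion(tops, partial).isPresent());

        // Both branches are answerable: one row per branch is returned.
        Map<String, String> full = new HashMap<>(partial);
        full.put("TS[0]", "max_from_src1");
        System.out.println(optimizeUnion(tops, full).isPresent());
    }
}
```

The point of returning early is the one raised in the review comment: once TS[0] cannot be
optimized, the outcome for TS[6] no longer matters because the query uses the regular plan.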


- Chaoyu


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/42081/#review113549
-----------------------------------------------------------


On Jan. 8, 2016, 8:16 p.m., Chaoyu Tang wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/42081/
> -----------------------------------------------------------
> 
> (Updated Jan. 8, 2016, 8:16 p.m.)
> 
> 
> Review request for hive, Ashutosh Chauhan, pengcheng xiong, and Xuefu Zhang.
> 
> 
> Bugs: HIVE-12788
>     https://issues.apache.org/jira/browse/HIVE-12788
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Adds StatsOptimizer support for union queries with aggregate functions. Otherwise, such a
> query always returns one row.
> 
> 
> Diffs
> -----
> 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/StatsOptimizer.java 03c1c3f 
>   ql/src/test/queries/clientpositive/union_remove_26.q PRE-CREATION 
>   ql/src/test/results/clientpositive/union_remove_26.q.out PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/42081/diff/
> 
> 
> Testing
> -------
> 
> 1. Manual tests for some particular cases
> 2. Submitted to precommit-tests
> 
> 
> Thanks,
> 
> Chaoyu Tang
> 
>

