hive-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Gunther Hagleitner" <ghagleit...@hortonworks.com>
Subject Re: Review Request 12827: HIVE-4611 - SMB joins fail based on bigtable selection policy.
Date Fri, 26 Jul 2013 01:01:59 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/12827/#review23905
-----------------------------------------------------------



ql/src/java/org/apache/hadoop/hive/ql/ErrorMsg.java
<https://reviews.apache.org/r/12827/#comment47730>

    Can you add which table(s) those are ({0})?



ql/src/java/org/apache/hadoop/hive/ql/optimizer/AvgPartitionSizeBasedBigTableSelectorForAutoSMJ.java
<https://reviews.apache.org/r/12827/#comment47732>

    you need a check here too, don't you? Otherwise you might return 0 even though that's
not a valid choice.



ql/src/java/org/apache/hadoop/hive/ql/optimizer/AvgPartitionSizeBasedBigTableSelectorForAutoSMJ.java
<https://reviews.apache.org/r/12827/#comment47738>

    what's this for? is this an error condition? not sure when topOp would be null, but you
might jump over it with the new code, is that what you really want?



ql/src/java/org/apache/hadoop/hive/ql/optimizer/BigTableSelectorForAutoSMJ.java
<https://reviews.apache.org/r/12827/#comment47742>

    This should be a List<>, right?



ql/src/java/org/apache/hadoop/hive/ql/optimizer/LeftmostBigTableSelectorForAutoSMJ.java
<https://reviews.apache.org/r/12827/#comment47743>

    List?



ql/src/java/org/apache/hadoop/hive/ql/optimizer/MapJoinProcessor.java
<https://reviews.apache.org/r/12827/#comment47748>

    I think only when you allocate you really need to specify ArrayList. There's multiple
instances of that.



ql/src/java/org/apache/hadoop/hive/ql/optimizer/MapJoinProcessor.java
<https://reviews.apache.org/r/12827/#comment47750>

    Also, why can't seenPositions and bigTableCandidates be Sets? Seems more appropriate.



ql/src/java/org/apache/hadoop/hive/ql/optimizer/MapJoinProcessor.java
<https://reviews.apache.org/r/12827/#comment47751>

    This is confusing. Can you add comment explaining under what circumstances you return
false v throw exceptions and why?



ql/src/java/org/apache/hadoop/hive/ql/optimizer/TableSizeBasedBigTableSelectorForAutoSMJ.java
<https://reviews.apache.org/r/12827/#comment47753>

    Same as above.



ql/src/java/org/apache/hadoop/hive/ql/optimizer/TableSizeBasedBigTableSelectorForAutoSMJ.java
<https://reviews.apache.org/r/12827/#comment47752>

    This one has the topOp and bigTableCandidate list reversed.


- Gunther Hagleitner


On July 22, 2013, 10:55 p.m., Vikram Dixit Kumaraswamy wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/12827/
> -----------------------------------------------------------
> 
> (Updated July 22, 2013, 10:55 p.m.)
> 
> 
> Review request for hive, Ashutosh Chauhan, Brock Noland, and Gunther Hagleitner.
> 
> 
> Bugs: HIVE-4611
>     https://issues.apache.org/jira/browse/HIVE-4611
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> SMB joins fail based on bigtable selection policy. The default setting for hive.auto.convert.sortmerge.join.bigtable.selection.policy
will choose the big table as the one with largest average partition size. However, this can
result in a query failing because this policy conflicts with the big table candidates chosen
for outer joins. This policy should just be a tie breaker and not have the ultimate say in
the choice of tables.
> 
> 
> Diffs
> -----
> 
>   ql/src/java/org/apache/hadoop/hive/ql/ErrorMsg.java 8330f65 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/AbstractSMBJoinProc.java cc9de54 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/AvgPartitionSizeBasedBigTableSelectorForAutoSMJ.java
5320143 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/BigTableSelectorForAutoSMJ.java db5ff0f

>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/LeftmostBigTableSelectorForAutoSMJ.java
db3c9e7 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/MapJoinProcessor.java d83fb66 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/TableSizeBasedBigTableSelectorForAutoSMJ.java
b882f87 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/CorrelationOptimizer.java
3071713 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/CommonJoinTaskDispatcher.java
f98878c 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/SortMergeJoinTaskDispatcher.java
af56857 
>   ql/src/test/queries/clientnegative/auto_sortmerge_join_1.q c858254 
>   ql/src/test/queries/clientpositive/auto_sortmerge_join_15.q PRE-CREATION 
>   ql/src/test/results/clientnegative/auto_sortmerge_join_1.q.out 0eddb69 
>   ql/src/test/results/clientnegative/join2.q.out b53b3a1 
>   ql/src/test/results/clientnegative/smb_bucketmapjoin.q.out 7a5b8c1 
>   ql/src/test/results/clientpositive/auto_sortmerge_join_15.q.out PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/12827/diff/
> 
> 
> Testing
> -------
> 
> All tests pass on hadoop 1.
> 
> 
> Thanks,
> 
> Vikram Dixit Kumaraswamy
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message