hive-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Namit Jain (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-3980) Cleanup after HIVE-3403
Date Mon, 04 Feb 2013 05:56:12 GMT

    [ https://issues.apache.org/jira/browse/HIVE-3980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13570002#comment-13570002
] 

Namit Jain commented on HIVE-3980:
----------------------------------


common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
503 ↗
(On Diff #26877)	
Better name: hive.auto.convert.sortmerge.join?
504 ↗
(On Diff #26877)	
hive.auto.convert.sortmerge.join.bigtable.selection.policy ?
ql/src/java/org/apache/hadoop/hive/ql/optimizer/AbstractBucketJoinProc.java
108 ↗
(On Diff #26877)	
Why is this needed ? If orders[] = index for all i, this is not needed and if orders[] !=
index for some i, than we are partially updating orders[] which sounds dangerous.
114 ↗
(On Diff #26877)	
I think it makes more sense to do this check before looping over and comparing orders, since
this is a stricter check, in a sense that if all join keys are not bucketed columns there
is no point in checking order.
98 ↗
(On Diff #26877)	
Better argument name: joinKeys ?
99 ↗
(On Diff #26877)	
Better argument name: joinKeyOrders?
80 ↗
(On Diff #26877)	
Better function name: getBucketFilePathOfPartition() ?
Also can you refactor this to return List<Path> instead, we had troubles because of
String Vs Path interchangeable usage in Hive code base in past.
97 ↗
(On Diff #26877)	
Comment for function ?
"This function checks whether all bucketing columns are also in join keys and are in same
order."
117 ↗
(On Diff #26877)	
Better function name: checkNumOfBucketsAgainstBigTable()?
118 ↗
(On Diff #26877)	
Better param name: aliasToNumOfBuckets, NumOfBucketsInPartition ?
182 ↗
(On Diff #26877)	
This function does bunch of checks to determine if join can be converted to bucketmapjoin.
It will be good idea to list all the checks it performs in comments for this method.
190 ↗
(On Diff #26877)	
better name: tblAliasToNumOfBucketsInEachPartition ?
Actually, it seems that you can always derive this info from the next field, it is just the
size of the innermost list in next map. If so, please consider getting rid of this map and
just use next field wherever this one is required. 
Having too many datastructures makes code harder to follow as well as results in tighter function
coupling.
192 ↗
(On Diff #26877)	
better name: tblAliasToListOfBucketedFilePathInEachPartition?
200–203 ↗
(On Diff #26877)	
Similar comment as the one above applies here as well.
210 ↗
(On Diff #26877)	
In what case, topOp for the alias could be null. Is it a subquery case? Either case, please
add a comment when this topOp could be null and why in that case we cannot join to BMJ.
219 ↗
(On Diff #26877)	
Comment:
We cannot get to root TS operator, likely because there is a join or group-by between topOp
and root TS operator. We don't handle that case, so we simply return.
226 ↗
(On Diff #26877)	
Should this be topOpEntry.getValue().equals(tso) instead ?
361 ↗
(On Diff #26877)	
This function does bunch of checks to determine if mapjoin can be converted to bucketmapjoin.
It will be good idea to list all the checks it performs in comments for this method.
366 ↗
(On Diff #26877)	
Better name ?
205 ↗
(On Diff #26877)	
Better name: joinKeyOrder?
ql/src/java/org/apache/hadoop/hive/ql/optimizer/AbstractSMBJoinProc.java
87 ↗
(On Diff #26877)	
Comment:
First check if this map-join operator is already a BMJ or not. If not give up we are only
trying to convert a BMJ to sort-BMJ.
121 ↗
(On Diff #26877)	
If table is not sorted, we cannot do sortBMJ anyways. Secondly we cannot do sortBMJ for outer
joins either. Shouldn't this check be made much earlier?
232 ↗
(On Diff #26877)	
Early parts of this method performs some of the same checks of checkConvertBucketMapJoin()
of AbstractBucketJoinProc.java. Perhaps, we can refactor to avoid code duplication. Further,
name isTableSorted is not correct, since it does much more than checking if table is sorted.
Better name: isEligibleForSBMJ() ?
462 ↗
(On Diff #26877)	
This should be a checked exception (perhaps of type SemanticException). Certainly, not RuntimeException.
ql/src/java/org/apache/hadoop/hive/ql/optimizer/AvgPartitionSizeSortMergeJoinBigTableMatcher.java
41 ↗
(On Diff #26877)	
Better name: AvgPartitionSizeBasedTblSelectorForSMJ ?
41 ↗
(On Diff #26877)	
Is there a test for this policy ?
ql/src/java/org/apache/hadoop/hive/ql/optimizer/BucketJoinOptProcCtx.java
39 ↗
(On Diff #26877)	
Better name: rejectedJoinOps.
42 ↗
(On Diff #26877)	
Better name: convertedJoinOps. Also, update the comment.
60 ↗
(On Diff #26877)	
Name: getRejectedJoinOps ?
Also can similarly name other getters and setters of this class.
                
> Cleanup after HIVE-3403
> -----------------------
>
>                 Key: HIVE-3980
>                 URL: https://issues.apache.org/jira/browse/HIVE-3980
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor
>            Reporter: Namit Jain
>            Assignee: Namit Jain
>
> There have been a lot of comments on HIVE-3403, which involve changing 
> variable names/function names/adding more comments/general cleanup etc.
> Since HIVE-3403 involves a lot of refactoring, it was fairly difficult to
> address the comments there, since refreshing becomes impossible. This jira
> is to track those cleanups.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Mime
View raw message