hive-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Work logged] (HIVE-22074) Slow compilation due to IN to OR transformation
Date Wed, 07 Aug 2019 23:44:00 GMT

     [ https://issues.apache.org/jira/browse/HIVE-22074?focusedWorklogId=290873&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-290873
]

ASF GitHub Bot logged work on HIVE-22074:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 07/Aug/19 23:43
            Start Date: 07/Aug/19 23:43
    Worklog Time Spent: 10m 
      Work Description: jcamachor commented on pull request #746: HIVE-22074: Slow compilation
due to IN to OR transformation
URL: https://github.com/apache/hive/pull/746#discussion_r311806267
 
 

 ##########
 File path: ql/src/java/org/apache/hadoop/hive/ql/parse/TypeCheckProcFactory.java
 ##########
 @@ -1220,16 +1220,26 @@ protected ExprNodeDesc getXpathOrFuncExprNodeDesc(ASTNode expr,
             }
             outputOpList.add(nullConst);
           }
+
           if (!ctx.isCBOExecuted()) {
-            ArrayList<ExprNodeDesc> orOperands = TypeCheckProcFactoryUtils.rewriteInToOR(children);
-            if (orOperands != null) {
-              if (orOperands.size() == 1) {
-                orOperands.add(new ExprNodeConstantDesc(TypeInfoFactory.booleanTypeInfo, false));
+
+            HiveConf conf;
+            try {
+              conf = Hive.get().getConf();
 
 Review comment:
   I think it would be better to pass this value from the callers in the context. You would
not need to change all callers; if the value is not passed, the rewrite could simply be skipped.
I see two main advantages of doing this:
   1) if the transformation never happens, we will not retrieve the conf and this value
for every IN clause in a query (note that the `isCBOExecuted` method name is misleading: the
value returned is the `foldExpr` boolean, which is sometimes `false` even for calls coming
from CBO; cf. the first line of the `genFilterRelNode` method in `CalcitePlanner`), and
   2) it removes the static call to the Hive object from within the folding logic.
   
   I see there are other calls to `Hive.get()` in the class; that information should probably
be moved to the context too.
   These can all be tackled together in a follow-up, but since we are cleaning up
this logic, I think it would make sense to do it at some point.
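
   A minimal sketch of the suggestion above, not the actual patch. All names here
(`SketchTypeCheckCtx`, `InRewriteSketch`, `getMaxInOperandsForOrRewrite`) are hypothetical
stand-ins and are not Hive classes; expressions are modelled as plain strings. The caller
optionally puts a limit into the context, and the rewrite is skipped when no limit was
passed or when the IN list is larger than the limit, so no static conf lookup is needed:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.OptionalInt;

/** Hypothetical stand-in for the type-check context; NOT Hive's real context class. */
final class SketchTypeCheckCtx {
    // Empty means the caller did not pass a limit, so the rewrite is skipped.
    private final OptionalInt maxInOperandsForOrRewrite;

    SketchTypeCheckCtx() {
        this.maxInOperandsForOrRewrite = OptionalInt.empty();
    }

    SketchTypeCheckCtx(int limit) {
        this.maxInOperandsForOrRewrite = OptionalInt.of(limit);
    }

    OptionalInt getMaxInOperandsForOrRewrite() {
        return maxInOperandsForOrRewrite;
    }
}

/** Hypothetical rewrite helper; expressions are plain strings for illustration. */
final class InRewriteSketch {
    private InRewriteSketch() {
    }

    /**
     * Returns one equality operand per IN value, or null when the rewrite is
     * skipped (no limit in the context, or more values than the limit allows).
     */
    static List<String> rewriteInToOr(SketchTypeCheckCtx ctx, String column, List<String> values) {
        OptionalInt limit = ctx.getMaxInOperandsForOrRewrite();
        if (!limit.isPresent() || values.size() > limit.getAsInt()) {
            return null; // keep the IN expression as-is
        }
        List<String> orOperands = new ArrayList<>();
        for (String value : values) {
            orOperands.add(column + " = " + value);
        }
        return orOperands;
    }
}
```

   With this shape, existing callers that construct the context without a limit keep their
current behaviour minus the rewrite, and only callers that want the transformation need to
read the configuration once and pass the value in.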
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 290873)
    Time Spent: 1h  (was: 50m)

> Slow compilation due to IN to OR transformation
> -----------------------------------------------
>
>                 Key: HIVE-22074
>                 URL: https://issues.apache.org/jira/browse/HIVE-22074
>             Project: Hive
>          Issue Type: Improvement
>          Components: Logical Optimizer
>            Reporter: Vineet Garg
>            Assignee: Vineet Garg
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: HIVE-22074.1.patch, HIVE-22074.2.patch, HIVE-22074.3.patch, HIVE-22074.4.patch
>
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> Currently, Hive transforms IN expressions to OR in order to apply various CBO rules. This incurs
a significant performance hit if the IN clause consists of a large number of expressions.
> It is better not to transform IN expressions to OR in such cases, because the overall benefit
of the various optimizations/transformations is not realized due to the compilation overhead.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
