hive-issues mailing list archives

From "Hive QA (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-12189) The list in pushdownPreds of ppd.ExprWalkerInfo should not be allowed to grow very large
Date Sat, 17 Oct 2015 17:55:05 GMT

    [ https://issues.apache.org/jira/browse/HIVE-12189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14962014#comment-14962014
] 

Hive QA commented on HIVE-12189:
--------------------------------



Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12767094/HIVE-12189.1.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 9702 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_explode
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udtf_explode
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation
org.apache.hive.jdbc.TestSSL.testSSLVersion
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5693/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5693/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-5693/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12767094 - PreCommit-HIVE-TRUNK-Build

> The list in pushdownPreds of ppd.ExprWalkerInfo should not be allowed to grow very large
> ----------------------------------------------------------------------------------------
>
>                 Key: HIVE-12189
>                 URL: https://issues.apache.org/jira/browse/HIVE-12189
>             Project: Hive
>          Issue Type: Bug
>          Components: Logical Optimizer
>    Affects Versions: 1.1.0, 2.0.0
>            Reporter: Yongzhi Chen
>            Assignee: Yongzhi Chen
>         Attachments: HIVE-12189.1.patch
>
>
> Some queries are very slow at compile time; for example, the following query
> {noformat}
> select * from tt1 nf
> join tt2 a1 on (nf.col1 = a1.col1 and nf.hdp_databaseid = a1.hdp_databaseid)
> join tt3 a2 on (a2.col2 = a1.col2 and a2.col3 = nf.col3 and a2.hdp_databaseid = nf.hdp_databaseid)
> join tt4 a3 on (a3.col4 = a2.col4 and a3.col3 = a2.col3)
> join tt5 a4 on (a4.col4 = a2.col4 and a4.col5 = a2.col5 and a4.col3 = a2.col3 and a4.hdp_databaseid = nf.hdp_databaseid)
> join tt6 a5 on (a5.col3 = a2.col3 and a5.col2 = a2.col2 and a5.hdp_databaseid = nf.hdp_databaseid)
> JOIN tt7 a6 ON (a2.col3 = a6.col3 and a2.col2 = a6.col2 and a6.hdp_databaseid = nf.hdp_databaseid)
> JOIN tt8 a7 ON (a2.col3 = a7.col3 and a2.col2 = a7.col2 and a7.hdp_databaseid = nf.hdp_databaseid)
> where nf.hdp_databaseid = 102 limit 10;
> {noformat}
> takes around 120 seconds to compile in Hive 1.1 when
> hive.mapred.mode=strict;
> hive.optimize.ppd=true;
> and Hive is not in test mode.
> All the above tables are partitioned on a single column, and all of them are empty. If the tables are not empty, compilation is reported to be so slow that Hive appears to hang.
> In Hive 2.0 compilation is much faster (explain takes 6.6 seconds), but that is still a lot of time. One problem that slows PPD down is that the list in pushdownPreds can grow very large, which gives extractPushdownPreds poor performance:
> {noformat}
> public static ExprWalkerInfo extractPushdownPreds(OpWalkerInfo opContext,
>     Operator<? extends OperatorDesc> op, List<ExprNodeDesc> preds)
> {noformat}
> While running the query above, preds has a size of 12051 at the breakpoint below, and most entries of the list are duplicates of GenericUDFOPEqual(Column[hdp_databaseid], Const int 102).
> The following code in extractPushdownPreds clones every node in preds and then does the walk. Hive 2.0 is faster because HIVE-11652 (and other JIRAs) made startWalking much faster, but we still clone thousands of nodes with the same expression. Should we store so many identical predicates in the list, or is just one good enough?
> {noformat}
>     List<Node> startNodes = new ArrayList<Node>();
>     List<ExprNodeDesc> clonedPreds = new ArrayList<ExprNodeDesc>();
>     for (ExprNodeDesc node : preds) {
>       ExprNodeDesc clone = node.clone();
>       clonedPreds.add(clone);
>       exprContext.getNewToOldExprMap().put(clone, node);
>     }
>     startNodes.addAll(clonedPreds);
>     egw.startWalking(startNodes, null);
> {noformat}
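The cloning loop quoted above could skip redundant work by cloning only one representative per distinct predicate. Here is a minimal, self-contained sketch of that idea; Pred, cloneDistinct, and keying semantic equality by the expression string are stand-ins for illustration (Hive's real ExprNodeDesc has its own equality via isSame/getExprString), not the actual patch.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for ExprNodeDesc: identity-distinct objects whose
// semantic equality is keyed by their expression string.
class Pred {
    final String exprString;
    Pred(String exprString) { this.exprString = exprString; }
    Pred clonePred() { return new Pred(exprString); }
}

public class PredDedup {
    // Clone at most one representative per distinct expression, instead of
    // cloning every entry of a list that may hold thousands of duplicates.
    static List<Pred> cloneDistinct(List<Pred> preds, Map<Pred, Pred> newToOld) {
        Map<String, Pred> seen = new LinkedHashMap<>();
        for (Pred p : preds) {
            seen.putIfAbsent(p.exprString, p);  // keep first occurrence only
        }
        List<Pred> cloned = new ArrayList<>();
        for (Pred original : seen.values()) {
            Pred clone = original.clonePred();
            cloned.add(clone);
            newToOld.put(clone, original);      // mirrors getNewToOldExprMap()
        }
        return cloned;
    }

    public static void main(String[] args) {
        List<Pred> preds = new ArrayList<>();
        for (int i = 0; i < 12051; i++) {
            preds.add(new Pred("GenericUDFOPEqual(Column[hdp_databaseid], Const int 102)"));
        }
        preds.add(new Pred("GenericUDFOPEqual(Column[col1], Column[col1])"));
        Map<Pred, Pred> newToOld = new LinkedHashMap<>();
        // Only 2 distinct predicates get cloned and walked instead of 12052.
        System.out.println(cloneDistinct(preds, newToOld).size());
    }
}
```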
> Should we change java/org/apache/hadoop/hive/ql/ppd/ExprWalkerInfo.java
> method 
> public void addFinalCandidate(String alias, ExprNodeDesc expr) 
> and
> public void addPushDowns(String alias, List<ExprNodeDesc> pushDowns) 
> to only add an expr that is not already in the pushdown list for its alias?
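The per-alias guard suggested in the question above might look roughly like this. It is a hedged sketch, not Hive's actual code: PushdownPreds and the use of plain expression strings as keys are invented for illustration, while the method names mirror ExprWalkerInfo's addFinalCandidate and addPushDowns.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: track which expressions have already been pushed
// down for each alias, and silently skip duplicates.
public class PushdownPreds {
    private final Map<String, List<String>> pushdownPreds = new HashMap<>();
    private final Map<String, Set<String>> seenExprs = new HashMap<>();

    // Analogue of ExprWalkerInfo.addFinalCandidate(alias, expr): only add an
    // expression not already in the pushdown list for this alias.
    public void addFinalCandidate(String alias, String exprString) {
        Set<String> seen = seenExprs.computeIfAbsent(alias, k -> new HashSet<>());
        if (seen.add(exprString)) {  // Set.add returns false for duplicates
            pushdownPreds.computeIfAbsent(alias, k -> new ArrayList<>()).add(exprString);
        }
    }

    // Analogue of addPushDowns(alias, pushDowns): reuses the same guard.
    public void addPushDowns(String alias, List<String> pushDowns) {
        for (String e : pushDowns) {
            addFinalCandidate(alias, e);
        }
    }

    public int size(String alias) {
        List<String> l = pushdownPreds.get(alias);
        return l == null ? 0 : l.size();
    }
}
```

With this guard, adding the same equality predicate thousands of times for one alias leaves a single entry, which would keep the list that extractPushdownPreds later clones and walks small.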



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
