hive-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Dhiraj Kumar (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-11652) Avoid expensive call to removeAll in DefaultGraphWalker
Date Wed, 21 Dec 2016 14:32:58 GMT

    [ https://issues.apache.org/jira/browse/HIVE-11652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15767191#comment-15767191
] 

Dhiraj Kumar commented on HIVE-11652:
-------------------------------------

[~jcamachorodriguez] As suggested, I have create another bug at [https://issues.apache.org/jira/browse/HIVE-15486]


I have attached a larger version of the same query that I had put here in the trail. It has
50K elements inside IN clause. Although it seems a convoluted query, we do have similar query
running on our production system. 

We did not face problem until the upgrade to 2.1 from an older version. Although, this codepath
is not the only problem for this query. But it accounts for 50% of the time consumed at hive
processing. 

I did not get your suggestion of keeping last added child in the stack. Since the node peeked
from stack would keep changing,  the list of child would change too for that node. Keeping
the last added won't be able to help since you have lost list of children once you peeked
another node from stack. I believe you need to keep all the children of node as well. 

In fact that finding position itself is not a problem, but ASTNode.getChildren() invocation
is problem.  

I have to pick something in ASTNode.java as well,  why to add single child at at time to ret_vec.
Why not return all the children() in one shot? 

{code}
public ArrayList<Node> getChildren() {
    if (super.getChildCount() == 0) {
      return null;
    }

    ArrayList<Node> ret_vec = new ArrayList<Node>();
    for (int i = 0; i < super.getChildCount(); ++i) {
      ret_vec.add((Node) super.getChild(i));
    }

    return ret_vec;
  }
{code}

I will add some profiler output to bug as well. 

> Avoid expensive call to removeAll in DefaultGraphWalker
> -------------------------------------------------------
>
>                 Key: HIVE-11652
>                 URL: https://issues.apache.org/jira/browse/HIVE-11652
>             Project: Hive
>          Issue Type: Bug
>          Components: Logical Optimizer, Physical Optimizer
>    Affects Versions: 1.3.0, 2.0.0
>            Reporter: Jesus Camacho Rodriguez
>            Assignee: Jesus Camacho Rodriguez
>             Fix For: 1.3.0, 2.0.0
>
>         Attachments: HIVE-11652.01.patch, HIVE-11652.02.patch, HIVE-11652.patch
>
>
> When the plan is too large, the removeAll call in DefaultGraphWalker (line 140) will
take very long as it will have to go through the list looking for each of the nodes. We try
to get rid of this call by rewriting the logic in the walker.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message