Chen Feng created PHOENIX-5491:
----------------------------------
Summary: Improve performance of InListExpression.hashCode
Key: PHOENIX-5491
URL: https://issues.apache.org/jira/browse/PHOENIX-5491
Project: Phoenix
Issue Type: Improvement
Reporter: Chen Feng
Assignee: Chen Feng
WhereOptimizer.pushKeyExpressionsToScan() contains the line "extractNodes.addAll(nodesToExtract)".
When executing SQL such as "select * from ... where A in (a1, a2, ..., a_n) and B = X", where the
IN list on A has N elements (N > 100,000), this line becomes very slow (> 90s in our environment).
This is because, in this case, extractNodes is a HashSet and nodesToExtract is a List of N
InListExpression entries (all N entries are the same instance), and InListExpression.values
has N elements as well.
HashSet.addAll(List<InListExpression>) calls InListExpression.hashCode() N times.
Each call recomputes the hash code over every value. Therefore,
the total time complexity is O(N^2).
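
The quadratic behaviour can be reproduced outside Phoenix with a small sketch. InListLike below is a hypothetical stand-in for InListExpression, not the real class, and the valuesVisited counter exists only for illustration. It assumes, as described above, that hashCode() walks every value on each call.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class QuadraticHashDemo {
    /** Hypothetical stand-in for InListExpression: hashCode() walks every value. */
    static final class InListLike {
        // Illustration only: counts how many values hashCode() has touched.
        static long valuesVisited = 0;
        private final List<Integer> values;

        InListLike(List<Integer> values) {
            this.values = values;
        }

        @Override
        public int hashCode() {
            int h = 1;
            for (Integer v : values) {
                h = 31 * h + v.hashCode();
                valuesVisited++;
            }
            return h;
        }
    }

    /** Adds the same instance n times to a HashSet and returns the values visited. */
    static long countVisits(int n) {
        InListLike.valuesVisited = 0;
        List<Integer> values = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            values.add(i);
        }
        InListLike expr = new InListLike(values);
        // n references to one instance, mirroring nodesToExtract in the report.
        List<InListLike> nodesToExtract = Collections.nCopies(n, expr);
        Set<InListLike> extractNodes = new HashSet<>();
        // addAll hashes each element once, and each hash walks n values.
        extractNodes.addAll(nodesToExtract);
        return InListLike.valuesVisited;
    }

    public static void main(String[] args) {
        System.out.println(countVisits(1000)); // n * n values visited
    }
}
```

The set ends up holding a single element, yet the work grows with the square of N because every add() recomputes the full hash.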
A simple fix is to cache the hash code in InListExpression and return it
directly once it has been calculated.
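
A minimal sketch of the caching idea follows. It is not the actual Phoenix patch: CachedInListLike, its fields, and the valuesVisited counter are hypothetical names used for illustration. It assumes the values do not change after construction and that the object is used from a single thread. A class whose values can be replaced later would also need to reset the cached hash.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class CachedHashDemo {
    /** Hypothetical stand-in for InListExpression with a memoized hash code. */
    static final class CachedInListLike {
        // Illustration only: counts how many values hashCode() has touched.
        static long valuesVisited = 0;
        private final List<Integer> values;
        private int cachedHash;
        private boolean hashComputed;

        CachedInListLike(List<Integer> values) {
            // Defensive copy so the cached hash cannot go stale (assumption: immutable values).
            this.values = Collections.unmodifiableList(new ArrayList<>(values));
        }

        @Override
        public int hashCode() {
            if (!hashComputed) {
                int h = 1;
                for (Integer v : values) {
                    h = 31 * h + v.hashCode();
                    valuesVisited++;
                }
                cachedHash = h;
                hashComputed = true;
            }
            return cachedHash;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) {
                return true;
            }
            if (!(o instanceof CachedInListLike)) {
                return false;
            }
            return values.equals(((CachedInListLike) o).values);
        }
    }

    /** Adds the same instance n times to a HashSet and returns the values visited. */
    static long countVisits(int n) {
        CachedInListLike.valuesVisited = 0;
        List<Integer> values = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            values.add(i);
        }
        CachedInListLike expr = new CachedInListLike(values);
        Set<CachedInListLike> extractNodes = new HashSet<>();
        extractNodes.addAll(Collections.nCopies(n, expr));
        return CachedInListLike.valuesVisited;
    }

    public static void main(String[] args) {
        System.out.println(countVisits(1000)); // values are walked only once
    }
}
```

With the cache in place, the first hashCode() call walks the N values and the remaining N - 1 calls return the stored result, so the addAll cost drops from O(N^2) to O(N).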
--
This message was sent by Atlassian Jira
(v8.3.4#803005)