hive-issues mailing list archives

From "Remus Rusanu (JIRA)" <>
Subject [jira] [Commented] (HIVE-16757) Use memoization in HiveRelMdRowCount.getRowCount
Date Thu, 25 May 2017 18:10:04 GMT


Remus Rusanu commented on HIVE-16757:

From what I can see, this seems to be a problem in Calcite code as well. I will start
a discussion on calcite-dev.

> Use memoization in HiveRelMdRowCount.getRowCount
> ------------------------------------------------
>                 Key: HIVE-16757
>                 URL:
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Planning
>            Reporter: Remus Rusanu
>            Assignee: Remus Rusanu
>         Attachments: HIVE-16757.01.patch, HIVE-16757.02.patch
> On complex queries, HiveRelMdRowCount.getRowCount can get called many times. Since it
> does not memoize its result and the call is recursive, this results in an explosion of calls.
> For example, for a query with 49 joins, during join ordering (LoptOptimizeJoinRule) HiveRelMdRowCount.getRowCount
> gets called 6442 times as a top-level call, but the recursion expands this to 501729 calls. Memoization
> of the result would stop the recursion early. In my testing this reduced the join reordering
> time for said query from 11s to <1s.
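
The effect described above can be illustrated with a minimal Java sketch. It is not the Hive or Calcite code and not the attached patch: the Node class, the row-count formula, and the fact that each input is queried twice per join are all assumptions made up for illustration. It only shows how caching a recursive estimate per plan node cuts the number of calls on a left-deep chain of joins.

```java
import java.util.IdentityHashMap;
import java.util.Map;

class RowCountMemoDemo {
    /** Hypothetical plan node: a leaf scan (left == null) or a join of two inputs. */
    static final class Node {
        final Node left;
        final Node right;
        final double leafRows;

        Node(double leafRows) { this.left = null; this.right = null; this.leafRows = leafRows; }
        Node(Node left, Node right) { this.left = left; this.right = right; this.leafRows = 0; }
    }

    /** Counts every invocation of the estimator, including cache hits. */
    static long calls;

    /** Recursive estimate without caching. Each input is queried twice (an assumption). */
    static double naive(Node n) {
        calls++;
        if (n.left == null) {
            return n.leafRows;
        }
        double larger = Math.max(naive(n.left), naive(n.right));
        return naive(n.left) * naive(n.right) / larger;
    }

    /** Same estimate, but the result for each node is computed once and then reused. */
    static double memoized(Node n, Map<Node, Double> cache) {
        calls++;
        Double cached = cache.get(n);
        if (cached != null) {
            return cached;
        }
        double rows;
        if (n.left == null) {
            rows = n.leafRows;
        } else {
            double larger = Math.max(memoized(n.left, cache), memoized(n.right, cache));
            rows = memoized(n.left, cache) * memoized(n.right, cache) / larger;
        }
        cache.put(n, rows);
        return rows;
    }

    /** Builds a left-deep chain with the given number of joins over 1000-row scans. */
    static Node chain(int joins) {
        Node plan = new Node(1000);
        for (int i = 0; i < joins; i++) {
            plan = new Node(plan, new Node(1000));
        }
        return plan;
    }

    static long countNaiveCalls(int joins) {
        calls = 0;
        naive(chain(joins));
        return calls;
    }

    static long countMemoizedCalls(int joins) {
        calls = 0;
        // Identity-based cache: the key is the node object itself, not its contents.
        memoized(chain(joins), new IdentityHashMap<>());
        return calls;
    }

    public static void main(String[] args) {
        System.out.println("naive calls:    " + countNaiveCalls(10));
        System.out.println("memoized calls: " + countMemoizedCalls(10));
    }
}
```

In this sketch the cache lives only for one top-level call. A real planner would also have to decide when cached values become stale, for example after a rule rewrites part of the plan, which this example does not address.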

This message was sent by Atlassian JIRA
