drill-issues mailing list archives

From "Timothy Farkas (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-5755) TOP_N_SORT operator does not free memory while running
Date Fri, 08 Sep 2017 20:59:02 GMT

    [ https://issues.apache.org/jira/browse/DRILL-5755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16159329#comment-16159329 ]

Timothy Farkas commented on DRILL-5755:

Another idea discussed with [~ben-zvi] was to limit the TopN operator to small N.
If N exceeds a threshold, we could use the spilling sort instead.
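
The threshold idea above could be sketched roughly as follows. This is only an illustration, not Drill's planner or operator code: the class name, the enum, and the cutoff value are all hypothetical, and the comment does not propose a specific threshold.

```java
/** Hypothetical sketch of picking a sort strategy from the LIMIT value. */
final class TopNStrategySketch {
    // Assumed cutoff for illustration only; no value was proposed in the discussion.
    static final long TOP_N_THRESHOLD = 10_000L;

    enum Strategy { TOP_N, SPILLING_SORT }

    /** Small N keeps the in-memory TopN path; large N falls back to a sort that can spill. */
    static Strategy choose(long limit) {
        return limit <= TOP_N_THRESHOLD ? Strategy.TOP_N : Strategy.SPILLING_SORT;
    }
}
```

With this assumed cutoff, the `limit 30` query from the issue would stay on the TopN path, so the threshold alone would not fix the memory growth reported there.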

> TOP_N_SORT operator does not free memory while running
> ------------------------------------------------------
>                 Key: DRILL-5755
>                 URL: https://issues.apache.org/jira/browse/DRILL-5755
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Relational Operators
>    Affects Versions: 1.11.0
>            Reporter: Boaz Ben-Zvi
>            Assignee: Timothy Farkas
>         Attachments: 2658c253-20b6-db90-362a-139aae4a327e.sys.drill
> The TOP_N_SORT operator should keep the top N rows while processing its input, and free the memory used to hold all rows below the top N.
> For example, the following query uses a table with 125M rows:
> {code}
> select row_count, sum(row_count), avg(double_field), max(double_rand), count(float_rand) from dfs.`/data/tmp` group by row_count order by row_count limit 30;
> {code}
> The query failed with an OOM when each of the 3 TOP_N_SORT operators was holding about 2.44 GB (see attached profile). It should take far less memory to hold 30 rows.
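
The behavior the description asks for (retain only the top N rows and release everything else as the input is consumed) can be illustrated with a bounded heap. This is a minimal sketch on plain `Long` values, assuming an ascending order like the query's `order by row_count`. It is not Drill's implementation, which works on record batches and its own memory allocator, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical sketch: keep the n smallest values while holding at most n at any time. */
final class BoundedTopNSketch {
    static List<Long> smallestN(Iterable<Long> input, int n) {
        if (n <= 0) {
            return new ArrayList<>();
        }
        // Max-heap: the root is the largest retained value, i.e. the next one to evict.
        PriorityQueue<Long> heap = new PriorityQueue<>(n, Comparator.<Long>reverseOrder());
        for (Long value : input) {
            if (heap.size() < n) {
                heap.add(value);
            } else if (value < heap.peek()) {
                heap.poll();      // drop the current worst value so it can be reclaimed
                heap.add(value);
            }
            // Values that do not beat the current worst are never stored.
        }
        List<Long> result = new ArrayList<>(heap);
        Collections.sort(result); // ascending order for the final output
        return result;
    }
}
```

Memory held by the heap is bounded by n regardless of input size, which is the property the reporter expected for 30 rows.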

This message was sent by Atlassian JIRA
