drill-issues mailing list archives

From "Timothy Farkas (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-5755) TOP_N_SORT operator does not free memory while running
Date Thu, 21 Sep 2017 23:06:00 GMT

    [ https://issues.apache.org/jira/browse/DRILL-5755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16175619#comment-16175619 ]

Timothy Farkas commented on DRILL-5755:

[~cchang@maprtech.com] and [~ben-zvi] The real fix for this issue is taking me longer than
expected. A workaround that is not ideal, but should be sufficient, is a simple config change,
available here: https://github.com/ilooner/drill/tree/DRILL-5755 . I verified that it reduces
the operator's memory usage and does not appear to impact performance significantly.
Please let me know whether you still see the issue with this change.

> TOP_N_SORT operator does not free memory while running
> ------------------------------------------------------
>                 Key: DRILL-5755
>                 URL: https://issues.apache.org/jira/browse/DRILL-5755
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Relational Operators
>    Affects Versions: 1.11.0
>            Reporter: Boaz Ben-Zvi
>            Assignee: Timothy Farkas
>            Priority: Blocker
>         Attachments: 2658c253-20b6-db90-362a-139aae4a327e.sys.drill
> The TOP_N_SORT operator should keep the top N rows while processing its input, and free
> the memory used to hold all rows below the top N.
> For example, the following query uses a table with 125M rows:
> {code}
> select row_count, sum(row_count), avg(double_field), max(double_rand), count(float_rand)
> from dfs.`/data/tmp` group by row_count order by row_count limit 30;
> {code}
> The query failed with an OOM when each of the 3 TOP_N_SORT operators was holding about
> 2.44 GB (see attached profile). Holding 30 rows should take far less memory.
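
For illustration only, the behavior the description asks for (retain just the top N rows and let everything below them go) can be sketched with a bounded heap. This is not Drill's TOP_N_SORT code: the class name TopNSketch, the method topN, and the modeling of rows as plain Long sort keys are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.stream.LongStream;

// Hypothetical sketch; not part of Apache Drill.
public class TopNSketch {

    /**
     * Returns the n smallest keys from the input in ascending order
     * (matching "order by ... limit n"), retaining at most n keys at any time.
     */
    public static List<Long> topN(Iterator<Long> rows, int n) {
        if (n <= 0) {
            return new ArrayList<>();
        }
        // Max-heap: the root is the largest retained key, i.e. the next one to evict.
        PriorityQueue<Long> heap = new PriorityQueue<>(n, Comparator.reverseOrder());
        while (rows.hasNext()) {
            Long row = rows.next();
            if (heap.size() < n) {
                heap.add(row);
            } else if (row < heap.peek()) {
                // The new key beats the worst retained key: drop the old one so it
                // becomes unreachable, then keep the new one.
                heap.poll();
                heap.add(row);
            }
            // Otherwise the key is below the top n and is never stored.
        }
        List<Long> result = new ArrayList<>(heap);
        Collections.sort(result);
        return result;
    }

    public static void main(String[] args) {
        // One million keys in descending order; only 30 are ever held by the heap.
        Iterator<Long> rows = LongStream.rangeClosed(1, 1_000_000)
                .map(i -> 1_000_001 - i)
                .boxed()
                .iterator();
        System.out.println(topN(rows, 30));
    }
}
```

With this structure the retained state is bounded by N regardless of input size. In an engine that processes rows in batches, as the issue implies, the equivalent step would be releasing input batches once none of their rows remain in the top N.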

This message was sent by Atlassian JIRA
