drill-issues mailing list archives

From "Paul Rogers (JIRA)" <j...@apache.org>
Subject [jira] [Issue Comment Deleted] (DRILL-5453) Managed External Sort : Sorting on a lot of columns is taking unreasonably long time
Date Thu, 29 Jun 2017 02:37:00 GMT

     [ https://issues.apache.org/jira/browse/DRILL-5453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Paul Rogers updated DRILL-5453:
-------------------------------
    Comment: was deleted

(was: Will be resolved by controlling batch size, which is being done by the linked project.)

> Managed External Sort : Sorting on a lot of columns is taking unreasonably long time
> ------------------------------------------------------------------------------------
>
>                 Key: DRILL-5453
>                 URL: https://issues.apache.org/jira/browse/DRILL-5453
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Relational Operators
>    Affects Versions: 1.10.0
>            Reporter: Rahul Challapalli
>            Assignee: Paul Rogers
>         Attachments: drill5453.sys.drill
>
>
> The query below ran for ~16 hours before I cancelled it.
> {code}
> ALTER SESSION SET `exec.sort.disable_managed` = false;
> alter session set `planner.memory.max_query_memory_per_node` = 482344960;
> alter session set `planner.width.max_per_node` = 1;
> alter session set `planner.width.max_per_query` = 1;
> select count(*) from (select * from dfs.`/drill/testdata/resource-manager/3500cols.tbl`
> order by columns[450],columns[330],columns[230],columns[220],columns[110],columns[90],columns[80],columns[70],columns[40],columns[10],columns[20],columns[30],columns[40],columns[50],
> columns[454],columns[413],columns[940],columns[834],columns[73],columns[140],columns[104],columns[2222],columns[30],columns[2420],columns[1520],
> columns[1410], columns[1110],columns[1290],columns[2380],columns[705],columns[45],columns[1054],columns[2430],columns[420],columns[404],columns[3350],
> columns[3333],columns[153],columns[356],columns[84],columns[745],columns[1450],columns[103],columns[2065],columns[343],columns[3420],columns[530],
> columns[3210] ) d where d.col433 = 'sjka skjf';
> alter session set `planner.memory.max_query_memory_per_node` = 2147483648;
> {code}
> The data set and the logs are too large to attach to a JIRA, but a description of the data follows.
> {code}
> No of records : 1,000,000
> No of columns : 3500
> Length of each column : < 50
> {code}
> The profile is attached, and I will soon post my analysis of why I think this is an unreasonable amount of time.
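
For a rough sense of scale, the figures in the description can be turned into an upper bound on the raw data size and compared with the sort memory limit set in the query. This is only a back-of-the-envelope sketch: it assumes every column is at most 50 bytes (the report says "< 50", so the real size is smaller), it assumes one byte per character, and it ignores any per-value or per-batch overhead inside Drill. The variable and function names are illustrative and do not come from Drill.

```python
# Back-of-the-envelope sizing from the figures quoted in the report.
RECORDS = 1_000_000
COLUMNS = 3500
MAX_COLUMN_BYTES = 50  # assumption: "< 50" treated as an upper bound, 1 byte per char

# Value of planner.memory.max_query_memory_per_node used for the slow run.
SORT_MEMORY_BYTES = 482_344_960


def upper_bound_bytes(records: int, columns: int, max_col_bytes: int) -> int:
    """Upper bound on raw data size, ignoring delimiters and engine overhead."""
    return records * columns * max_col_bytes


total = upper_bound_bytes(RECORDS, COLUMNS, MAX_COLUMN_BYTES)
ratio = total / SORT_MEMORY_BYTES

print(f"upper bound on raw data: {total / 1e9:.0f} GB")
print(f"sort memory limit:       {SORT_MEMORY_BYTES / 2**20:.0f} MiB")
print(f"data is at most ~{ratio:.0f}x the memory limit")
```

Under these assumptions the input is at most a few hundred times larger than the memory available to the sort. That is consistent with heavy spilling, but it does not by itself explain a 16-hour run time.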



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
