drill-dev mailing list archives

From "Rahul Challapalli (JIRA)" <j...@apache.org>
Subject [jira] [Created] (DRILL-5228) Several operators in the attached query profile take more time than expected
Date Fri, 27 Jan 2017 04:30:24 GMT
Rahul Challapalli created DRILL-5228:

             Summary: Several operators in the attached query profile take more time than expected
                 Key: DRILL-5228
                 URL: https://issues.apache.org/jira/browse/DRILL-5228
             Project: Apache Drill
          Issue Type: Bug
          Components: Execution - Relational Operators
    Affects Versions: 1.10.0
            Reporter: Rahul Challapalli

Environment :

Data Set : 
Size : ~18 GB
No Of Columns : 1
Column Width : 256 bytes

Query (took ~127 minutes to complete):
alter session set `planner.width.max_per_node` = 1;
alter session set `planner.disable_exchanges` = true;
alter session set `planner.memory.max_query_memory_per_node` = 14106127360;
select * from (select * from dfs.`/drill/testdata/resource-manager/250wide.tbl` order by columns[0])d
where d.columns[0] = 'ljdfhwuehnoiueyf';

*Selection Vector Remover*
Time spent based on profile : 7m58s
Problem : Since the external sort spilled to disk in this case, the selection vector remover
should have been a no-op. There is no clear justification for the time spent.

*Text Sub Scan*
Time spent based on profile : 13m25s
Problem : I captured a profile screenshot (before-spill.png) once the memory allocation
for the sort reached its limit. Based on this, the scan took 2m13s to read the first 12.48 GB
of data before sorting/spilling began. The remaining ~5.5 GB took ~11 minutes.
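
As a rough check of the scan figures above, the implied read throughput before and after spilling began can be worked out as below. This is only a sketch: it assumes 1 GB = 1024 MB, takes the post-spill time as the difference between the two reported timings (13m25s minus 2m13s), and the helper names are illustrative rather than anything from Drill.

```python
def to_seconds(minutes, seconds):
    """Convert a profile timing such as 13m25s to seconds."""
    return minutes * 60 + seconds


def throughput_mb_per_s(gigabytes, seconds):
    """Average read rate in MB/s, assuming 1 GB = 1024 MB."""
    return gigabytes * 1024 / seconds


total_scan_s = to_seconds(13, 25)          # total Text Sub Scan time from the profile
pre_spill_s = to_seconds(2, 13)            # time to read the first 12.48 GB
post_spill_s = total_scan_s - pre_spill_s  # time left for the remaining ~5.5 GB

pre_rate = throughput_mb_per_s(12.48, pre_spill_s)
post_rate = throughput_mb_per_s(5.5, post_spill_s)

print(f"before spill: {pre_rate:.1f} MB/s")
print(f"after spill:  {post_rate:.1f} MB/s")
print(f"slowdown:     {pre_rate / post_rate:.1f}x")
```

Under these assumptions the scan rate drops by roughly an order of magnitude once spilling starts, which is the discrepancy this report is pointing at.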

Timings for the 4 projects, based on the profile, are listed below. I have no concrete reason
to suspect a problem here, but these numbers seem high.
Project 1 : 4m54s
Project 2 : 3m07s
Project 3 : 4m10s
Project 4 : 0.003s
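
To put the project timings above in context, the following sketch adds them up and compares the sum with the overall query time. It assumes the ~127 minute total from the query header is exact; that value is approximate in the report, so the resulting share is approximate too.

```python
# Project timings from the profile, converted to seconds.
project_times_s = {
    "Project 1": 4 * 60 + 54,
    "Project 2": 3 * 60 + 7,
    "Project 3": 4 * 60 + 10,
    "Project 4": 0.003,
}

total_query_s = 127 * 60  # assumption: "~127 minutes" treated as exact

projects_total_s = sum(project_times_s.values())
share = projects_total_s / total_query_s
minutes, seconds = divmod(projects_total_s, 60)

print(f"Projects combined: {int(minutes)}m{seconds:.0f}s ({share:.1%} of the query)")
```

The three slow projects together account for a bit over 12 minutes, a little under a tenth of the total runtime.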

The time reported for the external sort in the profile is wrong. DRILL-5227 has been filed
for this.

This message was sent by Atlassian JIRA
