drill-dev mailing list archives

From "Rahul Challapalli (JIRA)" <j...@apache.org>
Subject [jira] [Created] (DRILL-5472) Parquet reader generating low-density batches causing Sort operator to spill un-necessarily
Date Thu, 04 May 2017 16:50:04 GMT
Rahul Challapalli created DRILL-5472:
----------------------------------------

             Summary: Parquet reader generating low-density batches causing Sort operator to spill un-necessarily
                 Key: DRILL-5472
                 URL: https://issues.apache.org/jira/browse/DRILL-5472
             Project: Apache Drill
          Issue Type: Bug
          Components: Execution - Relational Operators, Storage - Parquet
            Reporter: Rahul Challapalli
            Assignee: Paul Rogers


git.commit.id.abbrev=1e0a14c

The Parquet file used in the query below is ~20 MB; its uncompressed size is ~1.2 GB. The
query contains a sort that is given ~6 GB of memory for a single fragment, and yet it spills.
{code}
select * from (select * from dfs.`/drill/testdata/resource-manager/all_types_large` s order
by s.missing12.x) d where d.missing3 is false;
{code}

The profile indicates that the above query spilled twice. The profile and the logs are
attached.
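
To put the numbers above in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes a deliberately simplified model in which "density" is the fraction of each allocated batch buffer that actually carries data, and it ignores any other memory the sort needs. The function names are hypothetical and are not part of Drill; only the ~1.2 GB and ~6 GB figures come from this report.

```python
# Rough estimate: how sparse must batches be before ~1.2 GB of data
# no longer fits in a ~6 GB sort budget? (Simplified model, see assumptions above.)
GB = 1024 ** 3

uncompressed_bytes = 1.2 * GB   # approximate figure from the report
sort_memory_bytes = 6 * GB      # approximate figure from the report


def buffered_bytes(data_bytes: float, density: float) -> float:
    """Memory held by batches when only `density` of each buffer carries data."""
    if not 0 < density <= 1:
        raise ValueError("density must be in (0, 1]")
    return data_bytes / density


def spill_threshold_density(data_bytes: float, memory_bytes: float) -> float:
    """Density below which the buffered batches alone exceed the memory budget."""
    return data_bytes / memory_bytes


threshold = spill_threshold_density(uncompressed_bytes, sort_memory_bytes)
print(f"spill expected below a density of about {threshold:.0%}")

for density in (1.0, 0.5, 0.1):
    needed_gb = buffered_bytes(uncompressed_bytes, density) / GB
    print(f"density {density:.0%}: ~{needed_gb:.1f} GB buffered")
```

Under this model, fully dense batches would need only about a fifth of the sort's memory, so a spill suggests that the batches reaching the sort occupy several times more memory than the data they carry.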



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
