drill-issues mailing list archives

From "Paul Rogers (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-5472) Parquet reader generating low-density batches causing Sort operator to spill un-necessarily
Date Thu, 04 May 2017 17:12:04 GMT

    [ https://issues.apache.org/jira/browse/DRILL-5472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15997100#comment-15997100 ]

Paul Rogers commented on DRILL-5472:
------------------------------------

This is a known issue with Parquet, but one that is not currently a high priority.

We expect this issue to be resolved as a side effect of the fix for DRILL-5211.
For that bug, we must limit vector sizes to 16 MB. At present, the Parquet reader tries, but
fails, to limit vector sizes. That failure causes random vector sizes and low density. Fixing
the Parquet vector limit to avoid fragmentation will perhaps also reduce the low-density
issue, without that issue itself having to become a high priority.
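
To illustrate why low density matters for the sort, here is a rough back-of-the-envelope sketch using the approximate figures from the quoted report below (~1.2 GB of uncompressed data, ~6 GB of sort memory). The helper names and the simple "data size divided by density" model are assumptions made for illustration. They are not Drill APIs, and a real sort needs additional working memory beyond the buffered batches.

```python
# Illustrative model only: names and formula are assumptions, not Drill code.
GB = 1024 ** 3

DATA_SIZE = 1.2 * GB    # approximate uncompressed size from the report
SORT_BUDGET = 6 * GB    # approximate memory given to the sort


def memory_footprint_bytes(data_bytes: float, density: float) -> float:
    """Memory held by batches when only `density` of the allocated
    vector space actually contains data (0 < density <= 1)."""
    if not 0.0 < density <= 1.0:
        raise ValueError("density must be in (0, 1]")
    return data_bytes / density


def must_spill(density: float) -> bool:
    """True if the buffered batches alone exceed the sort budget."""
    return memory_footprint_bytes(DATA_SIZE, density) > SORT_BUDGET


# Density below which the data no longer fits in the budget (about 0.2).
BREAK_EVEN_DENSITY = DATA_SIZE / SORT_BUDGET
```

Under this simplified model, batches that are less than roughly 20% full would be enough to make ~1.2 GB of data overflow a ~6 GB budget, which is consistent with the spilling described in the report.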

> Parquet reader generating low-density batches causing Sort operator to spill un-necessarily
> -------------------------------------------------------------------------------------------
>
>                 Key: DRILL-5472
>                 URL: https://issues.apache.org/jira/browse/DRILL-5472
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Relational Operators, Storage - Parquet
>            Reporter: Rahul Challapalli
>            Assignee: Paul Rogers
>         Attachments: drill5472.log, drill5472.parquet, drill5472.sys.drill
>
>
> git.commit.id.abbrev=1e0a14c
> The parquet file used in the below query is ~20MB. The uncompressed size is ~1.2 GB. Now the below query has a sort which is given ~6GB memory for a single fragment and yet it spills.
> {code}
> select * from (select * from dfs.`/drill/testdata/resource-manager/all_types_large` s order by s.missing12.x) d where d.missing3 is false;
> {code}
> The profile indicates that the above query has spilled twice. I have attached the profile and the logs.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
