drill-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-4706) Fragment planning causes Drillbits to read remote chunks when local copies are available
Date Fri, 04 Nov 2016 22:04:59 GMT

    [ https://issues.apache.org/jira/browse/DRILL-4706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15637850#comment-15637850 ]

ASF GitHub Bot commented on DRILL-4706:
---------------------------------------

Github user ppadma commented on the issue:

    https://github.com/apache/drill/pull/639
  
    Parallelization logic is affected for the following reasons:
    Depending on how many row groups a node has to scan (based on locality information),
    i.e. how much work the node has to do, we want to adjust the number of fragments on that
    node (subject to the usual global and per-node limits).
    We do not want to schedule fragments on a node that does not have data.
    Because we want pure locality, we may end up with fewer fragments, each doing more work.
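The width rules described above can be illustrated with a small sketch. This is not Drill's planner code: the function name `fragments_per_node` and its parameters `global_max` and `per_node_max` are hypothetical stand-ins for the global and per-node limits the comment mentions, and the greedy "busiest node gets the next fragment" policy is an assumed way of sizing fragments to local work.

```python
def fragments_per_node(local_row_groups, global_max, per_node_max):
    """Hypothetical sketch: choose how many scan fragments to run on each node.

    local_row_groups maps node name -> number of row groups stored locally.
    global_max and per_node_max are assumed stand-ins for the planner limits.
    """
    # Pure locality: only nodes that hold data locally get fragments, one each to start.
    widths = {node: 1 for node, count in local_row_groups.items() if count > 0}
    if len(widths) > global_max:
        raise ValueError("global limit is too small to cover every node holding data")
    remaining = global_max - len(widths)
    while remaining > 0:
        # A node can take another fragment only while it is under the per-node limit
        # and still has more row groups than fragments.
        candidates = [n for n in widths
                      if widths[n] < min(per_node_max, local_row_groups[n])]
        if not candidates:
            break
        # Give the next fragment to the node with the most row groups per fragment
        # (ties broken by node name so the result is deterministic).
        busiest = max(candidates, key=lambda n: (local_row_groups[n] / widths[n], n))
        widths[busiest] += 1
        remaining -= 1
    return widths
```

For example, `fragments_per_node({"node1": 8, "node2": 4, "node3": 0}, 6, 4)` returns `{"node1": 4, "node2": 2}`: `node3` has no local data and gets no fragment, and once `node1` hits the per-node cap its four fragments each scan two row groups, which is the "fewer fragments doing more work" trade-off.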



> Fragment planning causes Drillbits to read remote chunks when local copies are available
> ----------------------------------------------------------------------------------------
>
>                 Key: DRILL-4706
>                 URL: https://issues.apache.org/jira/browse/DRILL-4706
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Query Planning & Optimization
>    Affects Versions: 1.6.0
>         Environment: CentOS, RHEL
>            Reporter: Kunal Khatua
>            Assignee: Sorabh Hamirwasia
>              Labels: performance, planning
>
> When a table (data size = 70 GB) of 160 Parquet files (each having a single row group and
> fitting within one chunk) is available on a 10-node setup with replication=3, a pure data
> scan query causes about 2% of the data to be read remotely.
> Even with the metadata cache created, the planner selects a sub-optimal plan for executing
> the SCAN fragments, so some of the data is served from a remote server.
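As a point of comparison for the behavior reported above, here is a minimal sketch of replica-aware assignment, in which each row group may only go to a host that stores a copy of its chunk, so no read is remote by construction. The names `assign_row_groups` and `remote_fraction`, the least-loaded tie-breaking, and the host/row-group labels are all hypothetical illustrations, not Drill's actual assignment logic; every row group is assumed to have at least one replica.

```python
def assign_row_groups(replicas):
    """Hypothetical sketch: map each row group to a host that stores a replica of it.

    replicas maps row-group name -> non-empty set of hosts holding a copy of its chunk.
    """
    load = {}
    assignment = {}
    # Row groups with the fewest replicas have the least choice, so place them first.
    for rg in sorted(replicas, key=lambda r: (len(replicas[r]), r)):
        # Among the hosts with a local copy, pick the least-loaded one
        # (ties go to the alphabetically first host).
        host = min(sorted(replicas[rg]), key=lambda h: load.get(h, 0))
        assignment[rg] = host
        load[host] = load.get(host, 0) + 1
    return assignment


def remote_fraction(assignment, replicas):
    """Fraction of row groups whose assigned host holds no local copy."""
    remote = sum(1 for rg, host in assignment.items() if host not in replicas[rg])
    return remote / len(assignment)
```

With four row groups replicated three ways across hosts `n1` to `n4` (for instance `rg0` on `{n1, n2, n3}`), `assign_row_groups` spreads them one per host and `remote_fraction` reports 0.0, whereas a locality-blind assignment that sends `rg0` to `n4` would report 0.25.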




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
