drill-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Venki Korukanti (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (DRILL-4446) Improve current fragment parallelization module
Date Wed, 13 Apr 2016 18:29:25 GMT

     [ https://issues.apache.org/jira/browse/DRILL-4446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

Venki Korukanti resolved DRILL-4446.
    Resolution: Fixed

> Improve current fragment parallelization module
> -----------------------------------------------
>                 Key: DRILL-4446
>                 URL: https://issues.apache.org/jira/browse/DRILL-4446
>             Project: Apache Drill
>          Issue Type: New Feature
>    Affects Versions: 1.5.0
>            Reporter: Venki Korukanti
>            Assignee: Venki Korukanti
>             Fix For: 1.7.0
> Current fragment parallelizer {{SimpleParallelizer.java}} can’t handle correctly the
case where an operator has mandatory scheduling requirement for a set of DrillbitEndpoints
and affinity for each DrillbitEndpoint (i.e how much portion of the total tasks to be scheduled
on each DrillbitEndpoint). It assumes that scheduling requirements are soft (except one case
where Mux and DeMux case where mandatory parallelization requirement of 1 unit). 
> An example is:
> Cluster has 3 nodes running Drillbits and storage service on each. Data for a table is
only present at storage services in two nodes. So a GroupScan needs to be scheduled on these
two nodes in order to read the data. Storage service doesn't support (or costly) reading data
from remote node.
> Inserting the mandatory scheduling requirements within existing SimpleParallelizer is
not sufficient as you may end up with a plan that has a fragment with two GroupScans each
having its own hard parallelization requirements.
> Proposal is:
> Add a property to each operator which tells what parallelization implementation to use.
Most operators don't have any particular strategy (such as Project or Filter), they depend
on incoming operator. Current existing operators which have requirements (all existing GroupScans)
default to current parallelizer {{SimpleParallelizer}}. {{Screen}} defaults to new mandatory
assignment parallelizer. It is possible that PhysicalPlan generated can have a fragment with
operators having different parallelization strategies. In that case an exchange is inserted
in between operators where a change in parallelization strategy is required.
> Will send a detailed design doc.

This message was sent by Atlassian JIRA

View raw message