tez-issues mailing list archives

From "Bikas Saha (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (TEZ-410) Refactor Edge Connection Pattern to be more clear
Date Thu, 29 Aug 2013 17:36:52 GMT

     [ https://issues.apache.org/jira/browse/TEZ-410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bikas Saha updated TEZ-410:
---------------------------

    Description: 
During discussions with users, there was feedback that edge properties need better names
to make them clearer. There was a suggestion to look at MPI for inspiration. Based on that
feedback, the proposal is to rename ConnectionPattern to DataMovement, as that is essentially
what the property defines. A Bipartite connection pattern can be constructed from both
broadcast and scatter-gather data movement types. There will be 3 kinds of data movement
initially.
ONE_TO_ONE - Defines that an output produced by the ith upstream task is available to the ith
downstream task.
BROADCAST - Defines that an output produced by any upstream task is available to all downstream
tasks.
SCATTER_GATHER - Defines that the ith output produced by all upstream tasks is available to
the same downstream task. Upstream tasks scatter their outputs and they are gathered by designated
downstream tasks.
To be clear, output being available to a task does not imply that the entire output is
transferred to or read by it. The task can choose to read any amount of the total data.
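
The three data movement types can be illustrated with a minimal sketch. It is not the Tez
implementation: DataMovementSketch, consumers() and DataMovementDemo are hypothetical names
used only here. The sketch assumes that for SCATTER_GATHER the ith output is routed to
downstream task i, and that for ONE_TO_ONE both vertices have the same number of tasks.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for illustration only; not the actual Tez enum.
enum DataMovementSketch {
    ONE_TO_ONE, BROADCAST, SCATTER_GATHER;

    /**
     * Returns the indices of the downstream tasks to which an output is available.
     *
     * @param upstreamTask  index of the upstream task that produced the output
     * @param outputIndex   index of the output within that task (used by SCATTER_GATHER)
     * @param numDownstream number of downstream tasks
     */
    List<Integer> consumers(int upstreamTask, int outputIndex, int numDownstream) {
        List<Integer> result = new ArrayList<>();
        switch (this) {
            case ONE_TO_ONE:
                // Output of the ith upstream task is available to the ith downstream task.
                result.add(upstreamTask);
                break;
            case BROADCAST:
                // Output of any upstream task is available to all downstream tasks.
                for (int d = 0; d < numDownstream; d++) {
                    result.add(d);
                }
                break;
            case SCATTER_GATHER:
                // Assumption: the ith output of every upstream task goes to downstream task i.
                result.add(outputIndex);
                break;
        }
        return result;
    }
}

class DataMovementDemo {
    public static void main(String[] args) {
        // Upstream task 2, output 1, with 4 downstream tasks.
        for (DataMovementSketch type : DataMovementSketch.values()) {
            System.out.println(type + " -> " + type.consumers(2, 1, 4));
        }
    }
}
```

Availability here only says which tasks may read an output; it does not say how much of it
a task actually reads.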

Current users: In the EdgeProperty object
Please change EdgeConnectionPattern.BIPARTITE -> DataMovementType.SCATTER_GATHER
Please change SourceType.STABLE -> DataSourceType.PERSISTED
The getter methods have similar name changes.
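
The two renames above can be summarised as a small mapping. This is a sketch with locally
defined stand-in enums, not the Tez classes: only the constants named in this message are
listed for the old types, and EdgeRenameSketch and its migrate() helpers are hypothetical.

```java
// Stand-ins for the old names; only the constants mentioned in this message are listed.
enum EdgeConnectionPattern { BIPARTITE }
enum SourceType { STABLE }

// Stand-ins for the new names.
enum DataMovementType { ONE_TO_ONE, BROADCAST, SCATTER_GATHER }
enum DataSourceType { PERSISTED }

// Hypothetical helper that maps each old constant to its replacement.
class EdgeRenameSketch {
    static DataMovementType migrate(EdgeConnectionPattern old) {
        switch (old) {
            case BIPARTITE:
                return DataMovementType.SCATTER_GATHER;
            default:
                throw new IllegalArgumentException("No mapping given for " + old);
        }
    }

    static DataSourceType migrate(SourceType old) {
        switch (old) {
            case STABLE:
                return DataSourceType.PERSISTED;
            default:
                throw new IllegalArgumentException("No mapping given for " + old);
        }
    }

    public static void main(String[] args) {
        System.out.println(migrate(EdgeConnectionPattern.BIPARTITE));
        System.out.println(migrate(SourceType.STABLE));
    }
}
```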

  was:
During discussions with users, there was feedback that edge properties need better names
to make them clearer. There was a suggestion to look at MPI for inspiration. Based on that
feedback, the proposal is to rename ConnectionPattern to DataMovement, as that is essentially
what the property defines. A Bipartite connection pattern can be constructed from both
broadcast and scatter-gather data movement types. There will be 3 kinds of data movement
initially.
ONE_TO_ONE - Defines that an output produced by the ith upstream task is available to the ith
downstream task.
BROADCAST - Defines that an output produced by any upstream task is available to all downstream
tasks.
SCATTER_GATHER - Defines that the ith output produced by all upstream tasks is available to
the same downstream task. Upstream tasks scatter their outputs and they are gathered by designated
downstream tasks.
To be clear, output being available to a task does not imply that the entire output is
transferred to or read by it. The task can choose to read any amount of the total data.

    
> Refactor Edge Connection Pattern to be more clear
> -------------------------------------------------
>
>                 Key: TEZ-410
>                 URL: https://issues.apache.org/jira/browse/TEZ-410
>             Project: Apache Tez
>          Issue Type: Improvement
>            Reporter: Bikas Saha
>            Assignee: Bikas Saha
>         Attachments: TEZ-410.1.patch, TEZ-410.2.patch, TEZ-410.3.patch
>

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
