tez-issues mailing list archives

From "Hitesh Shah (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TEZ-410) Refactor Edge Connection Pattern to be more clear
Date Fri, 30 Aug 2013 00:04:52 GMT

    [ https://issues.apache.org/jira/browse/TEZ-410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13754234#comment-13754234 ]

Hitesh Shah commented on TEZ-410:
---------------------------------

Comments:

{code}
+      default : throw new RuntimeException("unknown 'SchedulingType'");
{code}
  - It might help to include the actual unhandled enum value in the exception message.
  - The same change may be needed in other places in the same class ( DagTypeConverters.java ).
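
A minimal sketch of the first suggestion, assuming a converter shaped like the one in the patch. The class name, the method name toWireName, and the UNHANDLED_EXAMPLE constant are hypothetical and exist only to make the example self-contained; they are not taken from DagTypeConverters.java.

```java
// Sketch only: a locally defined stand-in for the converter under review.
final class SchedulingTypeConverterSketch {

    // Hypothetical enum. SEQUENTIAL is the value named in the issue;
    // UNHANDLED_EXAMPLE is a placeholder that exercises the default branch.
    enum SchedulingType { SEQUENTIAL, UNHANDLED_EXAMPLE }

    // Converts the enum to a string form. The default branch appends the
    // offending value so the failure message identifies what was not handled.
    static String toWireName(SchedulingType type) {
        switch (type) {
            case SEQUENTIAL:
                return "SEQUENTIAL";
            default:
                throw new RuntimeException("unknown 'SchedulingType': " + type);
        }
    }

    public static void main(String[] args) {
        System.out.println(toWireName(SchedulingType.SEQUENTIAL));
        try {
            toWireName(SchedulingType.UNHANDLED_EXAMPLE);
        } catch (RuntimeException e) {
            // The message now names the unhandled value.
            System.out.println(e.getMessage());
        }
    }
}
```

With the value appended, a log line from the default branch identifies which constant was missed, which matters most when a new enum value is added later and one of several converters is not updated.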

{code}
+    /**
+     * Data produced by the source task is persisted and available even when the
+     * task is not running. The data may be unavailable and may cause the source
+     * task to be re-executed.
+     */
+    PERSISTED,
{code}
   - "... data may be*come* unavailable ... "

   - "source task is stored in reliably" --> remove the "in" ?

Looks good apart from the above minor nits. Good to commit after addressing above.
                
> Refactor Edge Connection Pattern to be more clear
> -------------------------------------------------
>
>                 Key: TEZ-410
>                 URL: https://issues.apache.org/jira/browse/TEZ-410
>             Project: Apache Tez
>          Issue Type: Improvement
>            Reporter: Bikas Saha
>            Assignee: Bikas Saha
>         Attachments: TEZ-410.1.patch, TEZ-410.2.patch, TEZ-410.3.patch, TEZ-410.4.patch
>
>
> During discussion with users there was feedback that edge properties need to be named
> better to make them clearer. There was a suggestion to look at MPI for inspiration. Based
> on that feedback, the proposal is to rename ConnectionPattern to DataMovement, as that is
> essentially what the property defines. A bipartite connection pattern can be constructed
> from both broadcast and scatter-gather data movement types. There will be 3 kinds of data
> movement initially.
> ONE_TO_ONE - Defines that an output produced by the ith upstream task is available to the
> ith downstream task.
> BROADCAST - Defines that an output produced by any upstream task is available to all downstream
> tasks.
> SCATTER_GATHER - Defines that the ith output produced by all upstream tasks is available
> to the same downstream task. Upstream tasks scatter their outputs and they are gathered by
> designated downstream tasks.
> To be clear, output being available to a task does not imply that the entire output
> is transferred to or read by it. The task can choose to read any amount of the total data.
> Current users: In the EdgeProperty object
> Please change EdgeConnectionPattern.BIPARTITE -> DataMovementType.SCATTER_GATHER
> Please change SourceType.STABLE -> DataSourceType.PERSISTED
> Please add SchedulingType.SEQUENTIAL to EdgeProperty objects.
> The getter methods have similar name changes.
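
The three data movement types in the quoted description can be illustrated with a small sketch. This is not Tez code: the class, the consumers method, and its parameters are hypothetical. It also assumes that under SCATTER_GATHER the ith output goes to downstream task i, whereas the description only says "the same downstream task".

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Sketch only: models which downstream tasks an upstream output is available to.
final class DataMovementSketch {

    // Enum constants as named in the issue description, defined locally.
    enum DataMovementType { ONE_TO_ONE, BROADCAST, SCATTER_GATHER }

    // Returns the downstream task indices to which output number `outputIndex`
    // of upstream task `upstreamTask` is available. Availability does not mean
    // the data is actually read; a task may read any amount of it.
    static List<Integer> consumers(DataMovementType type, int upstreamTask,
                                   int outputIndex, int numDownstream) {
        switch (type) {
            case ONE_TO_ONE:
                // The ith upstream task feeds the ith downstream task.
                return Collections.singletonList(upstreamTask);
            case BROADCAST:
                // Any upstream output is available to every downstream task.
                List<Integer> all = new ArrayList<>();
                for (int d = 0; d < numDownstream; d++) {
                    all.add(d);
                }
                return all;
            case SCATTER_GATHER:
                // Assumption: the ith output is gathered by downstream task i.
                return Collections.singletonList(outputIndex);
            default:
                throw new IllegalArgumentException("unknown 'DataMovementType': " + type);
        }
    }

    public static void main(String[] args) {
        for (DataMovementType type : DataMovementType.values()) {
            // Upstream task 1, its output 2, with 3 downstream tasks.
            System.out.println(type + " -> " + consumers(type, 1, 2, 3));
        }
    }
}
```

In this model a bipartite pattern corresponds to either BROADCAST or SCATTER_GATHER, since in both cases every upstream task has an edge to downstream tasks beyond its own index.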

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
