Date: Wed, 5 Aug 2015 09:52:04 +0000 (UTC)
From: "ASF GitHub Bot (JIRA)"
To: issues@flink.apache.org
Reply-To: dev@flink.apache.org
Subject: [jira] [Commented] (FLINK-2398) Decouple StreamGraph Building from the API

    [ https://issues.apache.org/jira/browse/FLINK-2398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14655099#comment-14655099 ]

ASF GitHub Bot commented on FLINK-2398:
---------------------------------------

Github user mbalassi commented on the pull request:

    https://github.com/apache/flink/pull/988#issuecomment-127940968

    I am not sure that I understand this correctly: if a non-parallel source is used, does the user need to call `rebalance` to use all parallel instances of the downstream operator?

> Decouple StreamGraph Building from the API
> ------------------------------------------
>
>                 Key: FLINK-2398
>                 URL: https://issues.apache.org/jira/browse/FLINK-2398
>             Project: Flink
>          Issue Type: Improvement
>          Components: Streaming
>            Reporter: Aljoscha Krettek
>            Assignee: Aljoscha Krettek
>
> Currently, the building of the StreamGraph is tightly intertwined with the API methods. DataStream knows about the StreamGraph and keeps track of splitting, selected names, unions and so on. As a result, it is very hard to understand how the StreamGraph is built, because the code that does it is spread all over the place. This also makes it hard to extend or change parts of the Streaming system.
> I propose to introduce "Transformations". A transformation holds information about one operation: the input streams, types, names, operator and so on. An API method creates a transformation instead of fiddling with the StreamGraph directly. A new component, the StreamGraphGenerator, creates a StreamGraph from the tree of transformations that results from specifying a program using the API methods. This would relieve DataStream from knowing about the StreamGraph and make unions, splitting and selection visible transformations instead of fields scattered across the different API classes.
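
To make the proposal above a bit more concrete, here is a minimal sketch of the idea, not the code from the pull request. All class names (`TransformationSketch`, `Transformation`, `Graph`, `GraphGenerator`) and their fields are hypothetical stand-ins chosen for illustration; they are not Flink's actual classes or signatures. The point is only that the API side builds plain transformation objects, and a single generator is the only place that touches the graph.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

public class TransformationSketch {

    /** Hypothetical: describes one operation and its inputs; knows nothing about the graph. */
    static final class Transformation {
        final String name;
        final int parallelism;
        final List<Transformation> inputs;

        Transformation(String name, int parallelism, Transformation... inputs) {
            this.name = name;
            this.parallelism = parallelism;
            this.inputs = Arrays.asList(inputs);
        }
    }

    /** Hypothetical: a plain node/edge listing standing in for the StreamGraph. */
    static final class Graph {
        final List<String> nodes = new ArrayList<>();
        final List<String> edges = new ArrayList<>();
    }

    /** Hypothetical generator: the only code that writes to the graph. */
    static final class GraphGenerator {
        private final Graph graph = new Graph();
        // Identity map so a transformation shared by several consumers is added once.
        private final Map<Transformation, Integer> ids = new IdentityHashMap<>();

        static Graph generate(Transformation sink) {
            GraphGenerator generator = new GraphGenerator();
            generator.visit(sink);
            return generator.graph;
        }

        /** Adds the inputs first, then the transformation itself; returns its node id. */
        private int visit(Transformation t) {
            Integer known = ids.get(t);
            if (known != null) {
                return known;
            }
            List<Integer> inputIds = new ArrayList<>();
            for (Transformation input : t.inputs) {
                inputIds.add(visit(input));
            }
            int id = graph.nodes.size();
            graph.nodes.add(id + ":" + t.name + "(p=" + t.parallelism + ")");
            for (int inputId : inputIds) {
                graph.edges.add(inputId + "->" + id);
            }
            ids.put(t, id);
            return id;
        }
    }

    public static void main(String[] args) {
        // What the API methods would build: a tree of transformations, no graph yet.
        Transformation source = new Transformation("source", 1);
        Transformation map = new Transformation("map", 4, source);
        Transformation filter = new Transformation("filter", 4, source);
        Transformation union = new Transformation("union", 4, map, filter);
        Transformation sink = new Transformation("sink", 4, union);

        Graph graph = GraphGenerator.generate(sink);
        System.out.println(graph.nodes);
        System.out.println(graph.edges);
    }
}
```

In this sketch the union is just another transformation with two inputs, so it shows up as an ordinary node during the traversal instead of being tracked as a field on the stream classes. How the real generator treats unions, splits and partitioning (including the `rebalance` question raised in the comment) is not specified here.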