Date: Tue, 31 Mar 2015 18:16:37 +0200
Subject: Re: [DISCUSS] Inconsistent naming of intermediate results
From: Stephan Ewen
To: "dev@flink.apache.org"

I like getting the consistency in there.

I never thought of the intermediate data sets as being strictly produced by a vertex, so I am unsure whether we should use that exact naming scheme, or one that disconnects the results from the term "VertexResult".

On Tue, Mar 31, 2015 at 5:27 PM, Kostas Tzoumas wrote:

> I like the fact that the naming scheme follows some logic.
>
> I also like that we have two easy to understand concepts:
> - Operator (be that in any of the above representations)
> - Result (of executing an operator)
>
> +1
>
> On Tue, Mar 31, 2015 at 4:50 PM, Ufuk Celebi wrote:
>
> > On a high level we call intermediate data produced by programs
> > "intermediate results". For example in a WordCount map-reduce program the
> > map function produces an intermediate result, which consists of (word, 1)
> > pairs and the reduce function consumes this intermediate result. Kostas has
> > recently added documentation explaining the core concepts [1].
> >
> > The naming of classes related to intermediate results is inconsistent (and
> > probably confusing).
> >
> > - In JobGraphs (internal low-level API to define programs) they are called
> > IntermediateDataSet and identified by IntermediateDataSetIDs.
> >
> > - In ExecutionGraphs (JobManager structure used for state
> > tracking/scheduling) they are called IntermediateResult at the
> > ExecutionJobVertex (identified by IntermediateDataSetID) and
> > IntermediateResultPartition at the ExecutionVertex (identified by
> > IntermediateResultPartitionID).
> >
> > - At runtime (TaskManager) they are called ResultPartition and identified
> > by ResultPartitionID (composition of ExecutionAttemptID and
> > IntermediateResultPartitionID). These are further subpartitioned into
> > ResultSubpartition instances.
> >
> > I propose to get the naming more in line with the existing naming scheme
> > and prefix it with the corresponding management structures:
> >
> > 1) IntermediateDataSet => JobVertexResult (identified by JobVertexResultID)
> > 2) IntermediateResult => ExecutionJobVertexResult (identified by
> > JobVertexResultID)
> > 3) IntermediateResultPartition => ExecutionVertexResult (identified by
> > ExecutionVertexResultID)
> > 4) ResultPartition => Result
> > 5) ResultSubpartition => ResultPartition
> >
> > These names are non-user facing, but still at the core of the system. I
> > think that consistent naming of these classes will make it easier for new
> > contributors to get an overview of how single components relate to each
> > other (the prefixes indicate this). In the docs, we can still refer to the
> > high-level concept as "intermediate results".
> >
> > What's your opinion on this? I think now is a good time to think about
> > this stuff, because the core classes have only been added recently to the
> > system. Feel free to propose alternatives. :-)
> >
> > – Ufuk
> >
> > [1]
> > https://cwiki.apache.org/confluence/display/FLINK/Data+exchange+between+tasks
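
For reference, the five renames proposed in the quoted message can be written down as a small lookup table. This is only a sketch for keeping the old and new names straight while reading the thread: the class `ProposedRenames` and its method `proposedName` are hypothetical and not part of Flink, and the table records the proposal as quoted, not an agreed outcome.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Flink: it only records the renames proposed in this thread.
final class ProposedRenames {

    // Current class name -> proposed class name, in the order listed in the proposal.
    static final Map<String, String> RENAMES = new LinkedHashMap<>();

    static {
        RENAMES.put("IntermediateDataSet", "JobVertexResult");               // JobGraph level
        RENAMES.put("IntermediateResult", "ExecutionJobVertexResult");       // ExecutionJobVertex level
        RENAMES.put("IntermediateResultPartition", "ExecutionVertexResult"); // ExecutionVertex level
        RENAMES.put("ResultPartition", "Result");                            // runtime (TaskManager)
        RENAMES.put("ResultSubpartition", "ResultPartition");                // runtime (TaskManager)
    }

    // Returns the proposed name, or the current name unchanged if no rename was proposed.
    // The lookup is a single step on purpose: it never follows a chain of renames.
    static String proposedName(String currentName) {
        return RENAMES.getOrDefault(currentName, currentName);
    }

    public static void main(String[] args) {
        RENAMES.forEach((from, to) -> System.out.println(from + " => " + to));
    }
}
```

One detail the table makes visible: "ResultPartition" is both an existing name (item 4) and a proposed name (item 5), so the renames would have to be applied in a single pass rather than one after another.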