From: Kostas Tzoumas
To: "dev@flink.apache.org"
Date: Tue, 31 Mar 2015 17:27:06 +0200
Subject: Re: [DISCUSS] Inconsistent naming of intermediate results

I like the fact that the naming scheme follows some logic.
I also like that we have two easy-to-understand concepts:

- Operator (be that in any of the above representations)
- Result (of executing an operator)

+1

On Tue, Mar 31, 2015 at 4:50 PM, Ufuk Celebi wrote:

> On a high level we call intermediate data produced by programs
> "intermediate results". For example, in a WordCount map-reduce program the
> map function produces an intermediate result, which consists of (word, 1)
> pairs, and the reduce function consumes this intermediate result. Kostas has
> recently added documentation explaining the core concepts [1].
>
> The naming of classes related to intermediate results is inconsistent (and
> probably confusing).
>
> - In JobGraphs (internal low-level API to define programs) they are called
> IntermediateDataSet and identified by IntermediateDataSetIDs.
>
> - In ExecutionGraphs (JobManager structure used for state
> tracking/scheduling) they are called IntermediateResult at the
> ExecutionJobVertex (identified by IntermediateDataSetID) and
> IntermediateResultPartition at the ExecutionVertex (identified by
> IntermediateResultPartitionID).
>
> - At runtime (TaskManager) they are called ResultPartition and identified
> by ResultPartitionID (composition of ExecutionAttemptID and
> IntermediateResultPartitionID). These are further subpartitioned into
> ResultSubpartition instances.
>
> I propose to get the naming more in line with the existing naming scheme
> and prefix it with the corresponding management structures:
>
> 1) IntermediateDataSet => JobVertexResult (identified by JobVertexResultID)
> 2) IntermediateResult => ExecutionJobVertexResult (identified by
> JobVertexResultID)
> 3) IntermediateResultPartition => ExecutionVertexResult (identified by
> ExecutionVertexResultID)
> 4) ResultPartition => Result
> 5) ResultSubpartition => ResultPartition
>
> These names are non-user facing, but still at the core of the system.
> I think that consistent naming of these classes will make it easier for new
> contributors to get an overview of how single components relate to each
> other (the prefixes indicate this). In the docs, we can still refer to the
> high-level concept as "intermediate results".
>
> What's your opinion on this? I think now is a good time to think about
> this stuff, because the core classes have only been added recently to the
> system. Feel free to propose alternatives. :-)
>
> – Ufuk
>
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/Data+exchange+between+tasks
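
For reference, the five renames proposed in the quoted message can be collected in a small lookup table. This is only an illustrative sketch: the class `ProposedRenames` and its method `proposedName` are hypothetical helpers and not part of Flink. The class names on both sides of each mapping are taken from the proposal as written.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Flink: restates the proposed renames.
final class ProposedRenames {

    // Current class name -> proposed class name, in the order of the proposal.
    static final Map<String, String> RENAMES = new LinkedHashMap<>();

    static {
        RENAMES.put("IntermediateDataSet", "JobVertexResult");
        RENAMES.put("IntermediateResult", "ExecutionJobVertexResult");
        RENAMES.put("IntermediateResultPartition", "ExecutionVertexResult");
        RENAMES.put("ResultPartition", "Result");
        RENAMES.put("ResultSubpartition", "ResultPartition");
    }

    // Returns the proposed name, or the input unchanged if no rename applies.
    static String proposedName(String currentName) {
        return RENAMES.getOrDefault(currentName, currentName);
    }

    public static void main(String[] args) {
        // Print each mapping once, in proposal order.
        RENAMES.forEach((oldName, newName) ->
                System.out.println(oldName + " => " + newName));
    }
}
```

Note that ResultPartition appears on both sides of the table (it is the current name in item 4 and the proposed name in item 5), so a lookup like this should be applied once per name rather than repeatedly.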