Return-Path: X-Original-To: apmail-flink-user-archive@minotaur.apache.org Delivered-To: apmail-flink-user-archive@minotaur.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 2CDEE17F23 for ; Thu, 19 Nov 2015 18:40:37 +0000 (UTC) Received: (qmail 68471 invoked by uid 500); 19 Nov 2015 18:40:37 -0000 Delivered-To: apmail-flink-user-archive@flink.apache.org Received: (qmail 68383 invoked by uid 500); 19 Nov 2015 18:40:37 -0000 Mailing-List: contact user-help@flink.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@flink.apache.org Delivered-To: mailing list user@flink.apache.org Received: (qmail 68373 invoked by uid 99); 19 Nov 2015 18:40:36 -0000 Received: from Unknown (HELO spamd2-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 19 Nov 2015 18:40:36 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd2-us-west.apache.org (ASF Mail Server at spamd2-us-west.apache.org) with ESMTP id 821881A22F7 for ; Thu, 19 Nov 2015 18:40:36 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd2-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: 3.007 X-Spam-Level: *** X-Spam-Status: No, score=3.007 tagged_above=-999 required=6.31 tests=[DKIM_SIGNED=0.1, DKIM_VALID=-0.1, HEADER_FROM_DIFFERENT_DOMAINS=0.008, HTML_MESSAGE=3, RCVD_IN_MSPIKE_H2=-0.001, SPF_PASS=-0.001, URIBL_BLOCKED=0.001] autolearn=disabled Authentication-Results: spamd2-us-west.apache.org (amavisd-new); dkim=pass (2048-bit key) header.d=gmail.com Received: from mx1-us-east.apache.org ([10.40.0.8]) by localhost (spamd2-us-west.apache.org [10.40.0.9]) (amavisd-new, port 10024) with ESMTP id V8gXi568Rveh for ; Thu, 19 Nov 2015 18:40:30 +0000 (UTC) Received: from mail-qg0-f54.google.com (mail-qg0-f54.google.com [209.85.192.54]) by mx1-us-east.apache.org (ASF Mail Server at mx1-us-east.apache.org) with ESMTPS id B4A4B42AEE for ; Thu, 19 Nov 2015 18:40:29 +0000 (UTC) Received: by qgeb1 with SMTP id b1so57359163qge.1 for ; Thu, 19 Nov 2015 10:40:23 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:date:message-id:subject :from:to:content-type; bh=LGw7fTQXI3j0cDT0BrtO/y8DY/3/NSBH8ttGkpyvOFg=; b=Z0KDkLE4t+lGkYDU2MtTkPwsZfOnxvxzPh9CZioRfoDIEO8qP+345nqhdj6DIADvEF pqL4BoNQUU3NS7iGqnbrWCaaoUmlAc017h8MskMbXL+nuBNiGAD3zwsoBzx1Gn5gMCql YjiuQ3rXa1xNoYXB+xRkUeWwFk4lZTGoaO4NBFl9PSq6RVGMGV7ylxLTUwg2pTze0wl3 zFckBn5FR3pfr6xdviPbdUg5PtA3fYA5cm1KPksFYby0svFFIup0BOjAySRPGJfdQiTR P4+D9leGr7Hs0P1NzzfTAgyAnpduZH51SCD5H6+ilSaRQ2wqPezQXrh1QcUVxKHUm+G7 8m9w== MIME-Version: 1.0 X-Received: by 10.140.133.209 with SMTP id 200mr9259127qhf.0.1447958422987; Thu, 19 Nov 2015 10:40:22 -0800 (PST) Sender: ewenstephan@gmail.com Received: by 10.55.147.1 with HTTP; Thu, 19 Nov 2015 10:40:22 -0800 (PST) In-Reply-To: References: <023D74F2-5B15-4E34-A52C-C7262951EAB3@newrelic.com> Date: Thu, 19 Nov 2015 19:40:22 +0100 X-Google-Sender-Auth: T6kzF0Ysts6tqN-zmj-qSO4E-Nw Message-ID: Subject: Re: Fold vs Reduce in DataStream API From: Stephan Ewen To: user@flink.apache.org Content-Type: multipart/alternative; boundary=001a1135dfc273aa020524e917f0 --001a1135dfc273aa020524e917f0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable Hi Ron! You are right, there is a copy/paste error in the docs, it should be a FoldFunction that is passed to fold(), not a ReduceFunction. In Flink-0.10, the FoldFunction is only available on - KeyedStream ( https://ci.apache.org/projects/flink/flink-docs-release-0.10/api/java/org/a= pache/flink/streaming/api/datastream/KeyedStream.html#fold(R,%20org.apache.= flink.api.common.functions.FoldFunction) ) - WindowedStream ( https://ci.apache.org/projects/flink/flink-docs-release-0.10/api/java/org/a= pache/flink/streaming/api/datastream/WindowedStream.html#fold(R,%20org.apac= he.flink.api.common.functions.FoldFunction,%20org.apache.flink.api.common.t= ypeinfo.TypeInformation) ) In most cases, you probably want the variant on the WindowedStream, if you aggregate values over time. -------------------------------------------------------- To the difference between fold() and reduce(): It is very subtle. The fold function can also convert to another type whenever it integrates a new element. Here is an example (with lists, not streams, but same principle). -------------------------------------------------------- ReduceFunction { public Integer reduce(Integer a, Integer b) { return a + b; } } [1, 2, 3, 4, 5] -> reduce() means: ((((1 + 2) + 3) + 4) + 5) =3D 15 -------------------------------------------------------- FoldFunction { public String fold(String current, Integer i) { return current + String.valueOf(i); } } [1, 2, 3, 4, 5] -> fold("start-") means: ((((("start-" + 1) + 2) + 3) + 4) + 5) =3D "start-12345" (as a String) I hope that example illustrates the difference. Greetings, Stephan On Thu, Nov 19, 2015 at 7:00 PM, Ron Crocker wrote: > Hi Fabian - > > Thanks Fabian, that is a helpful description. > > That document WAS my source of information and it seems to also be the > source of my confusion. Further, it appears to be wrong - there is a > FoldFunction ( > https://ci.apache.org/projects/flink/flink-docs-release-0.10/api/java/org= /apache/flink/api/common/functions/FoldFunction.html) > that should be passed into fold()? > > Separate note: fold() doesn't appear in the javadocs for 0.10.0 DataStrea= m > (see > https://ci.apache.org/projects/flink/flink-docs-release-0.10/api/java/org= /apache/flink/streaming/api/datastream/DataStream.html). > So this made me look in the freshly-downloaded flink-streaming-java:0.10.= 0 > and fold() does not appear in org > .apache.flink.streaming.api.datastream.DataStream either. Am I looking in > the wrong place for it? In 0.9.1, it's located in that same class with th= is > signature: fold(R initialValue, FoldFunction folder). > > Ron > > On Wed, Nov 18, 2015 at 9:39 AM, Fabian Hueske wrote: > >> Hi Ron, >> >> Have you checked: >> https://ci.apache.org/projects/flink/flink-docs-release-0.10/apis/stream= ing_guide.html#transformations >> ? >> >> Fold is like reduce, except that you define a start element (of a >> different type than the input type) and the result type is the type of t= he >> initial value. In reduce, the result type must be identical to the input >> type. >> >> Best, Fabian >> >> 2015-11-18 18:32 GMT+01:00 Ron Crocker : >> >>> Is there a succinct description of the distinction between these >>> transforms? >>> >> > -- > Ron Crocker > Principal Software Engineer > ( ( =E2=80=A2)) New Relic > rcrocker@newrelic.com > M: +1 630 363 8835 > --001a1135dfc273aa020524e917f0 Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: quoted-printable
Hi Ron!

You are right, there is a copy/= paste error in the docs, it should be a FoldFunction that is passed to fold= (), not a ReduceFunction.


=
To the difference between fold() and reduce(): It is very subtle= . The fold function can also convert to another type whenever it integrates= a new element.

Here is an example (with lists, no= t streams, but same principle).

------------------= --------------------------------------

ReduceF= unction<Integer> {

=C2=A0 public Integer red= uce(Integer a, Integer b) { return a + b; }
}

[1, 2, 3, 4, 5] -> reduce() =C2=A0means: ((((1 + 2) + 3) + 4) + 5)= =3D 15

------------------------------------------= --------------

FoldFunction<String, In= teger> {

=C2=A0 public String fold(String curre= nt, Integer i) { return current + String.valueOf(i); }
}

[1, 2, 3, 4, 5] -> fold("start-") =C2= =A0means: ((((("start-" + 1) + 2) + 3) + 4) + 5) =3D "start-= 12345" (as a String)


I hop= e that example illustrates the difference.


Greetings,
Stephan


On Thu, Nov 19, 2015 at= 7:00 PM, Ron Crocker <rcrocker@newrelic.com> wrote:
=
Hi Fabian -

<= div>Thanks Fabian, that is a helpful description.=C2=A0

That document WAS my source of information and it seems to also be th= e source of my confusion. Further, it appears to be wrong - there is a Fold= Function<O,T> (https://ci.apache.org/projects/flink/flink-docs-= release-0.10/api/java/org/apache/flink/api/common/functions/FoldFunction.ht= ml) that should be passed into fold()? =C2=A0

<= div>Separate note: fold() doesn't appear in the javadocs for 0.10.0 Dat= aStream (see=C2=A0https://ci.apache.org/projects/flink/flink-docs-r= elease-0.10/api/java/org/apache/flink/streaming/api/datastream/DataStream.h= tml). So this made me look in the freshly-downloaded flink-streaming-ja= va:0.10.0 and fold() does not appear in=C2=A0org.apach= e.flink.streaming.api.datastream.DataStream=C2=A0either. Am I lookin= g in the wrong place for it? In 0.9.1, it's located in that same class = with this signature:=C2=A0fold(R initialValue, FoldFunction<OUT, R> folder).=C2=A0

Ron=C2=A0=

On Wed, Nov 18, 2015 at 9:39 AM, Fabian Hueske <fhueske@gmail.c= om> wrote:
Fold is like red= uce, except that you define a start element (of a different type than the i= nput type) and the result type is the type of the initial value. In reduce,= the result type must be identical to the input type.

Best, Fa= bian

= --

--001a1135dfc273aa020524e917f0--