flink-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Stephan Ewen <se...@apache.org>
Subject Re: Fold vs Reduce in DataStream API
Date Thu, 19 Nov 2015 18:40:22 GMT
Hi Ron!

You are right, there is a copy/paste error in the docs, it should be a
FoldFunction that is passed to fold(), not a ReduceFunction.

In Flink-0.10, the FoldFunction is only available on

  - KeyedStream (
https://ci.apache.org/projects/flink/flink-docs-release-0.10/api/java/org/apache/flink/streaming/api/datastream/KeyedStream.html#fold(R,%20org.apache.flink.api.common.functions.FoldFunction)
)

  - WindowedStream (
https://ci.apache.org/projects/flink/flink-docs-release-0.10/api/java/org/apache/flink/streaming/api/datastream/WindowedStream.html#fold(R,%20org.apache.flink.api.common.functions.FoldFunction,%20org.apache.flink.api.common.typeinfo.TypeInformation)
)

In most cases, you probably want the variant on the WindowedStream, if you
aggregate values over time.

--------------------------------------------------------

To the difference between fold() and reduce(): It is very subtle. The fold
function can also convert to another type whenever it integrates a new
element.

Here is an example (with lists, not streams, but same principle).

--------------------------------------------------------

ReduceFunction<Integer> {

  public Integer reduce(Integer a, Integer b) { return a + b; }
}

[1, 2, 3, 4, 5] -> reduce()  means: ((((1 + 2) + 3) + 4) + 5) = 15

--------------------------------------------------------

FoldFunction<String, Integer> {

  public String fold(String current, Integer i) { return current +
String.valueOf(i); }
}

[1, 2, 3, 4, 5] -> fold("start-")  means: ((((("start-" + 1) + 2) + 3) + 4)
+ 5) = "start-12345" (as a String)


I hope that example illustrates the difference.


Greetings,
Stephan


On Thu, Nov 19, 2015 at 7:00 PM, Ron Crocker <rcrocker@newrelic.com> wrote:

> Hi Fabian -
>
> Thanks Fabian, that is a helpful description.
>
> That document WAS my source of information and it seems to also be the
> source of my confusion. Further, it appears to be wrong - there is a
> FoldFunction<O,T> (
> https://ci.apache.org/projects/flink/flink-docs-release-0.10/api/java/org/apache/flink/api/common/functions/FoldFunction.html)
> that should be passed into fold()?
>
> Separate note: fold() doesn't appear in the javadocs for 0.10.0 DataStream
> (see
> https://ci.apache.org/projects/flink/flink-docs-release-0.10/api/java/org/apache/flink/streaming/api/datastream/DataStream.html).
> So this made me look in the freshly-downloaded flink-streaming-java:0.10.0
> and fold() does not appear in org
> .apache.flink.streaming.api.datastream.DataStream either. Am I looking in
> the wrong place for it? In 0.9.1, it's located in that same class with this
> signature: fold(R initialValue, FoldFunction<OUT, R> folder).
>
> Ron
>
> On Wed, Nov 18, 2015 at 9:39 AM, Fabian Hueske <fhueske@gmail.com> wrote:
>
>> Hi Ron,
>>
>> Have you checked:
>> https://ci.apache.org/projects/flink/flink-docs-release-0.10/apis/streaming_guide.html#transformations
>> ?
>>
>> Fold is like reduce, except that you define a start element (of a
>> different type than the input type) and the result type is the type of the
>> initial value. In reduce, the result type must be identical to the input
>> type.
>>
>> Best, Fabian
>>
>> 2015-11-18 18:32 GMT+01:00 Ron Crocker <rcrocker@newrelic.com>:
>>
>>> Is there a succinct description of the distinction between these
>>> transforms?
>>>
>>
> --
> Ron Crocker
> Principal Software Engineer
> ( ( •)) New Relic
> rcrocker@newrelic.com
> M: +1 630 363 8835
>

Mime
View raw message