flink-user mailing list archives

From Stephan Ewen <se...@apache.org>
Subject Re: intermediate result reuse
Date Sat, 12 Sep 2015 14:27:17 GMT
Fabian has explained it well. All functions are executed lazily as one DAG,
when "env.execute()" is called.

Beware that there are three exceptions:
  - count()
  - collect()
  - print()

These functions trigger an immediate program execution (they are "eager"
functions). They execute everything that is needed to produce their result.
Summing up:

---------------------------

One execution in this case (result "a" is reused by "b" and "c")

a = env.createInput() -> map() -> reduce() -> filter()

b = a.flatmap()
c = a.groupBy() -> reduce()

b.writeAsText()
c.writeAsCsv()

env.execute();

---------------------------

Two executions in this case ("a" is computed twice, once for "b" and once
for "c")

a = env.createInput() -> map() -> reduce() -> filter()

b = a.flatmap() -> count()
c = a.groupBy() -> reduce() -> collect()

---------------------------
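
To make the counting concrete, here is a minimal plain-Java sketch of the two
cases above. It is a toy model, not Flink's API: ToyEnv, addSink(), eager()
and computeA() are hypothetical names invented for this illustration, and the
data is made up. It only assumes what is described above: sinks are collected
and run together on execute(), while eager actions each start their own run.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

class LazyPlanSketch {
    // Counts how often the shared upstream result "a" is computed.
    static int sourceRuns = 0;

    // Stand-in for: env.createInput() -> map() -> reduce() -> filter()
    static List<Integer> computeA() {
        sourceRuns++;
        List<Integer> a = new ArrayList<>();
        for (int i = 1; i <= 10; i++) {
            if (i % 2 == 0) {
                a.add(i * 10);
            }
        }
        return a;
    }

    // Hypothetical toy environment (NOT Flink's ExecutionEnvironment).
    static class ToyEnv {
        private final List<Consumer<List<Integer>>> sinks = new ArrayList<>();

        // Lazy: only registers the sink, nothing is computed yet.
        void addSink(Consumer<List<Integer>> sink) {
            sinks.add(sink);
        }

        // One execution: "a" is computed once and handed to every sink.
        void execute() {
            List<Integer> a = computeA();
            for (Consumer<List<Integer>> sink : sinks) {
                sink.accept(a);
            }
            sinks.clear();
        }

        // Eager: every call is its own execution, so "a" is recomputed.
        <R> R eager(Function<List<Integer>, R> action) {
            return action.apply(computeA());
        }
    }

    // Case 1: two sinks, one execute().
    static int oneExecutionWithTwoSinks() {
        sourceRuns = 0;
        ToyEnv env = new ToyEnv();
        List<Integer> outB = new ArrayList<>();
        List<Integer> outC = new ArrayList<>();
        env.addSink(a -> a.stream().filter(x -> x > 40).forEach(outB::add));
        env.addSink(a -> outC.add(a.stream().mapToInt(Integer::intValue).sum()));
        env.execute();
        return sourceRuns;
    }

    // Case 2: two eager actions, each triggering its own execution.
    static int twoEagerActions() {
        sourceRuns = 0;
        ToyEnv env = new ToyEnv();
        long b = env.eager(a -> a.stream().filter(x -> x > 40).count());
        int c = env.eager(a -> a.stream().mapToInt(Integer::intValue).sum());
        return sourceRuns;
    }

    public static void main(String[] args) {
        System.out.println("two sinks, one execute(): a computed "
                + oneExecutionWithTwoSinks() + " time(s)");
        System.out.println("two eager actions:        a computed "
                + twoEagerActions() + " time(s)");
    }
}
```

In this model the first case computes "a" once and the second computes it
twice, which mirrors the two cases above. If both results are needed, writing
them to sinks and calling execute() once avoids the recomputation.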

Greetings,
Stephan


On Sat, Sep 12, 2015 at 11:31 AM, Fabian Hueske <fhueske@gmail.com> wrote:

> Hi Michele,
>
> Flink programs can have multiple sinks.
> In your program, the intermediate result a will be streamed to both
> filters (b and c) at the same time and both sinks will be written at the
> same time.
> So in this case, there is no need to materialize the intermediate result a.
>
> If you call execute() after you defined b, the program will compute a and
> stream the result only to b.
> If you call execute() again after you defined c, the program will compute
> a again and stream the result to c.
>
> Summary:
> Flink programs can usually stream intermediate results without
> materializing them. There are a few cases where it needs to materialize
> intermediate results in order to avoid deadlocks, but these are fully
> transparently handled.
> It is not possible (yet!) to share results across program executions,
> i.e., whenever you call execute().
>
> I suppose you call execute() between defining b and c. If you remove
> that call, a will be computed once and both b and c are computed at the
> same time.
>
> Best, Fabian
>
> 2015-09-12 11:02 GMT+02:00 Michele Bertoni <
> michele1.bertoni@mail.polimi.it>:
>
>> Hi everybody, I have a question about internal optimization:
>> is Flink able to reuse intermediate results that are used twice in the
>> graph?
>>
>> i.e.
>> a = readsource -> filter -> reduce -> something else even more complicated
>>
>> b = a filter(something)
>> store b
>>
>> c = a filter(something else)
>> store c
>>
>> what happens to a? is it computed twice?
>>
>> in my read function I have some logging commands and I see them printed
>> twice, but it sounds strange to me
>>
>>
>>
>> thanks
>> cheers
>> michele
>
>
>
