flink-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Eron Wright <ewri...@live.com>
Subject Side-effects of DataSet::count
Date Sun, 29 May 2016 21:28:20 GMT
I was curious as to how the `count` method on DataSet worked, and was surprised to see that
it executes the entire program graph.   Wouldn’t this cause undesirable side-effects like
writing to sinks?    Also strange that the graph is mutated with the addition of a sink (that
isn’t subsequently removed).

Surveying the Flink code, there aren’t many situations where the program graph is implicitly
executed (`collect` is another).   Nonetheless, this has deepened my appreciation for how
dynamic the application might be.

// DataSet.java
public long count() throws Exception {
   final String id = new AbstractID().toString();

   output(new Utils.CountHelper<T>(id)).name("count()");

   JobExecutionResult res = getExecutionEnvironment().execute();
   return res.<Long> getAccumulatorResult(id);
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message