flink-dev mailing list archives

From Aljoscha Krettek <aljos...@apache.org>
Subject Re: Should collect() and count() be treated as data sinks?
Date Thu, 02 Apr 2015 15:51:40 GMT
In my opinion, count() and collect() should not be handled like print().
The idea behind count()/collect() is that they immediately return a
result, which can then be used in further Flink operations.

Right now, this is not implemented properly or efficiently, but once we
have support for intermediate results at this level, these calls will
start making more sense. Also, in that case an execute() would not be
required after a collect()/count() if only the result of that call is needed.
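
To illustrate the distinction, here is a toy sketch in plain Java. It is
not Flink code: ToyEnvironment, ToyDataSet, output() and jobsRun are
hypothetical names made up for this example, and the real implementation
would look different. It only models the idea that a lazy sink has to
wait for execute(), while count() runs on its own and returns a value
right away, so no execute() is needed afterwards.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only: none of these classes belong to Flink.
class EagerCountSketch {

    /** Hypothetical stand-in for an environment that collects lazy sinks. */
    static class ToyEnvironment {
        private final List<Runnable> sinks = new ArrayList<>();
        int jobsRun = 0; // number of "jobs" that have actually run

        <T> ToyDataSet<T> fromCollection(List<T> items) {
            return new ToyDataSet<>(this, items);
        }

        void registerSink(Runnable sink) {
            sinks.add(sink);
        }

        // Runs all lazily registered sinks; fails if none were registered,
        // mirroring the error message quoted in this thread.
        void execute() {
            if (sinks.isEmpty()) {
                throw new RuntimeException("No data sinks have been created yet.");
            }
            jobsRun++;
            for (Runnable sink : sinks) {
                sink.run();
            }
            sinks.clear();
        }
    }

    /** Hypothetical stand-in for a data set with one lazy and one eager operation. */
    static class ToyDataSet<T> {
        private final ToyEnvironment env;
        private final List<T> items;

        private ToyDataSet(ToyEnvironment env, List<T> items) {
            this.env = env;
            this.items = items;
        }

        // Lazy: only registers a sink, nothing is copied until execute().
        void output(List<T> target) {
            env.registerSink(() -> target.addAll(items));
        }

        // Eager: counts as its own job and returns the result immediately.
        long count() {
            env.jobsRun++;
            return items.size();
        }
    }

    public static void main(String[] args) {
        ToyEnvironment env = new ToyEnvironment();
        ToyDataSet<Long> data = env.fromCollection(Arrays.asList(1L));
        long value = data.count(); // result is available right here
        System.out.println(value);
        // No env.execute() here: calling it now would throw, because
        // count() did not leave a lazy sink behind.
    }
}
```

In this model, treating count() like print() would instead mean
registering a sink and only learning the value after execute(), which
is what the paragraph above argues against.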

On Thu, Apr 2, 2015 at 5:33 PM, Felix Neutatz <neutatz@googlemail.com> wrote:
> Hi,
> I have run the following program:
> final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
> List<Tuple1<Long>> l = Arrays.asList(new Tuple1<Long>(1L));
> TypeInformation<Tuple1<Long>> t = TypeInfoParser.parse("Tuple1<Long>");
> DataSet<Tuple1<Long>> data = env.fromCollection(l, t);
> long value = data.count();
> System.out.println(value);
> env.execute("example");
> Since there is no "real" data sink, I get the following:
> Exception in thread "main" java.lang.RuntimeException: No data sinks have
> been created yet. A program needs at least one sink that consumes data.
> Examples are writing the data set or printing it.
> In my opinion, we should handle count() and collect() like print().
> What do you think?
> Best regards,
> Felix
