flink-dev mailing list archives

From Stephan Ewen <se...@apache.org>
Subject Re: how to split data-sets efficiently?
Date Mon, 28 Jul 2014 14:07:01 GMT
Hey!

A similar issue has come up in a different context. We should solve both
problems in the same way.

Can you participate in the discussion here:
https://issues.apache.org/jira/browse/FLINK-87

Greetings,
Stephan




On Mon, Jul 28, 2014 at 3:42 PM, Stephan Ewen <sewen@apache.org> wrote:

> Hi!
>
> "Splitting", in the sense that one function returns two different data
> sets, is currently not supported.
>
> I guess you have to go with Ufuk's suggestion. In your case, I guess it
> would look somewhat like this:
>
>
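> // Pair every string with an id, keeping the string for the final join.
> DataSet<Tuple2<Long, String>> mapped = originalStrings.map(new HashIdMapper());
>
> // Keep only the ids as input for the graph algorithm.
> DataSet<Long> ids = mapped.map(new ProjectTo2());
>
> DataSet<Long> result = ids.runTheGraphAlgorithm(...);
>
> // Join the resulting ids with the (id, string) pairs to recover the strings.
> result.join(mapped).where(...).equalTo(...).with(new MapBackToStrings());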
>
>
> Greetings,
> Stephan
>
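
The data flow in the quoted snippet (pair each string with an id, run the
algorithm on the ids only, then join the result back to the strings) can be
sketched in plain Java collections. This is only an illustration of the
pattern, not Flink code: the class SplitJoinSketch, the methods hashId,
runAlgorithm and process, and the "keep even ids" placeholder algorithm are
all hypothetical names and assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class SplitJoinSketch {

    // Hypothetical stand-in for HashIdMapper: derives a numeric id from a string.
    // Assumption: ids do not collide; a colliding id would overwrite an earlier entry.
    static long hashId(String s) {
        return s.hashCode();
    }

    // Hypothetical placeholder for the graph algorithm: it simply keeps even ids.
    static List<Long> runAlgorithm(List<Long> ids) {
        List<Long> kept = new ArrayList<>();
        for (long id : ids) {
            if (id % 2 == 0) {
                kept.add(id);
            }
        }
        return kept;
    }

    static List<String> process(List<String> originalStrings) {
        // Step 1: pair every string with its id (the "mapped" data set).
        Map<Long, String> mapped = new LinkedHashMap<>();
        for (String s : originalStrings) {
            mapped.put(hashId(s), s);
        }

        // Step 2: project to the ids only.
        List<Long> ids = new ArrayList<>(mapped.keySet());

        // Step 3: run the algorithm on the ids.
        List<Long> result = runAlgorithm(ids);

        // Step 4: join the result with "mapped" on the id to get the strings back.
        List<String> strings = new ArrayList<>();
        for (Long id : result) {
            String s = mapped.get(id);
            if (s != null) {
                strings.add(s);
            }
        }
        return strings;
    }

    public static void main(String[] args) {
        System.out.println(process(Arrays.asList("a", "b", "c", "d")));
    }
}
```

The point of the sketch is that the (id, string) pairs are kept around as a
second consumer of the first step, so nothing has to be "split": the ids go
through the algorithm and the pairs are reused later as the join side.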
