flink-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Alexis Gendronneau <a.gendronn...@gmail.com>
Subject Re: dataset flatmap with multiple output types
Date Fri, 27 May 2016 21:10:24 GMT
Hi Jon,

I'm pretty sure your input will be processed only once. I may be wrong (
correction needed if so ), but your pipeline should be compiled as :
source --> flatmap(words) -> result /sink
          |---> flatmap(chars) -> result /sink

As your input become streamed, each "line" goes through pipeline and is
processed by each flatmap. These flatmap will produce both a result and
send it to their sink. So your input is processed twice but at the same

hope that help

2016-05-27 22:33 GMT+02:00 Jon Stewart <JStewart@strozfriedberg.com>:

> Hello,
> I am new to Apache Flink, so my apologies if this is a common question. I
> have a rather complex operation I'd like to apply to an item in a data set.
> Conceptually, the operation could produce many types of each data, each one
> that I'd like to flow into a different result set.
> In Flink, it looks like the output of a flatMap operation must be of the
> same type, so I would need to split my processing up from a complex map
> operation to several to express the flow. For example, I might want to
> split a data set of text lines into words as well as individual characters:
> val lines: DataSet[String] = // lines of text
> val words = lines.flatMap { _.split(" ") }
> val chars = lines.flatMap { _.toCharArray() }
> Since "words" and "chars" in the example above have the same input DataSet
> and both have a flatMap operation applied to them, will "lines" only be
> iterated once and have both operations computed simultaneously? The big
> problem I have is that my objects are considerably heavier-weight than
> lines of text, so I really only want to iterate them once while performing
> multiple operations on them.
> Thank in advance,
> Jon

Alexis Gendronneau


View raw message