flink-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jon Stewart <JStew...@StrozFriedberg.com>
Subject dataset flatmap with multiple output types
Date Fri, 27 May 2016 20:33:50 GMT
Hello,

I am new to Apache Flink, so my apologies if this is a common question. I have a rather complex
operation I'd like to apply to an item in a data set. Conceptually, the operation could produce
many types of each data, each one that I'd like to flow into a different result set.

In Flink, it looks like the output of a flatMap operation must be of the same type, so I would
need to split my processing up from a complex map operation to several to express the flow.
For example, I might want to split a data set of text lines into words as well as individual
characters:

val lines: DataSet[String] = // lines of text
val words = lines.flatMap { _.split(" ") }
val chars = lines.flatMap { _.toCharArray() }

Since "words" and "chars" in the example above have the same input DataSet and both have a
flatMap operation applied to them, will "lines" only be iterated once and have both operations
computed simultaneously? The big problem I have is that my objects are considerably heavier-weight
than lines of text, so I really only want to iterate them once while performing multiple operations
on them.

Thank in advance,

Jon

Mime
View raw message