flink-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Paschek, Robert" <robert.pasc...@tu-berlin.de>
Subject Serialization of "not a valid POJO type"
Date Sat, 30 Jul 2016 10:04:25 GMT
Hi Mailing List,

according to my questions (and your answers!) at this topic
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Performance-issues-with-GroupBy-td8130.html

I have eliminated my ArrayList<T> in my collect methods. Additional I want to emit partial
results. My mapper has the following layout:

ArrayList<ArrayList<Tuple>  structure = ...

For (Tuple tuple : input) {
                addTupleToStructure()
}
While(WorkNotDone) {
                doSomeWorkOnStructure()
                emitPartialResult();
}

Instead of emitting the partial result as an ArrayList<T> ("not a valid POJO type")
I do now iterate through this ArrayList<T> and emit each Tuple as Tuple2.of(Integer.valueOf(this.partitionIndex),
tuple)));
While iterating through this ArrayList and emitting tuples, my mapper seems to be blocked
and can't continue to doSomeWorkOnStructure().

So I have three questions:

-          If I change back to emitting the ArrayList<T> would my Mapper also be blocked
until Flink has serialized this ArrayList<T>? Or is Serialization done independent from
my Mapper?



-          If emitting the ArrayList<T> won't block my Mapper, which variant would be
more performant?


-          If I emit ArrayList<T>, but additionally implement a combiner, which

o   Merges all local ArrayLists<T> with the same partitionIndex

o   Iterates through the local-merged ArrayLists<T> and emits the containing tuples
would that be the best variant? Because the combining is done locally, I would assume that
no Serialization is required between Mapper and Combiner. Also, the Mapper is probably not
blocked with emitting tuples and can continue doSomeWorkOnStructure()

Thank you in advance!
Robert




Mime
View raw message