flink-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Paschek, Robert" <robert.pasc...@tu-berlin.de>
Subject AW: Serialization of "not a valid POJO type"
Date Sat, 30 Jul 2016 16:16:58 GMT
Hi again,

I implemented the scenario with the combiner and answered my 3rd question by myself:
The combiners start after the mapper finished, so the reducer will not start processing partial
results until the mappers are completely done.

Regards
Robert


Von: Paschek, Robert [mailto:robert.paschek@tu-berlin.de]
Gesendet: Samstag, 30. Juli 2016 12:04
An: user@flink.apache.org
Betreff: Serialization of "not a valid POJO type"

Hi Mailing List,

according to my questions (and your answers!) at this topic
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Performance-issues-with-GroupBy-td8130.html

I have eliminated my ArrayList<T> in my collect methods. Additional I want to emit partial
results. My mapper has the following layout:

ArrayList<ArrayList<Tuple>  structure = ...

For (Tuple tuple : input) {
                addTupleToStructure()
}
While(WorkNotDone) {
                doSomeWorkOnStructure()
                emitPartialResult();
}

Instead of emitting the partial result as an ArrayList<T> ("not a valid POJO type")
I do now iterate through this ArrayList<T> and emit each Tuple as Tuple2.of(Integer.valueOf(this.partitionIndex),
tuple)));
While iterating through this ArrayList and emitting tuples, my mapper seems to be blocked
and can't continue to doSomeWorkOnStructure().

So I have three questions:

-          If I change back to emitting the ArrayList<T> would my Mapper also be blocked
until Flink has serialized this ArrayList<T>? Or is Serialization done independent from
my Mapper?



-          If emitting the ArrayList<T> won't block my Mapper, which variant would be
more performant?


-          If I emit ArrayList<T>, but additionally implement a combiner, which

o   Merges all local ArrayLists<T> with the same partitionIndex

o   Iterates through the local-merged ArrayLists<T> and emits the containing tuples
would that be the best variant? Because the combining is done locally, I would assume that
no Serialization is required between Mapper and Combiner. Also, the Mapper is probably not
blocked with emitting tuples and can continue doSomeWorkOnStructure()

Thank you in advance!
Robert




Mime
View raw message