flink-user mailing list archives

From Chakravarthy varaga <chakravarth...@gmail.com>
Subject Increasing parallelism skews/increases overall job processing time linearly
Date Thu, 05 Jan 2017 13:21:37 GMT
Hi All,

I have a job as attached.

I have a 16-core blade running RHEL 7. The default number of slots per
TaskManager is set to 1. The source is a Kafka stream, and each of the 2
source topics has 2 partitions.


*What I notice is that when I deploy the job with #parallelism=2, the
total processing time is double what it took when the same job was
deployed with #parallelism=1. It increases linearly with the parallelism.*
Since the number of slots is set to 1 per TM, I would assume that the job
is processed in parallel in 2 different TMs and that each consumer in
each TM is connected to 1 partition of the topic. This should therefore
have kept the overall processing time the same or lower.
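
To make that expectation concrete, here is a minimal sketch of how 2
partitions would be spread over the source subtasks. It assumes a simple
round-robin rule (partition i goes to subtask i % parallelism); the class
and method names are hypothetical, and the real connector logic may differ
between Flink versions.

```java
import java.util.ArrayList;
import java.util.List;

final class PartitionAssignmentSketch {

    // Hypothetical helper: partitions owned by one source subtask,
    // ASSUMING partition i is given to subtask (i % parallelism).
    static List<Integer> partitionsFor(int subtaskIndex, int parallelism, int numPartitions) {
        List<Integer> owned = new ArrayList<>();
        for (int p = 0; p < numPartitions; p++) {
            if (p % parallelism == subtaskIndex) {
                owned.add(p);
            }
        }
        return owned;
    }

    public static void main(String[] args) {
        int numPartitions = 2; // each topic in this job has 2 partitions
        for (int parallelism = 1; parallelism <= 3; parallelism++) {
            for (int subtask = 0; subtask < parallelism; subtask++) {
                System.out.println("parallelism=" + parallelism
                        + " subtask=" + subtask
                        + " partitions=" + partitionsFor(subtask, parallelism, numPartitions));
            }
        }
    }
}
```

Under this assumption, parallelism=2 gives each subtask exactly one
partition, and any subtask beyond the second would have nothing to read.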

The co-flatmap connects the 2 streams and uses ValueState (checkpointed to
FS). I think this state is distributed among the TMs. My understanding is
that looking up state values across TMs could be costly. Do you see
anything wrong here?
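
For reference, a minimal sketch of the usual keyed-state model, assuming
both inputs of the co-flatmap are keyed by the same key (the attached job
is not reproduced here). In that model each key is routed to exactly one
subtask and its ValueState is held by that subtask, so a lookup does not
cross TMs. The class KeyedStateSketch and the plain modulo routing are
hypothetical simplifications; Flink's actual routing differs in detail.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class KeyedStateSketch {

    // One local map per parallel subtask, standing in for its keyed state.
    private final List<Map<String, Long>> statePerSubtask = new ArrayList<>();

    KeyedStateSketch(int parallelism) {
        for (int i = 0; i < parallelism; i++) {
            statePerSubtask.add(new HashMap<>());
        }
    }

    // SIMPLIFIED routing (assumption): hash of the key modulo the parallelism.
    int subtaskFor(String key) {
        return Math.floorMod(key.hashCode(), statePerSubtask.size());
    }

    // Elements from either input stream with the same key reach the same
    // subtask, so the update only touches that subtask's local map.
    long update(String key, long delta) {
        Map<String, Long> local = statePerSubtask.get(subtaskFor(key));
        return local.merge(key, delta, Long::sum);
    }

    // Number of subtasks that hold a state entry for the key.
    long subtasksHolding(String key) {
        return statePerSubtask.stream().filter(m -> m.containsKey(key)).count();
    }

    public static void main(String[] args) {
        KeyedStateSketch job = new KeyedStateSketch(2);
        job.update("sensor-1", 1L);              // element from stream 1
        long total = job.update("sensor-1", 2L); // element from stream 2
        System.out.println("value=" + total
                + " held by " + job.subtasksHolding("sensor-1") + " subtask(s)");
    }
}
```

If the streams are not keyed before they are connected, this model does
not apply and the state behaviour would need to be checked separately.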

Best Regards
CVP
