flink-user mailing list archives

From Chen Qin <qinnc...@gmail.com>
Subject Re: Increasing parallelism skews/increases overall job processing time linearly
Date Fri, 06 Jan 2017 18:27:09 GMT
Just noticed there are only two partitions per topic. Regardless of how high the parallelism is set,
at most two of the source subtasks will get a partition assigned; the rest will sit idle.
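
To make the partition limit concrete, here is a minimal sketch. It assumes a plain round-robin assignment (partition id modulo parallelism); the Kafka connector in Flink has its own assignment logic, so this is only an illustration. The function names `assign_partitions` and `idle_subtasks` are hypothetical, not part of any Flink or Kafka API.

```python
from typing import Dict, List


def assign_partitions(num_partitions: int, parallelism: int) -> Dict[int, List[int]]:
    """Spread partition ids over source subtasks round-robin.

    Illustrative assumption only: not the connector's actual algorithm.
    """
    assignment: Dict[int, List[int]] = {subtask: [] for subtask in range(parallelism)}
    for partition in range(num_partitions):
        assignment[partition % parallelism].append(partition)
    return assignment


def idle_subtasks(assignment: Dict[int, List[int]]) -> List[int]:
    """Return the subtasks that were given no partition to read."""
    return [subtask for subtask, partitions in assignment.items() if not partitions]


# A topic with 2 partitions, read at increasing source parallelism.
for parallelism in (1, 2, 4, 8):
    assignment = assign_partitions(num_partitions=2, parallelism=parallelism)
    idle = idle_subtasks(assignment)
    print(f"parallelism={parallelism}: "
          f"{parallelism - len(idle)} subtask(s) reading, {len(idle)} idle")
```

With 2 partitions and parallelism 4, subtasks 2 and 3 end up with nothing to read, so raising the parallelism beyond the partition count adds no source throughput.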

Sent from my iPhone

> On Jan 6, 2017, at 02:40, Chakravarthy varaga <chakravarthyvp@gmail.com> wrote:
> Hi All,
>     Any updates on this?
> Best Regards
>> On Thu, Jan 5, 2017 at 1:21 PM, Chakravarthy varaga <chakravarthyvp@gmail.com> wrote:
>> Hi All,
>> I have a job as attached.
>> I have a 16-core blade running RHEL 7. The TaskManager default number of slots is
>> set to 1. The source is a Kafka stream, and each of the 2 sources (topics) has 2 partitions.
>> What I notice is that when I deploy the job with #parallelism=2, the total processing
>> time is double what it took when the same job was deployed with #parallelism=1. It increases
>> linearly with the parallelism.
>> Since the number of slots is set to 1 per TM, I would assume that the job would be
>> processed in parallel in 2 different TMs and that each consumer in each TM is connected to
>> 1 partition of the topic. This should therefore have kept the overall processing time the
>> same or lower.
>> The co-flatmap connects the 2 streams and uses ValueState (checkpointed in FS).
>> I think this state is distributed among the TMs. My understanding is that looking up state
>> values could be costly between TMs. Do you sense something wrong here?
>> Best Regards
>> CVP
