flink-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Chen, Mason" <mason.c...@sony.com>
Subject Re: Flink Kafka Connector Source Parallelism
Date Thu, 28 May 2020 06:23:03 GMT
I think I may have just answered my own question. There’s only one Kafka partition, so the
maximum parallelism is one and it doesn’t really make sense to make another kafka consumer
under the same group id. What threw me off is that there’s a 2nd subtask for the kafka source
created even though it’s not actually doing anything. So, it seems a general statement can
be made that (# kafka partitions) >= (# parallelism of flink kafka source)…well I guess
you could have more parallelism than kafka partitions, but the extra subtasks will not doing
anything.

From: "Chen, Mason" <mason.chen@sony.com>
Date: Wednesday, May 27, 2020 at 11:09 PM
To: "user@flink.apache.org" <user@flink.apache.org>
Subject: Flink Kafka Connector Source Parallelism

Hi all,

I’m currently trying to understand Flink’s Kafka Connector and how parallelism affects
it. So, I am running the flink playground click count job and the parallelism is set to 2
by default.



However, I don’t see the 2nd subtask of the Kafka Connector sending any records: https://imgur.com/cA5ucSg.
Do I need to rebalance after reading from kafka?

```
clicks = clicks
   .keyBy(ClickEvent::getPage)
   .map(new BackpressureMap())
   .name("Backpressure");
```

`clicks` is the kafka click stream. From my reading in the operator docs, it seems counterintuitive
to do a `rebalance()` when I am already doing a `keyBy()`.

So, my questions:

1. How do I make use of the 2nd subtask?
2. Does the number of partitions have some sort of correspondence with the parallelism of
the source operator? If so, is there a general statement to be made about parallelism across
all source operators?

Thanks,
Mason
Mime
View raw message