flink-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Chen, Mason" <mason.c...@sony.com>
Subject Flink Kafka Connector Source Parallelism
Date Thu, 28 May 2020 06:08:13 GMT
Hi all,

I’m currently trying to understand Flink’s Kafka Connector and how parallelism affects
it. So, I am running the flink playground click count job and the parallelism is set to 2
by default.


However, I don’t see the 2nd subtask of the Kafka Connector sending any records: https://imgur.com/cA5ucSg.
Do I need to rebalance after reading from kafka?

```
clicks = clicks
   .keyBy(ClickEvent::getPage)
   .map(new BackpressureMap())
   .name("Backpressure");
```

`clicks` is the kafka click stream. From my reading in the operator docs, it seems counterintuitive
to do a `rebalance()` when I am already doing a `keyBy()`.

So, my questions:

1. How do I make use of the 2nd subtask?
2. Does the number of partitions have some sort of correspondence with the parallelism of
the source operator? If so, is there a general statement to be made about parallelism across
all source operators?

Thanks,
Mason
Mime
View raw message