flink-user mailing list archives

From Gerard Garcia <ger...@talaia.io>
Subject Re: Timestamp synchronized message consumption across kafka partitions
Date Thu, 07 Mar 2019 18:27:52 GMT
I'll answer my own question. I guess the most viable option for now is to wait for
the work discussed in
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/Sharing-state-between-subtasks-td24489.html

On Thu, Mar 7, 2019, 3:24 PM gerardg <gerard@talaia.io> wrote:

> I'm wondering if there is a way to avoid consuming too fast from partitions
> that do not have as much data as the other ones in the same topic, by keeping
> them more or less synchronized by their ingestion timestamps. Similar to what
> Kafka Streams does:
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-353%3A+Improve+Kafka+Streams+Timestamp+Synchronization
>
> We are having an issue where partitions with less data are consumed very
> fast, which creates a lot of windows that can't be triggered until the
> partitions with more data are consumed and the watermark advances. It
> seems that this issue should be quite common, but we can't seem to find any
> standard solution to it. Maybe it is just that our partitions are too
> unbalanced, but still, without a way to bound the skew between
> partitions (for example, when processing accumulated data), it seems like a
> potential source of problems.
>
> Does anyone have an idea or suggestion for dealing with this issue?
>
> Thanks,
>
> Gerard
>
>
>
> --
> Sent from:
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
>
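
For illustration only, the skew bound asked about in the quoted message could be sketched roughly as follows. This is a plain-Python simulation, not Flink or Kafka API: the function name `consume_with_bounded_skew`, the `max_skew_ms` parameter, and the in-memory partition layout are all hypothetical. It assumes each partition delivers records in ascending timestamp order, and it simply drops exhausted partitions from the comparison (a real consumer would also need a policy for idle partitions).

```python
from collections import deque


def consume_with_bounded_skew(partitions, max_skew_ms):
    """Simulate consumption that skips partitions running too far ahead.

    partitions: dict mapping a partition name to a list of
        (timestamp_ms, value) tuples, ascending by timestamp (assumption).
    max_skew_ms: hypothetical bound on how far a partition's next record
        may be ahead of the slowest partition's next record.
    Returns the records in the order they were consumed.
    """
    queues = {name: deque(records) for name, records in partitions.items()}
    emitted = []
    while True:
        # Timestamp of the next pending record in each non-empty partition.
        heads = {name: q[0][0] for name, q in queues.items() if q}
        if not heads:
            return emitted
        slowest = min(heads.values())
        for name in sorted(heads):
            if heads[name] - slowest > max_skew_ms:
                # Too far ahead of the slowest partition: skip it this
                # round, which plays the role of "pausing" the partition.
                continue
            ts, value = queues[name].popleft()
            emitted.append((name, ts, value))


if __name__ == "__main__":
    topic = {
        "sparse": [(0, "a"), (1000, "b"), (2000, "c")],
        "dense": [(0, "x"), (10, "y"), (20, "z"), (1000, "w")],
    }
    for record in consume_with_bounded_skew(topic, max_skew_ms=100):
        print(record)
```

The partition holding the slowest record is never skipped, so every round consumes at least one record and the loop terminates. With a bound like this, the sparse partition cannot run far ahead of the dense one, which limits how many windows stay open waiting for the watermark.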
