kafka-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Adam Bellemare <adam.bellem...@gmail.com>
Subject Re: Synchronized consumption + processing based on timestamps?
Date Wed, 03 Jul 2019 11:39:35 GMT
Thanks Matthias,

For #1 this is something we have done already, but it can obviously take a
while to catch up for large topics (or may not catch up at all if the
volume is large).

For #2, this is what I am wondering about, if anyone knows of any
implementations that provide an option like this under the hood of the
consumer, some sort of SynchronizedConsumer. I can see doing it on a
per-client basis, but it would be great to abstract it out into a general
approach.

I'll think about this one more.

Thanks



On Tue, Jul 2, 2019 at 11:27 PM Matthias J. Sax <matthias@confluent.io>
wrote:

> I think you can only achieve this, if
>
> 1) you don't use two clients, but only one client that reads both
> partitions
>
> or
>
> 2) let both clients exchange data about their time progress
>
>
> -Matthias
>
>
> On 7/2/19 6:01 PM, Adam Bellemare wrote:
> > Hi All
> >
> > The use-case is pretty simple. Lets say we have a history of events with
> > the following:
> > key=userId, value = (timestamp, productId)
> >
> > and we want to remap it to (just as we would with an internal topic):
> > key=productId, value=(original_timestamp, userId)
> >
> > Now, say I have 30 days of backlog, and 2 partitions for the input
> topic. I
> > spin up two instances and let them process the data from the start of
> time,
> > but one instance is only half as powerful (less CPU, Mem, etc), such that
> > instance 0 processes X events / sec which instance 1 processes x/2 events
> > /sec.
> >
> > My question is: Are there *any* clients, kafka streams, spark, flink, etc
> > or otherwise, that would allow these two consumers to remain in sync
> *according
> > to their timestamps*? I don't want to see events with original_timestamp
> of
> > today (from instance 0) interleaved with events from 15 days ago (from
> the
> > underpowered instance 1). Yes, I do realize this would bring my
> throughput
> > down, but I am looking for any existing tech that would effectively say
> *"cap
> > the time difference of events coming out of this repartition processor at
> > 60 seconds max"*
> >
> > Currently, I am not aware of ANY open source solutions that do this for
> > Kafka, but if someone has heard otherwise I would love to know.
> > Alternately, perhaps this could lead to a KIP.
> >
> > Thanks
> > Adam
> >
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message