kafka-dev mailing list archives

From "Michael D. Coon" <mdco...@yahoo.com.INVALID>
Subject Re: KStreams Rewind Offset
Date Thu, 02 Jun 2016 13:55:20 GMT
   That's disappointing, given that Kafka itself lets me rewind and replay data.
My use case is that we are building graph data structures from data indexed off a live
stream. At any time, live data may be marked for deletion for any number of reasons.
If a graph is being built while that marking happens, the graph may not realize the
data was marked for deletion (i.e. there is a race between the graph referencing the data
and the data being removed).

   We need to be able to go back and clean up the graph once we realize it contains
data that was marked for deletion. But we can't clean up the graph until it
completes...so we thought we could track all data referenced by the graph while it is being
created and, once it was complete, replay those data references, determine whether any were
marked for removal, and clean up the graph accordingly. We hoped that by sending "start/end"
indicators into a graph data reference topic, some KStreams flow could see the "end", recognize
that the graph completed, and simply replay all its data references to clean up the graph.
I guess we could use a standard consumer and do this outside of KStreams. Not a big deal...I was
just hoping to keep things in the KStreams realm. I'm sure there are other ways to solve this,
even without using Kafka at all; but why do that? :)
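
For illustration, here is a minimal sketch of the start/end replay idea described above. It does not use Kafka at all: the reference topic is modeled as an in-memory list whose indices stand in for partition offsets, and the names GraphReplaySketch, Ref, and findStaleRefs are hypothetical. With a standard consumer, the inner "rewind" loop would presumably correspond to calling KafkaConsumer.seek(TopicPartition, long) with the remembered start offset and polling up to the offset of the "end" marker.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical, Kafka-free model of the "graph data reference topic".
final class GraphReplaySketch {

    /** One record in the modeled topic: a graph id plus "start", "end", or a data id. */
    static final class Ref {
        final String graphId;
        final String value;

        Ref(String graphId, String value) {
            this.graphId = graphId;
            this.value = value;
        }
    }

    /**
     * Scans the log in offset order. When a graph's "end" marker arrives,
     * rewinds to the offset of that graph's "start" marker and replays the
     * records in between, collecting data ids that were marked for deletion.
     * Graphs that have not ended yet are not reported.
     */
    static Map<String, List<String>> findStaleRefs(List<Ref> log, Set<String> deleted) {
        Map<String, Integer> startOffsets = new HashMap<>();
        Map<String, List<String>> stale = new HashMap<>();

        for (int offset = 0; offset < log.size(); offset++) {
            Ref record = log.get(offset);
            if ("start".equals(record.value)) {
                // Remember where this graph's references begin.
                startOffsets.put(record.graphId, offset);
            } else if ("end".equals(record.value)) {
                Integer start = startOffsets.remove(record.graphId);
                if (start == null) {
                    continue; // "end" without a matching "start": nothing to replay
                }
                List<String> hits = new ArrayList<>();
                // The "rewind": re-read everything between the two markers.
                for (int i = start + 1; i < offset; i++) {
                    Ref replayed = log.get(i);
                    if (replayed.graphId.equals(record.graphId)
                            && deleted.contains(replayed.value)) {
                        hits.add(replayed.value);
                    }
                }
                stale.put(record.graphId, hits);
            }
        }
        return stale;
    }
}
```

Each entry in the returned map lists the data references a completed graph would need to clean up. The sketch assumes all records for a graph land in a single partition, so that the range between its "start" and "end" offsets is meaningful.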


    On Thursday, June 2, 2016 8:59 AM, Matthias J. Sax <matthias@confluent.io> wrote:

 Hi Mike,

currently, this is not possible. We are already discussing some changes
with regard to reprocessing. However, I doubt that going back to a specific
offset of a specific partition will be supported, as it would be too
difficult to reset the internal data structures and intermediate results
correctly (also with regard to committing).

What is your exact use case? What kind of feature are you looking for?
We are always interested in getting feedback and ideas from users.


On 06/01/2016 08:21 PM, Michael D. Coon wrote:
> All,
>  I think it's great that the ProcessorContext offers the partition and offset of the
> current record being processed; however, it offers no way for me to actually use the information.
> I would like to be able to rewind to a particular offset on a partition if I needed to. The
> consumer is also not exposed to me so I couldn't access things directly that way either. Is
> this in the works or would it interfere with rebalancing/auto-commits?
> Mike
