kafka-dev mailing list archives

From Jay Kreps <...@confluent.io>
Subject Re: [DISCUSS] KIP-79 - ListOffsetRequest v1 and offsetForTime() method in new consumer.
Date Fri, 02 Sep 2016 21:11:25 GMT
This looks great: big improvements to the list offset protocol, which is
currently quite odd.

One minor thing: I think the old v0 list offsets request also gave you the
highwater mark; it kind of shoves it in as the last thing in the array of
offsets. This is used internally to implement seekToEnd(), iirc. How would
that work once v0 is removed?

Related, the wiki says:
"Another related feature missing in KafkaConsumer is the access of
partitions' high watermark. Typically, users only need the high watermark
in order to get the per partition lag. This seems more suitable to be
exposed through the metrics."

The obvious usage is computing lag for sure, and I agree that is really
more a metric than anything else, but I think that is not the only usage.
Here is a use case I think is quite important that requires knowing the
highwater mark:

Say you want to implement some kind of batch process that wakes up every 5
minutes or every hour or once a day and processes all the messages and then
goes back to sleep. The naive way to do that would be to poll() until you
don't get any more records, but this is broken in two minor ways: first,
maybe you didn't get records because you are rebalancing, and second, this
might never happen if new records are always getting written. A better
approach is for your process, when it begins, to look at the current end of
the log and process only up to that offset.
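
A minimal sketch of that bounded-batch idea, using a hypothetical in-memory
stand-in for a partition rather than the real consumer API
(InMemoryPartition, run_batch and on_fetch are illustrative names, not part
of Kafka or the KIP). The end offset is read once at startup, and the loop
stops there even though writes keep arriving:

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class InMemoryPartition:
    """Hypothetical stand-in for one partition's log; not a Kafka client API."""
    records: List[str] = field(default_factory=list)

    def end_offset(self) -> int:
        # Offset that the next appended record would receive.
        return len(self.records)

    def fetch(self, offset: int, max_records: int) -> List[str]:
        # Return up to max_records starting at offset (may be empty).
        return self.records[offset:offset + max_records]


def run_batch(
    partition: InMemoryPartition,
    start_offset: int,
    on_fetch: Optional[Callable[[InMemoryPartition], None]] = None,
) -> List[str]:
    """Process records in [start_offset, end offset observed at startup)."""
    stop_at = partition.end_offset()  # snapshot taken once, when the batch begins
    position = start_offset
    processed: List[str] = []
    while position < stop_at:
        batch = partition.fetch(position, max_records=2)
        if on_fetch is not None:
            on_fetch(partition)  # simulate a producer writing concurrently
        for record in batch:
            if position >= stop_at:
                break  # record arrived after the snapshot; leave it for the next run
            processed.append(record)
            position += 1
    return processed
```

Termination here depends on the position reaching the snapshot, not on a
fetch coming back empty, so neither an empty fetch nor a steady stream of
new writes decides when the batch ends.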

This is important for Kafka Streams or anything else that wants to have a
kind of batch-like mode.

Technically you can do this by seeking to the end, checking your position,
then starting over, as people do today. But I think we can agree that is
kind of silly.
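
For reference, the workaround looks roughly like the sketch below. It reads
"starting over" as seeking back to the saved position, and FakeConsumer is a
hypothetical in-memory object that only mimics the position / seek /
seek-to-end operations; it is not the real client.

```python
class FakeConsumer:
    """Hypothetical single-partition consumer; mimics seek semantics only."""

    def __init__(self, log_end: int, position: int) -> None:
        self._log_end = log_end
        self._position = position

    def position(self) -> int:
        return self._position

    def seek(self, offset: int) -> None:
        self._position = offset

    def seek_to_end(self) -> None:
        self._position = self._log_end


def end_offset_via_seek(consumer: FakeConsumer) -> int:
    """Find the end of the log by seeking there, then restore the position."""
    resume_at = consumer.position()  # remember where we were
    consumer.seek_to_end()           # jump to the end of the log
    end = consumer.position()        # the position now equals the end offset
    consumer.seek(resume_at)         # go back so no records are skipped
    return end
```

Three position changes just to read one number is the awkward part; a direct
lookup of the end offset would avoid touching the position at all.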

An alternative would be to rename TimestampOffset to something like
PartitionOffsets and have it carry the timestamp and offset as well as
the beginning offset and highwatermark for the partition. The underlying
protocol would then need to return these two extra values.

Cheers,

-Jay

On Tue, Aug 30, 2016 at 8:38 PM, Becket Qin <becket.qin@gmail.com> wrote:

> Hi Kafka devs,
>
> I created KIP-79 to allow the consumer to precisely query offsets based
> on timestamp.
>
> In short, we propose to:
> 1. add a ListOffsetRequest/ListOffsetResponse v1, and
> 2. add an offsetForTime() method in new consumer.
>
> The KIP wiki is the following:
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=65868090
>
> Comments are welcome.
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
