kafka-dev mailing list archives

From "Matthias J. Sax" <matth...@confluent.io>
Subject Re: [DISCUSS] KIP-228 Negative record timestamp support
Date Thu, 28 Dec 2017 20:58:19 GMT
I agree that changing message format or using a flag bit might not be
worth it.

However, just keeping -1 as "unknown" and leaving a one-millisecond gap
gives me a lot of headaches, too. Your arguments about "not an issue in
practice" kinda make sense to me, but I can already see the number of
questions on the mailing list if we really follow this path... It will
confuse users who don't pay attention and "lose" data if Kafka Streams
drops records with timestamp -1 but processes other records with
negative timestamps.
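
To make the concern concrete, here is a rough sketch of the "keep -1 as
the sentinel" rule. The function names are made up for illustration and
this is not actual Kafka Streams code; it only shows that one specific
millisecond in 1969 would be dropped while its neighbors go through.

```python
from datetime import datetime, timedelta, timezone

UNKNOWN_TIMESTAMP = -1  # sentinel meaning "no timestamp"
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_utc(timestamp_ms: int) -> datetime:
    """Convert epoch milliseconds (possibly negative) to a UTC datetime."""
    return EPOCH + timedelta(milliseconds=timestamp_ms)


def is_dropped(timestamp_ms: int) -> bool:
    """Sentinel-only rule: drop -1, process every other value."""
    return timestamp_ms == UNKNOWN_TIMESTAMP


if __name__ == "__main__":
    for ts in (-2, -1, 0):
        action = "dropped" if is_dropped(ts) else "processed"
        print(ts, to_utc(ts).isoformat(timespec="milliseconds"), action)
```

A record that legitimately carries 1969-12-31T23:59:59.999 UTC would be
silently dropped here, while the records 1 ms before and after it are
processed.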

Thus, I was wondering if a new topic config (maybe
`allow.negative.timestamps` with default `false`) that enables negative
timestamps would be the better solution? With this new config, we would
not have any sentinel value for "unknown" and all timestamps would be
valid. Old producers can't write to those topics if they are configured
with CREATE_TIME, though. APPEND_TIME would still work for older
producers, but with APPEND_TIME no negative timestamps are possible in
the first place, so this config would not have any impact anyway.
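
A minimal sketch of how the broker side could behave under this
proposal. Everything here is hypothetical: `resolve_timestamp` and its
parameters are invented names, `allow_negative` stands for the proposed
`allow.negative.timestamps` config, and `None` models an old-format
record that carries no timestamp at all.

```python
from typing import Optional

UNKNOWN_TIMESTAMP = -1  # sentinel used today for "no timestamp"


def resolve_timestamp(
    producer_ts: Optional[int],
    timestamp_type: str,
    allow_negative: bool,
    broker_now_ms: int,
) -> int:
    """Return the timestamp to store, or raise ValueError to reject the record."""
    if timestamp_type == "LogAppendTime":
        # Broker wall-clock time wins; the new config has no effect here.
        return broker_now_ms
    if timestamp_type != "CreateTime":
        raise ValueError(f"unknown timestamp type: {timestamp_type}")

    if producer_ts is None:
        if allow_negative:
            # No sentinel exists on such a topic, so old producers are rejected.
            raise ValueError("record has no timestamp and topic has no sentinel")
        return UNKNOWN_TIMESTAMP

    if producer_ts < 0 and not allow_negative:
        raise ValueError("negative timestamps are not enabled for this topic")
    return producer_ts
```

With `allow_negative=True`, a stored -1 can only mean "1 ms before the
epoch", because the "unknown" case never reaches the log.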

Kafka Streams could check the topic config and only drop negative
timestamps if they are not enabled. Of course, existing topics should not
enable negative timestamps if there are records with -1 in them already
-- otherwise, semantics break down -- but this would be a config error
we cannot prevent. However, I would expect that mostly newly created
topics would enable this config anyway.
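
Roughly, the Streams-side check could look like the sketch below. This
is only an illustration under the assumption that the proposed config
value is available to the application as a boolean; `should_drop` and
`filter_records` are made-up names, not existing Streams APIs.

```python
from typing import Iterable, List, Tuple

Record = Tuple[int, str]  # (timestamp_ms, value), simplified for the sketch


def should_drop(timestamp_ms: int, allow_negative: bool) -> bool:
    """Drop negative timestamps only when the topic has not enabled them."""
    if allow_negative:
        return False  # every timestamp is valid, including -1
    return timestamp_ms < 0


def filter_records(records: Iterable[Record], allow_negative: bool) -> List[Record]:
    """Keep the records that would be processed under the given config."""
    return [r for r in records if not should_drop(r[0], allow_negative)]


if __name__ == "__main__":
    sample = [(-1_893_456_000_000, "article from 1910"), (-1, "edge case"), (5, "recent")]
    print(filter_records(sample, allow_negative=False))
    print(filter_records(sample, allow_negative=True))
```

The rule is all-or-nothing per topic, so there is no case where -1 is
treated differently from other negative values.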


-Matthias

On 12/18/17 10:47 PM, Ewen Cheslack-Postava wrote:
> I think the trivial change of just recognizing that using -1 as a
> sentinel value was a mistake, and special casing it while allowing other
> negative values through, is the most practical, reasonable change.
> 
> Realistically, the scope of impact for that -1 is pretty tiny, as has been
> pointed out. A single millisecond gap in available timestamps in 1969. For
> producers that really want to be careful (as the NYT data might want to
> be), having the producer layer adjust accordingly is unlikely to be an
> issue (you can't assume these timestamps are unique anyway, so they cannot
> reasonably be used for ordering; adjusting by 1ms is a practical tradeoff).
> 
> Other approaches where we modify the semantics of the timestamp from the
> two existing modes require eating up valuable flags in the message format,
> or ramping the message format version, all of which make things
> significantly messier. Hell, timezones, leap seconds, and ms granularity
> probably make that 1ms window pretty much moot for any practical
> applications, and for the extremely rare case that an application might
> care, they are probably willing to pay the cost of a secondary index if
> they needed to store timestamp values in the payload rather than in the
> metadata.
> 
> Given that we have the current system in place, I suspect that any
> translation to using Long.MIN_VALUE as the sentinel is probably just more
> confusing to users, adds more implementation overhead to client libraries,
> and is more likely to introduce bugs.
> 
> Warts like these always feel wrong when approached from pure design
> principles, but the fact is that the constraints are already there. To me,
> none of the proposals to move to an encoding we'd prefer seem to add enough
> value to outweigh the migration, compatibility, and implementation costs.
> 
> @Dong -- your point about special timestamp values is a very good one. The
> issue may extend to other cases in the protocol where we use timestamps. Is
> this the scope we need to worry about (2 values instead of just 1) or are
> there others? This also might be something we want to look out for in the
> future -- using special values relative to <SignedIntType>.MIN_VALUE
> instead of relative to 0.
> 
> -Ewen
> 
> On Tue, Dec 12, 2017 at 11:12 AM, Dong Lin <lindong28@gmail.com> wrote:
> 
>> Hey Konstantin,
>>
>> Thanks for updating the KIP.
>>
>> If we were to support negative timestamp in the message, we probably also
>> want to support negative timestamp in ListOffsetRequest. Currently in
>> ListOffsetRequest, timestamp value -2 is used to indicate earliest
>> timestamp and timestamp value -1 is used to indicate latest timestamp. It
>> seems that we should make changes accordingly so that -1 and -2 can be
>> supported as valid timestamp in ListOffsetRequest. What do you think?
>>
>> Thanks,
>> Dong
>>
>>
>>
>> On Mon, Dec 11, 2017 at 12:55 PM, Konstantin Chukhlomin <chuhlomin@gmail.com> wrote:
>>
>>> Hi all,
>>>
>>> I've updated the KIP with a few more details:
>>> Added (proposed) Changes in binary message format <https://cwiki.apache.org/confluence/display/KAFKA/KIP-228+Negative+record+timestamp+support#KIP-228Negativerecordtimestampsupport-Changesinbinarymessageformat>
>>> Added Changes from producer perspective <https://cwiki.apache.org/confluence/display/KAFKA/KIP-228+Negative+record+timestamp+support#KIP-228Negativerecordtimestampsupport-Changesfromproducerperspective>
>>> Added Changes from consumer perspective <https://cwiki.apache.org/confluence/display/KAFKA/KIP-228+Negative+record+timestamp+support#KIP-228Negativerecordtimestampsupport-Changesfromconsumerperspective>
>>>
>>> Let me know if it makes sense to you.
>>>
>>> -Konstantin
>>>
>>>> On Dec 7, 2017, at 2:46 PM, Konstantin Chukhlomin <chuhlomin@gmail.com> wrote:
>>>>
>>>> Hi Matthias,
>>>>
>>>> Indeed, for consumers it will not be obvious what -1 means: an actual
>>>> timestamp or no timestamp. Nevertheless, it's just -1 millisecond, so I
>>>> thought it would not be a big deal to leave it (not clean, but acceptable).
>>>>
>>>> I agree that it will be much cleaner to have a different type of topic
>>>> that supports negative timestamps and/or treats Long.MIN_VALUE as
>>>> no-timestamp. I'll update the KIP to make it a proposed solution.
>>>>
>>>> Thanks,
>>>> Konstantin
>>>>
>>>>> On Dec 5, 2017, at 7:06 PM, Matthias J. Sax <matthias@confluent.io> wrote:
>>>>>
>>>>> Thanks for the KIP Konstantin.
>>>>>
>>>>> From my understanding, you propose to just remove the negative timestamp
>>>>> check in KafkaProducer and KafkaStreams. If topics are configured with
>>>>> `CreateTime`, brokers also write negative timestamps if they are embedded
>>>>> in the message.
>>>>>
>>>>> However, I am not sure about the overlapping semantics for the -1
>>>>> timestamp. My concern is that this ambiguity might result in issues.
>>>>> Assume that there is a topic (configured with `CreateTime`) to which an
>>>>> old and a new producer are writing. The old producer uses the old message
>>>>> format and does not include any timestamp in the message. The broker will
>>>>> "upgrade" this message to the new format and set -1. At the same time,
>>>>> the new producer could write a message with the valid timestamp -1. A
>>>>> consumer could not distinguish between both cases...
>>>>>
>>>>> Also, there might be other Producer implementations that write negative
>>>>> timestamps. Thus, those might already exist. For Streams, we don't
>>>>> process those and we should make sure to keep it this way (to avoid
>>>>> ambiguity).
>>>>>
>>>>> Thus, it might actually make sense to introduce a new timestamp type to
>>>>> express those new semantics. The question is still how to deal with
>>>>> older producer clients that want to write to those topics.
>>>>>
>>>>> - We could either use `Long.MIN_VALUE` as "unknown" (this would be way
>>>>> better than -1 as it's not in the middle of the range but at the very
>>>>> end and it will also have well-defined semantics).
>>>>> - Or we use a "mixed-mode" where we use broker wall-clock time for
>>>>> older message formats (i.e., append time semantics for older producers).
>>>>> - Third, we could even give an error message back to older producers;
>>>>> this might change the backward compatibility guarantees Kafka provides
>>>>> so far when upgrading brokers. However, this would not affect existing
>>>>> topics, but only newly created ones (and we could disallow changing the
>>>>> semantics to the new timestamp type to guard against
>>>>> misconfiguration). Thus, it might be ok.
>>>>>
>>>>> For Streams, we could check the topic config and process negative
>>>>> timestamps only if the topic is configured with the new timestamp type.
>>>>>
>>>>> Maybe I am a little bit too paranoid about overloading -1 semantics.
>>>>> Curious to get feedback from others.
>>>>>
>>>>>
>>>>>
>>>>> -Matthias
>>>>>
>>>>>
>>>>> On 12/5/17 1:24 PM, Konstantin Chukhlomin wrote:
>>>>>> Hi Dong,
>>>>>>
>>>>>> Currently we are storing historical timestamps in the message.
>>>>>>
>>>>>> What we are trying to achieve is to make it possible to do a Kafka lookup
>>>>>> by timestamp. Ideally I would use `offsetsForTimes` to find articles
>>>>>> published in the 1910s (if we are storing articles in the log).
>>>>>>
>>>>>> So the first two suggestions don't really cover our use-case.
>>>>>>
>>>>>> We could create a new timestamp type like "HistoricalTimestamp" or
>>>>>> "MaybeNegativeTimestamp". The only difference between this one and
>>>>>> CreateTime is that it could be negative. I tend to use CreateTime for
>>>>>> this purpose because it's easier to understand from the user
>>>>>> perspective as a timestamp which the publisher can set.
>>>>>>
>>>>>> Thanks,
>>>>>> Konstantin
>>>>>>
>>>>>>> On Dec 5, 2017, at 3:47 PM, Dong Lin <lindong28@gmail.com> wrote:
>>>>>>>
>>>>>>> Hey Konstantin,
>>>>>>>
>>>>>>> Thanks for the KIP. I have a few questions below.
>>>>>>>
>>>>>>> Strictly speaking, Kafka actually allows you to store historical data,
>>>>>>> and users are free to encode an arbitrary timestamp field in their Kafka
>>>>>>> messages. For example, your Kafka message can currently have JSON or Avro
>>>>>>> format and you can put a timestamp field there. Do you think that could
>>>>>>> address your use-case?
>>>>>>>
>>>>>>> Alternatively, KIP-82 introduced Record Headers in Kafka and you can also
>>>>>>> define your customized key/value pair in the header. Do you think this
>>>>>>> can address your use-case?
>>>>>>>
>>>>>>> Also, currently there are two types of timestamp according to KIP-32. If
>>>>>>> the type is LogAppendTime then the timestamp value is the time when the
>>>>>>> broker receives the message. If the type is CreateTime then the timestamp
>>>>>>> value is determined when the producer produces the message. With these
>>>>>>> two definitions, the timestamp should always be positive. We probably
>>>>>>> need a new type here if we cannot put the timestamp in the Record Header
>>>>>>> or the message payload. Does this sound reasonable?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Dong
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Dec 5, 2017 at 8:40 AM, Konstantin Chukhlomin <chuhlomin@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> I have created a KIP to support negative timestamps:
>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-228+Negative+record+timestamp+support
>>>>>>>>
>>>>>>>> Here are the proposed changes:
>>>>>>>> https://github.com/apache/kafka/compare/trunk...chuhlomin:trunk
>>>>>>>>
>>>>>>>> I'm pretty sure that not all cases are covered, so comments and
>>>>>>>> suggestions are welcome.
>>>>>>>>
>>>>>>>> Thank you,
>>>>>>>> Konstantin
>>>>>>
>>>>>
>>>>
>>>
>>>
>>
> 

