kafka-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Matthias J. Sax (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (KAFKA-4785) Records from internal repartitioning topics should always use RecordMetadataTimestampExtractor
Date Wed, 22 Feb 2017 02:20:44 GMT

     [ https://issues.apache.org/jira/browse/KAFKA-4785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Matthias J. Sax updated KAFKA-4785:
-----------------------------------
    Description: 
Users can specify what timestamp extractor should be used to decode the timestamp of input
topic records. As long as RecordMetadataTimestamp or WallclockTime is use this is fine. 

However, for custom timestamp extractors it might be invalid to apply this custom extractor
to records received from internal repartitioning topics. The reason is that Streams sets the
current "stream time" as record metadata timestamp explicitly before writing to intermediate
repartitioning topics because this timestamp should be use by downstream subtopologies. A
custom timestamp extractor might return something different breaking this assumption.

Thus, for reading data from intermediate repartitioning topic, the configured timestamp extractor
should not be used, but the record's metadata timestamp should be extracted as record timestamp.

In order to leverage the same behavior for intermediate user topic (ie, used in {{through()}})
 we can leverage KAFKA-4144 and internally set an extractor for those "intermediate sources"
that returns the record's metadata timestamp in order to overwrite the global extractor from
{{StreamsConfig}} (ie, set {{FailOnInvalidTimestampExtractor}}).

  was:
Users can specify what timestamp extractor should be used to decode the timestamp of input
topic records. As long as RecordMetadataTimestamp or WallclockTime is use this is fine. 

However, for custom timestamp extractors it might be invalid to apply this custom extractor
to records received from internal repartitioning topics. The reason is that Streams sets the
current "stream time" as record metadata timestamp explicitly before writing to intermediate
repartitioning topics because this timestamp should be use by downstream subtopologies. A
custom timestamp extractor might return something different breaking this assumption.

Thus, for reading data from intermediate repartitioning topic, the configured timestamp extractor
should not be used, but the record's metadata timestamp should be extracted as record timestamp.


> Records from internal repartitioning topics should always use RecordMetadataTimestampExtractor
> ----------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-4785
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4785
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>    Affects Versions: 0.10.2.0
>            Reporter: Matthias J. Sax
>
> Users can specify what timestamp extractor should be used to decode the timestamp of
input topic records. As long as RecordMetadataTimestamp or WallclockTime is use this is fine.

> However, for custom timestamp extractors it might be invalid to apply this custom extractor
to records received from internal repartitioning topics. The reason is that Streams sets the
current "stream time" as record metadata timestamp explicitly before writing to intermediate
repartitioning topics because this timestamp should be use by downstream subtopologies. A
custom timestamp extractor might return something different breaking this assumption.
> Thus, for reading data from intermediate repartitioning topic, the configured timestamp
extractor should not be used, but the record's metadata timestamp should be extracted as record
timestamp.
> In order to leverage the same behavior for intermediate user topic (ie, used in {{through()}})
 we can leverage KAFKA-4144 and internally set an extractor for those "intermediate sources"
that returns the record's metadata timestamp in order to overwrite the global extractor from
{{StreamsConfig}} (ie, set {{FailOnInvalidTimestampExtractor}}).



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Mime
View raw message