beam-commits mailing list archives

From Paweł Kaczmarczyk (JIRA) <j...@apache.org>
Subject [jira] [Created] (BEAM-2467) KinesisIO watermark based on approximateArrivalTimestamp
Date Mon, 19 Jun 2017 12:42:00 GMT
Paweł Kaczmarczyk created BEAM-2467:
---------------------------------------

             Summary: KinesisIO watermark based on approximateArrivalTimestamp
                 Key: BEAM-2467
                 URL: https://issues.apache.org/jira/browse/BEAM-2467
             Project: Beam
          Issue Type: Improvement
          Components: sdk-java-extensions
            Reporter: Paweł Kaczmarczyk
            Assignee: Davor Bonaci


In Kinesis we can start reading a stream from any point in the past within the retention
period (up to 7 days). With the current approach to setting a record's timestamp and the watermark
(both are always set to the current time, i.e. Instant.now()), we can't observe the actual position
in the stream.

So the idea is to change this behaviour and set the record timestamp based on the [ApproximateArrivalTimestamp|http://docs.aws.amazon.com/kinesis/latest/APIReference/API_Record.html#Streams-Type-Record-ApproximateArrivalTimestamp].
The watermark will then be set according to the timestamp of the last record read.
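
The idea above could be sketched roughly as follows. This is plain Java and not the actual KinesisIO code: the class name ArrivalTimeWatermark and its methods are hypothetical, and never moving the watermark backwards when an out-of-order record shows up is an assumption added for this sketch.

```java
import java.time.Instant;

// Hypothetical helper, not part of Beam or KinesisIO.
final class ArrivalTimeWatermark {
    // Assumption: before any record is read, use the earliest representable instant.
    private Instant watermark = Instant.MIN;

    // Returns the timestamp to assign to the record (its approximate arrival
    // time instead of Instant.now()) and advances the watermark if needed.
    Instant onRecord(Instant approximateArrivalTimestamp) {
        if (approximateArrivalTimestamp.isAfter(watermark)) {
            watermark = approximateArrivalTimestamp;
        }
        return approximateArrivalTimestamp;
    }

    // Current watermark: the latest arrival timestamp seen so far.
    Instant current() {
        return watermark;
    }
}
```

A reader would call onRecord(...) for every record it pulls from a shard and report current() as its watermark, so the watermark reflects the position in the stream rather than wall-clock time.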

ApproximateArrivalTimestamp is only an approximation, so records may arrive with
out-of-order timestamps, which in turn may cause some events to be marked as late. This
should not happen often, however, and when it does the difference should be a matter of milliseconds
or seconds, so it can be handled even with a tiny allowedLateness setting.
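
To illustrate the lateness argument, here is a simplified model in plain Java. It treats lateness as how far a record's timestamp trails the watermark and compares that to an allowed-lateness budget; Beam's real semantics are window-based, so this is only an approximation. The class LatenessCheck and its methods are hypothetical names.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper for illustration; not Beam's actual lateness logic.
final class LatenessCheck {
    // How far the record's timestamp trails the watermark (zero if it does not).
    static Duration lateness(Instant recordTimestamp, Instant watermark) {
        return recordTimestamp.isBefore(watermark)
            ? Duration.between(recordTimestamp, watermark)
            : Duration.ZERO;
    }

    // True if the record is on time or late by no more than allowedLateness.
    static boolean withinAllowedLateness(
            Instant recordTimestamp, Instant watermark, Duration allowedLateness) {
        return lateness(recordTimestamp, watermark).compareTo(allowedLateness) <= 0;
    }
}
```

Under this model, a record that trails the watermark by 500 ms is accepted with an allowed lateness of one second but rejected with an allowed lateness of zero.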



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)