flink-user mailing list archives

From Dawid Wysakowicz <wysakowicz.da...@gmail.com>
Subject Re: Issues with Event Time and Kafka
Date Tue, 07 Mar 2017 20:54:23 GMT
Hi Max,
How do you assign timestamps to your events (in the event-time case)? Could
you post the whole code for your TimestampAndWatermarkAssigner?
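
In the meantime, one thing that might be worth checking: the
`getCurrentWatermark()` quoted below is driven by `System.currentTimeMillis()`,
i.e. the wall clock, rather than by the timestamps of the events themselves. If
the job reads a backlog from Kafka, many records could then fall behind the
watermark and be dropped as late. Here is a minimal sketch of an assigner that
tracks the largest event timestamp seen so far instead. `Event`,
`eventTimeMillis` and `EventTimeAssigner` are hypothetical names, since the real
record type has not been posted, and I am assuming Flink 1.2's
`AssignerWithPeriodicWatermarks` interface and a non-negative lateness bound.

```scala
import org.apache.flink.streaming.api.functions.AssignerWithPeriodicWatermarks
import org.apache.flink.streaming.api.watermark.Watermark

// Hypothetical record type: the real event class was not posted in this thread.
case class Event(id: String, eventTimeMillis: Long)

// Sketch only: the watermark trails the largest event timestamp seen so far
// by maxLatenessMillis, instead of trailing the wall clock.
class EventTimeAssigner(maxLatenessMillis: Long)
    extends AssignerWithPeriodicWatermarks[Event] {

  // Start offset by the lateness bound so the subtraction in
  // getCurrentWatermark() cannot underflow before the first event arrives.
  private var maxSeenTimestamp: Long = Long.MinValue + maxLatenessMillis

  override def extractTimestamp(element: Event, previousElementTimestamp: Long): Long = {
    maxSeenTimestamp = math.max(maxSeenTimestamp, element.eventTimeMillis)
    element.eventTimeMillis
  }

  // Called periodically by Flink at the configured auto-watermark interval.
  override def getCurrentWatermark(): Watermark =
    new Watermark(maxSeenTimestamp - maxLatenessMillis)
}
```

It would plug in the same way as in your snippet, e.g.
`.assignTimestampsAndWatermarks(new EventTimeAssigner(Time.minutes(30).toMilliseconds))`.
Flink also ships `BoundedOutOfOrdernessTimestampExtractor`, which follows the
same idea. The watermark interval itself is given in milliseconds via
`env.getConfig.setAutoWatermarkInterval(...)`.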

Regards,
Dawid

2017-03-07 20:59 GMT+01:00 ext.mwalker <ext.mwalker@riotgames.com>:

> Hi Stephan,
>
> The right number of events seems to leave the source and enter the windows,
> but the metrics show that 0 exit the windows.
>
> I have also tried both 30 minutes and not setting the watermark interval at
> all. I am not sure what I am supposed to put there; the docs seem vague
> about that.
>
> Best,
>
> Max
>
> On Tue, Mar 7, 2017 at 1:54 PM, Stephan Ewen [via Apache Flink User
> Mailing List archive.] <[hidden email]> wrote:
>
>> Hi!
>>
>> At first glance, your code for assigning the watermarks looks correct.
>> What is your watermark interval in the config?
>>
>> Can you check the Flink metrics (if you are using 1.2) to see how many
>> rows leave the source, how many enter/leave the window operators, etc.?
>>
>> That should help figure out why there are so few result rows...
>>
>> Stephan
>>
>>
>> On Mon, Mar 6, 2017 at 8:57 PM, ext.mwalker <[hidden email]> wrote:
>>
>>> Hi Folks,
>>>
>>> We are working on a Flink job to process a large amount of data coming in
>>> from a Kafka stream.
>>>
>>> We selected Flink because the data is sometimes out of order or late, and
>>> we need to roll up the data into 30-minute event-time windows, after which
>>> we write it back out to an S3 bucket.
>>>
>>> We have hit a couple issues:
>>>
>>> 1) The job works fine using processing time, but when we switch to event
>>> time (almost) nothing seems to be written out.
>>> Our watermark code looks like this:
>>> ```
>>>   override def getCurrentWatermark(): Watermark = {
>>>     new Watermark(System.currentTimeMillis() - maxLateness);
>>>   }
>>> ```
>>> And we are doing this:
>>> ```
>>>     val env = StreamExecutionEnvironment.getExecutionEnvironment
>>>     env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)
>>> ```
>>> and this:
>>> ```
>>>     .assignTimestampsAndWatermarks(
>>>       new TimestampAndWatermarkAssigner(Time.minutes(30).toMilliseconds))
>>> ```
>>>
>>> However, even though we get millions of records per hour (the vast
>>> majority of which are no more than 30 minutes late), only about 2 to 10
>>> records per hour are written out to the S3 bucket.
>>> We are using a custom BucketingFileSink Bucketer; if folks believe that is
>>> the issue, I would be happy to provide that code here as well.
>>>
>>> 2) On top of all this, we would really prefer to write the records
>>> directly to Aurora in RDS rather than to an intermediate S3 bucket, but it
>>> seems that the JDBC sink connector is unsupported / doesn't exist.
>>> If this is not the case, we would love to know.
>>>
>>> Thanks in advance for all the help / insight on this,
>>>
>>> Max Walker
