flink-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "ext.mwalker" <ext.mwal...@riotgames.com>
Subject Re: Issues with Event Time and Kafka
Date Tue, 07 Mar 2017 19:59:13 GMT
Hi Stephan,

The right number of events seem to leave the source and enter the windows,
but it shows that 0 exit the windows.

Also I have tried 30 minutes and not setting the watermark interval, I am
not sure what I am supposed to put there the docs seem vague about that.

Best,

Max

On Tue, Mar 7, 2017 at 1:54 PM, Stephan Ewen [via Apache Flink User Mailing
List archive.] <ml-node+s2336050n12084h6@n4.nabble.com> wrote:

> Hi!
>
> At a first glance, your code looks correct to assign the Watermarks. What
> is your watermark interval in the config?
>
> Can you check with the Flink metrics (if you are using 1.2) to see how
> many rows leave the source, how many enter/leave the window operators, etc?
>
> That should help figuring out why there are so few result rows...
>
> Stephan
>
>
> On Mon, Mar 6, 2017 at 8:57 PM, ext.mwalker <[hidden email]
> <http:///user/SendEmail.jtp?type=node&node=12084&i=0>> wrote:
>
>> Hi Folks,
>>
>> We are working on a Flink job to proccess a large amount of data coming in
>> from a Kafka stream.
>>
>> We selected Flink because the data is sometimes out of order or late, and
>> we
>> need to roll up the data into 30-minutes event time windows, after which
>> we
>> are writing it back out to an s3 bucket.
>>
>> We have hit a couple issues:
>>
>> 1) The job works fine using processing time, but when we switch to event
>> time (almost) nothing seems to be written out.
>> Our watermark code looks like this:
>> ```
>>   override def getCurrentWatermark(): Watermark = {
>>     new Watermark(System.currentTimeMillis() - maxLateness);
>>   }
>> ```
>> And we are doing this:
>> ```
>>     val env = StreamExecutionEnvironment.getExecutionEnvironment
>>     env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)
>> ```
>> and this:
>> ```
>>     .assignTimestampsAndWatermarks(new
>> TimestampAndWatermarkAssigner(Time.minutes(30).toMilliseconds))
>> ```
>>
>> However even though we get millions of records per hour (the vast majority
>> of which are no more that 30 minutes late) we get like 2 - 10 records per
>> hour written out to the s3 bucket.
>> We are using a custom BucketingFileSink Bucketer if folks believe that is
>> the issue I would be happy to provide that code here as well.
>>
>> 2) On top of all this, we would really prefer to write the records
>> directly
>> to Aurora in RDS rather than to an intermediate s3 bucket, but it seems
>> that
>> the JDBC sink connector is unsupported / doesn't exist.
>> If this is not the case we would love to know.
>>
>> Thanks in advance for all the help / insight on this,
>>
>> Max Walker
>>
>>
>>
>> --
>> View this message in context: http://apache-flink-user-maili
>> ng-list-archive.2336050.n4.nabble.com/Issues-with-Event-
>> Time-and-Kafka-tp12061.html
>> Sent from the Apache Flink User Mailing List archive. mailing list
>> archive at Nabble.com.
>>
>
>
>
> ------------------------------
> If you reply to this email, your message will be added to the discussion
> below:
> http://apache-flink-user-mailing-list-archive.2336050.
> n4.nabble.com/Issues-with-Event-Time-and-Kafka-tp12061p12084.html
> To unsubscribe from Issues with Event Time and Kafka, click here
> <http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/template/NamlServlet.jtp?macro=unsubscribe_by_code&node=12061&code=ZXh0Lm13YWxrZXJAcmlvdGdhbWVzLmNvbXwxMjA2MXwyMDg2Mjg2MjU0>
> .
> NAML
> <http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/template/NamlServlet.jtp?macro=macro_viewer&id=instant_html%21nabble%3Aemail.naml&base=nabble.naml.namespaces.BasicNamespace-nabble.view.web.template.NabbleNamespace-nabble.naml.namespaces.BasicNamespace-nabble.view.web.template.NabbleNamespace-nabble.naml.namespaces.BasicNamespace-nabble.view.web.template.NabbleNamespace-nabble.naml.namespaces.BasicNamespace-nabble.view.web.template.NabbleNamespace-nabble.view.web.template.NodeNamespace&breadcrumbs=notify_subscribers%21nabble%3Aemail.naml-instant_emails%21nabble%3Aemail.naml-send_instant_email%21nabble%3Aemail.naml>
>




--
View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Issues-with-Event-Time-and-Kafka-tp12061p12087.html
Sent from the Apache Flink User Mailing List archive. mailing list archive at Nabble.com.
Mime
View raw message