flink-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Dawid Wysakowicz <wysakowicz.da...@gmail.com>
Subject Re: Queries regarding FlinkCEP
Date Mon, 05 Jun 2017 06:46:08 GMT
I think Till answered all your question but just to rephrase a bit.

1. The within and TimeCharacteristic are working on different levels. The
TimeCharacteristics tells how events are assigned a timestamp. The within
operator specifies the maximal time between first and last event of a
matched sequence (the time here corresponds to the chosen
TimeCharacteristic). So if we have within(Time.minutes(10)) in EventTime,
upon Watermark arrival the events are sorted with the assigned Timestamp
and then the within is applied.

3. Looking at your code there is nothing wrong with it. As I don't know how
the timestamps of your events looks like, I can just guess, but I would say
either

   - there is no matching sequences of events in your stream that fit into
   10 minutes window
   - or, your events are more mixed than across 60 seconds. Consider
   example: we have events with timestamps {t1=600s, t2=620, t3=550s}. Event
   with t3=550s cannot match with t1 because it lags 70s > 60s behind t2.
   FlinkCEP right now drops all late events.

For deeper understanding of Event/Processing Time I would suggest having a
look at :
https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/event_time.html#event-time


Z pozdrowieniami! / Cheers!

Dawid Wysakowicz

*Data/Software Engineer*

Skype: dawid_wys | Twitter: @OneMoreCoder

<http://getindata.com/>

2017-06-02 18:22 GMT+02:00 Till Rohrmann <trohrmann@apache.org>:

> Hi Biplob,
>
> 1. The CEPPatternOperator can use either processing time or event time for
> its internal processing logic. It only depends on what TimeCharacteristic
> you have set for your program. Consequently, with event time, your example
> should be detected as an alert.
>
> 2. If you don't provide a keyed input stream, then Flink will execute the
> CEP operator only with a parallelism of 1. Thus, all events pass through
> the same instance of the CEP operator.
>
> 3. It's hard to tell but I would assume that something with the watermark
> generation does not properly work. For example, it could be that you've set
> the out of orderness to a very large value such that it will take a long
> time until you can be sure that you've seen all events for a given
> watermark on the input without monotonically increasing timestamps. The
> easiest way to debug the problem would be a self-contained example program
> which reproduces the problem.
>
> Cheers,
> Till
>
> On Fri, Jun 2, 2017 at 1:10 PM, Biplob Biswas <revolutionisme@gmail.com>
> wrote:
>
>> Hi ,
>>
>> Thanks a lot for the help last time, I have a few more questions and I
>> chose
>> to create a new topic as the problem in the previous topic was solved,
>> thanks to useful inputs from Flink Community. The questions are as follows
>>
>> *1.* What time does the "within" operator works on "Event Time" or
>> "Processing Time", I am asking this as I wanted to know whether something
>> like the following would be captured or not.
>>
>> MaxOutofOrderness is set to 10 mins, and "within" operator is specified
>> for
>> 5 mins. So if a first events event time is at 1:00  and the corresponding
>> next event is has an event time of 1:04 but it arrives in the system at
>> 1:06. Would this still be processed and alert would be generated or not?
>>
>> *2.* What would happen if I don't have a key to specify, the way 2 events
>> are correlated is by using the ctx of the first event and matching some
>> different id. So, we can't group by some unique field. I tried a test run
>> without specifying a key and it apparently works. But how is the shuffling
>> done then in this case?
>>
>> *3.* This is one of the major issue, So I could use Event Time with
>> ascending event time extractor for one of my kafka topic because its
>> behavior is consistent.  But when i added another topic to read from where
>> the events are not in ascending order, using ascending timestampextractor
>> gave me timestamp monotonicity violation. Then when I am using
>> BoundedOutOfOrdernessTimestampExtractor for the same, I am not getting
>> any
>> warnings anymore but I am no more getting my alerts.
>>
>> If I go back to using processing time, then I am again getting alerts
>> properly. What could be the problem here?
>>
>> *This is the code I am using:*
>>
>> /public class CEPForBAM {
>>
>>
>>   public static void main(String[] args) throws Exception {
>>
>>     StreamExecutionEnvironment env =
>> StreamExecutionEnvironment.getExecutionEnvironment();
>>     System.out.println(env.getStreamTimeCharacteristic());
>>     env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
>>     env.getConfig().setAutoWatermarkInterval(10000);
>>
>> // configure Kafka consumer
>>     Properties props = new Properties();
>>     props = getDefaultProperties(props);
>>
>>     FlinkKafkaConsumer010<BAMEvent> kafkaSource = new
>> FlinkKafkaConsumer010<>(
>>             Arrays.asList("topic1", "topic_x", "topic_test"),
>>             new StringSerializerToEvent(),
>>             props);
>>
>>     kafkaSource.assignTimestampsAndWatermarks(new
>> BoundedOutOfOrdernessTimestampExtractor<BAMEvent>(Time.seconds(60)) {
>>
>>       private static final long serialVersionUID = -7228487240278428374L;
>>
>>       @Override
>>       public long extractTimestamp(BAMEvent event) {
>>         return event.getTimestamp();
>>       }
>>     });
>>
>>     DataStream<BAMEvent> events = env.addSource(kafkaSource);
>>
>>     // Input stream of monitoring events
>>
>>
>> /*    DataStream<BAMEvent> partitionedInput = events
>>             .keyBy((KeySelector<BAMEvent, String>) BAMEvent::getId);*/
>>
>>      evetns.print();
>>     //partitionedInput.print();
>>
>>     Pattern<BAMEvent, ?> pattern = Pattern.<BAMEvent>begin("first")
>>             .where(new SimpleCondition<BAMEvent>() {
>>               private static final long serialVersionUID =
>> 1390448281048961616L;
>>
>>               @Override
>>               public boolean filter(BAMEvent event) throws Exception {
>>                 return
>> event.getEventName().equals(ReadEventType.class.getSimpleName());
>>               }
>>             })
>>             .followedBy("second")
>>             .where(new IterativeCondition<BAMEvent>() {
>>               private static final long serialVersionUID =
>> -9216505110246259082L;
>>
>>               @Override
>>               public boolean filter(BAMEvent secondEvent,
>> Context<BAMEvent>
>> ctx) throws Exception {
>>
>>                 if
>> (secondEvent.getEventName().equals(StatusChangedEventType.cl
>> ass.getSimpleName()))
>> {
>>                   for (BAMEvent firstEvent :
>> ctx.getEventsForPattern("first")) {
>>                     if
>> (secondEvent.getCorrelationID().contains(firstEvent.getEventId()))
>>                       return true;
>>                   }
>>                 }
>>                 return false;
>>               }
>>             })
>>             .within(Time.minutes(10));
>>
>>     PatternStream<BAMEvent> patternStream = CEP.pattern(events, pattern);
>>
>>
>>     DataStream<Either&lt;String, String>> alerts =
>> patternStream.select(new
>> PatternTimeoutFunction<BAMEvent, String>() {
>>       private static final long serialVersionUID = -8717561187522704500L;
>>
>>       @Override
>>       public String timeout(Map<String, List&lt;BAMEvent>> map, long
l)
>> throws Exception {
>>         return "TimedOut: " + map.toString() + " @ " + l;
>>       }
>>
>>     }, new PatternSelectFunction<BAMEvent, String>() {
>>       private static final long serialVersionUID = 3144439966791408980L;
>>
>>       @Override
>>       public String select(Map<String, List&lt;BAMEvent>> pattern) throws
>> Exception {
>>         BAMEvent bamEvent = pattern.get("first").get(0);
>>         return "Matched Events: " + bamEvent.getEventId() + "_" +
>> bamEvent.getEventName();
>>       }
>>     });
>>
>>     alerts.print();
>>
>>     env.execute("CEP monitoring job");
>>   }
>> }/
>>
>>
>> Even when I am using Event Time, I am getting events from kafka as can be
>> shown from event.print()
>>
>>
>>
>> --
>> View this message in context: http://apache-flink-user-maili
>> ng-list-archive.2336050.n4.nabble.com/Queries-regarding-
>> FlinkCEP-tp13454.html
>> Sent from the Apache Flink User Mailing List archive. mailing list
>> archive at Nabble.com.
>>
>
>

Mime
View raw message