flink-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-3109) Join two streams with two different buffer time
Date Wed, 20 Jan 2016 21:05:39 GMT

    [ https://issues.apache.org/jira/browse/FLINK-3109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15109407#comment-15109407 ]

ASF GitHub Bot commented on FLINK-3109:

Github user wangyangjun commented on a diff in the pull request:

    --- Diff: flink-streaming-java/src/test/java/org/apache/flink/streaming/runtime/operators/windowing/CoGroupJoinITCase.java
    @@ -234,6 +234,109 @@ public void invoke(String value) throws Exception {
     		Assert.assertEquals(expectedResult, testResults);
    +	// TODO: design buffer join test
    --- End diff --
    @tillrohrmann Do you know why the checks failed? Only 3 of the 5 build jobs passed.
This is my first contribution to an open source project, and I don't see how my changes
could have caused the failing tests.

> Join two streams with two different buffer time
> -----------------------------------------------
>                 Key: FLINK-3109
>                 URL: https://issues.apache.org/jira/browse/FLINK-3109
>             Project: Flink
>          Issue Type: Improvement
>          Components: Streaming
>    Affects Versions: 0.10.1
>            Reporter: Wang Yangjun
>              Labels: easyfix, patch
>             Fix For: 0.10.2
>   Original Estimate: 48h
>  Remaining Estimate: 48h
> Flink streaming currently supports joining two streams only on the same window. How can
this be solved?
> For example, there are two streams. One is advertisements shown to users, whose tuples
can be described as (id, showed timestamp). The other one is the click stream: (id,
clicked timestamp). We want a joined stream that includes every advertisement that
is clicked by a user within 20 minutes after it is shown.
> It is possible that a user clicks an advertisement immediately after it is shown.
It is also possible that the "click" message arrives at the server earlier than the "show"
message because of network delay. We assume that the maximum delay is one minute.
> So we need to always keep a buffer (20 mins) of the "show" stream and another
buffer (1 min) of the "click" stream.
> It would be great to have an API like this:
> showStream.join(clickStream)
>             .where(keySelector)
>             .buffer(Time.of(20, TimeUnit.MINUTES))
>             .equalTo(keySelector)
>             .buffer(Time.of(1, TimeUnit.MINUTES))
>             .apply(JoinFunction)
> http://stackoverflow.com/questions/33849462/how-to-avoid-repeated-tuples-in-flink-slide-window-join/34024149#34024149
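
The buffering behaviour requested in the issue can be illustrated without Flink. The following is only a minimal sketch in plain Java, not the API proposed above and not code from the pull request. The names BufferedJoinSketch, Event, Joined, onShow and onClick are hypothetical, and the sketch assumes that timestamps are in milliseconds and that events within each stream arrive in non-decreasing timestamp order.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical illustration of a join with two different buffer times.
class BufferedJoinSketch {
    /** An element of either stream: (id, timestamp in milliseconds). */
    static final class Event {
        final String id;
        final long timestamp;
        Event(String id, long timestamp) {
            this.id = id;
            this.timestamp = timestamp;
        }
    }

    /** A joined (show, click) pair. */
    static final class Joined {
        final Event show;
        final Event click;
        Joined(Event show, Event click) {
            this.show = show;
            this.click = click;
        }
    }

    private final long showBufferMs;   // how long "show" events are kept
    private final long clickBufferMs;  // how long "click" events are kept
    private final Deque<Event> shows = new ArrayDeque<>();
    private final Deque<Event> clicks = new ArrayDeque<>();

    BufferedJoinSketch(long showBufferMs, long clickBufferMs) {
        this.showBufferMs = showBufferMs;
        this.clickBufferMs = clickBufferMs;
    }

    /** Handles a "show": joins it with buffered clicks, then buffers it. */
    List<Joined> onShow(Event show) {
        evictOlderThan(clicks, show.timestamp - clickBufferMs);
        List<Joined> out = new ArrayList<>();
        for (Event click : clicks) {
            if (click.id.equals(show.id)) {
                out.add(new Joined(show, click));
            }
        }
        shows.addLast(show);
        return out;
    }

    /** Handles a "click": joins it with buffered shows, then buffers it. */
    List<Joined> onClick(Event click) {
        evictOlderThan(shows, click.timestamp - showBufferMs);
        List<Joined> out = new ArrayList<>();
        for (Event show : shows) {
            if (show.id.equals(click.id)) {
                out.add(new Joined(show, click));
            }
        }
        clicks.addLast(click);
        return out;
    }

    /** Drops events from the head of the buffer whose timestamp is below the bound. */
    private static void evictOlderThan(Deque<Event> buffer, long minTimestamp) {
        while (!buffer.isEmpty() && buffer.peekFirst().timestamp < minTimestamp) {
            buffer.removeFirst();
        }
    }

    public static void main(String[] args) {
        // 20-minute buffer for shows, 1-minute buffer for clicks, as in the issue.
        BufferedJoinSketch join = new BufferedJoinSketch(20 * 60_000L, 60_000L);
        join.onShow(new Event("ad1", 0L));
        List<Joined> pairs = join.onClick(new Event("ad1", 5 * 60_000L));
        System.out.println("joined pairs: " + pairs.size());
    }
}
```

Each side is evicted against the other side's newest timestamp, so a click can match a show that is up to 20 minutes older, and a show can match a click that reached the server up to one minute before it. An implementation inside Flink would additionally need keyed state and a decision between event time and processing time, which this sketch leaves out.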

This message was sent by Atlassian JIRA