flink-user mailing list archives

From Stephan Ewen <se...@apache.org>
Subject Re: Counting tuples within a window in Flink Stream
Date Fri, 26 Feb 2016 18:42:10 GMT
Yes, Gyula, that should work. I would spread the random keys across a range
of 10 * parallelism.
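
To illustrate why the two-stage approach yields the same counts, here is a minimal plain-Java simulation. It does not use any Flink API. The class name, method names, sample timestamps, seed, and the parallelism of 4 are all hypothetical. It assumes non-negative timestamps, that each partial count stays attributed to the 10-second window it was computed for, and that the second stage sums the partial counts rather than counting them.

```java
import java.util.Map;
import java.util.Random;
import java.util.TreeMap;

/** Plain-Java simulation (no Flink APIs) of two-stage window counting. */
class TwoStageWindowCount {

    static final long WINDOW_MS = 10_000L; // 10-second tumbling windows

    /** Baseline: one global count per window, keyed by window start. */
    public static Map<Long, Long> directCount(long[] timestamps) {
        Map<Long, Long> counts = new TreeMap<>();
        for (long ts : timestamps) {
            long windowStart = ts - (ts % WINDOW_MS);
            counts.merge(windowStart, 1L, Long::sum);
        }
        return counts;
    }

    /** Two stages: partial counts per (window, random key), then a sum per window. */
    public static Map<Long, Long> twoStageCount(long[] timestamps, int keyRange, long seed) {
        Random random = new Random(seed);

        // Stage 1: every element gets a random key in [0, keyRange);
        // each (window, key) pair keeps its own partial count.
        Map<Long, long[]> partials = new TreeMap<>();
        for (long ts : timestamps) {
            long windowStart = ts - (ts % WINDOW_MS);
            int key = random.nextInt(keyRange);
            partials.computeIfAbsent(windowStart, w -> new long[keyRange])[key]++;
        }

        // Stage 2: add up the partial counts of each window.
        Map<Long, Long> totals = new TreeMap<>();
        for (Map.Entry<Long, long[]> entry : partials.entrySet()) {
            long sum = 0;
            for (long partial : entry.getValue()) {
                sum += partial;
            }
            totals.put(entry.getKey(), sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        long[] timestamps = {0, 1_500, 9_999, 10_000, 12_000, 25_000};
        int parallelism = 4;             // hypothetical parallelism
        int keyRange = 10 * parallelism; // key range suggested above
        System.out.println(directCount(timestamps));
        System.out.println(twoStageCount(timestamps, keyRange, 42L));
    }
}
```

Both calls print the same per-window totals, because the random key only decides which partial counter an element lands in, not which window. The result would differ if the partial counts could slip into a later second-stage window, which is the time-semantics caveat raised in the question.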




On Fri, Feb 26, 2016 at 7:16 PM, Gyula Fóra <gyula.fora@gmail.com> wrote:

> Hey,
>
> I am wondering whether the following code would produce identical results
> but run more efficiently (in parallel):
>
> input.keyBy(assignRandomKey).timeWindow(Time.seconds(10))
>     .reduce(count).timeWindowAll(Time.seconds(10)).reduce(count)
>
> Effectively, this just assigns random keys to do the pre-aggregation and
> then applies a window to the pre-aggregated values. I wonder whether this
> actually leads to correct results, and how it interacts with the time
> semantics.
>
> Cheers,
> Gyula
>
> Stephan Ewen <sewen@apache.org> wrote (on Fri, Feb 26, 2016, 19:10):
>
>> True, at this point it does not pre-aggregate in parallel. That is
>> actually a feature on the list, but it has not been added yet.
>>
>> On Fri, Feb 26, 2016 at 7:08 PM, Saiph Kappa <saiph.kappa@gmail.com>
>> wrote:
>>
>>> That code will not run in parallel, right? So a map-reduce task would
>>> yield better performance, no?
>>>
>>>
>>>
>>> On Fri, Feb 26, 2016 at 6:06 PM, Stephan Ewen <sewen@apache.org> wrote:
>>>
>>>> Then go for:
>>>>
>>>> input.timeWindowAll(Time.seconds(10)).fold(0,
>>>>     new FoldFunction<Tuple2<Integer, Integer>, Integer>() {
>>>>         @Override
>>>>         public Integer fold(Integer integer, Tuple2<Integer, Integer> o) throws Exception {
>>>>             return integer + 1;
>>>>         }
>>>>     });
>>>>
>>>> Try to explore the API a bit, most things should be quite intuitive.
>>>> There are also some docs:
>>>> https://ci.apache.org/projects/flink/flink-docs-release-0.10/apis/streaming_guide.html#windows-on-unkeyed-data-streams
>>>>
>>>> On Fri, Feb 26, 2016 at 4:07 PM, Saiph Kappa <saiph.kappa@gmail.com>
>>>> wrote:
>>>>
>>>>> Why the ".keyBy"? I don't want to count tuples by key. I simply want
>>>>> to count all tuples that are contained in a window.
>>>>>
>>>>> On Fri, Feb 26, 2016 at 9:14 AM, Till Rohrmann <trohrmann@apache.org>
>>>>> wrote:
>>>>>
>>>>>> Hi Saiph,
>>>>>>
>>>>>> you can do it the following way:
>>>>>>
>>>>>> input.keyBy(0).timeWindow(Time.seconds(10)).fold(0,
>>>>>>     new FoldFunction<Tuple2<Integer, Integer>, Integer>() {
>>>>>>         @Override
>>>>>>         public Integer fold(Integer integer, Tuple2<Integer, Integer> o) throws Exception {
>>>>>>             return integer + 1;
>>>>>>         }
>>>>>>     });
>>>>>>
>>>>>> Cheers,
>>>>>> Till
>>>>>>
>>>>>>
>>>>>> On Thu, Feb 25, 2016 at 7:58 PM, Saiph Kappa <saiph.kappa@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> In Flink Stream, what's the best way of counting the number of
>>>>>>> tuples within a window of 10 seconds? Using a map-reduce task?
>>>>>>> I am asking because in Spark there is the method
>>>>>>> rawStream.countByWindow(Seconds(x)).
>>>>>>>
>>>>>>> Thanks.
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
