flink-user mailing list archives

From Rico Bergmann <i...@ricobergmann.de>
Subject Re: Performance Issue
Date Thu, 24 Sep 2015 14:51:06 GMT
The test data is generated in a Flink program running in a separate JVM. The generated data
is then written to a Kafka topic from which my program reads the data ...



> On 24.09.2015 at 14:53, Aljoscha Krettek <aljoscha@apache.org> wrote:
> 
> Hi Rico,
> are you generating the data directly in your Flink program or reading it from some external queue, such as Kafka?
> 
> Cheers,
> Aljoscha
> 
>> On Thu, 24 Sep 2015 at 13:47 Rico Bergmann <info@ricobergmann.de> wrote:
>> And as a side note:
>> 
>> The problem with duplicates also seems to be solved!
>> 
>> Cheers Rico. 
>> 
>> 
>> 
>>> On 24.09.2015 at 12:21, Rico Bergmann <info@ricobergmann.de> wrote:
>>> 
>>> I took a first glance. 
>>> 
>>> I ran 2 test setups. One used a rate-limited test data generator that outputs around 200 events per second. In this setting the new implementation keeps up with the incoming message rate. 
>>> 
>>> The other setup had unlimited generation (at the highest possible rate). There the same problem as before can be observed. After 2 minutes of runtime the output of my program is more than a minute behind, and the lag increases over time. But I don't know whether this could be a setup problem. I noticed the OS load of my test system was around 90%, so it might be more of a setup problem ...
>>> 
>>> Thanks for your support so far. 
>>> 
>>> Cheers. Rico. 
>>> 
>>> 
>>> 
>>> 
>>> 
>>>> On 24.09.2015 at 09:33, Aljoscha Krettek <aljoscha@apache.org> wrote:
>>>> 
>>>> Hi Rico,
>>>> you should be able to get it with these steps:
>>>> 
>>>> git clone https://github.com/StephanEwen/incubator-flink.git flink
>>>> cd flink
>>>> git checkout -t origin/windows
>>>> 
>>>> This will get you on Stephan's windowing branch. Then you can do a
>>>> 
>>>> mvn clean install -DskipTests
>>>> 
>>>> to build it.
>>>> 
>>>> I will merge his stuff later today, then you should also be able to use it by running the 0.10-SNAPSHOT version.
>>>> 
>>>> Cheers,
>>>> Aljoscha
>>>> 
>>>> 
>>>>> On Thu, 24 Sep 2015 at 09:11 Rico Bergmann <info@ricobergmann.de> wrote:
>>>>> Hi!
>>>>> 
>>>>> Sounds great. How can I get the source code before it's merged to the master branch? Unfortunately I only have 2 days left for trying this out ...
>>>>> 
>>>>> Greets. Rico. 
>>>>> 
>>>>> 
>>>>> 
>>>>>> On 24.09.2015 at 00:57, Stephan Ewen <sewen@apache.org> wrote:
>>>>>> 
>>>>>> Hi Rico!
>>>>>> 
>>>>>> We have finished the first part of the Window API reworks. You can find the code here: https://github.com/apache/flink/pull/1175
>>>>>> 
>>>>>> It should fix the issues and offer vastly improved performance (up to 50x faster). For now, it supports time windows, but we will support the other cases in the next few days.
>>>>>> 
>>>>>> I'll ping you once it is merged; I'd be curious whether it fixes your issue. Sorry that you ran into this problem...
>>>>>> 
>>>>>> Greetings,
>>>>>> Stephan
>>>>>> 
>>>>>> 
>>>>>>> On Mon, Sep 7, 2015 at 12:00 PM, Rico Bergmann <info@ricobergmann.de> wrote:
>>>>>>> Hi!
>>>>>>> 
>>>>>>> While working with grouping and windowing I encountered a strange behavior. I'm doing:
>>>>>>>> dataStream.groupBy(KeySelector).window(Time.of(x, TimeUnit.SECONDS)).mapWindow(toString).flatten()
>>>>>>> 
>>>>>>> When I run the program containing this snippet it initially outputs data at a rate of around 150 events per second (roughly the input rate for the program). After about 10-30 minutes the rate drops below 5 events per second. This leads to event delivery offsets getting bigger and bigger ... 
>>>>>>> 
>>>>>>> Any explanation for this? I know you are reworking the streaming API. But it would be useful to know why this happens ...
>>>>>>> 
>>>>>>> Cheers. Rico. 
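
For readers unfamiliar with the pattern in the snippet quoted above (group by key, then collect into fixed-length time windows), here is a minimal plain-Java sketch of what a keyed tumbling time window computes. It does not use Flink and makes no claim about how Flink implements windows, before or after the rework. The class `KeyedTumblingWindow` and its methods are hypothetical names chosen for illustration, and timestamps are assumed to be non-negative milliseconds.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration class, not part of any Flink API.
public class KeyedTumblingWindow {
    private final long windowSizeMillis;
    // window start time -> (key -> values collected for that key in that window)
    private final TreeMap<Long, Map<String, List<String>>> windows = new TreeMap<>();

    public KeyedTumblingWindow(long windowSizeMillis) {
        this.windowSizeMillis = windowSizeMillis;
    }

    // Assigns the value to the window containing its timestamp, grouped by key.
    public void add(String key, String value, long timestampMillis) {
        long windowStart = timestampMillis - (timestampMillis % windowSizeMillis);
        windows.computeIfAbsent(windowStart, w -> new TreeMap<>())
               .computeIfAbsent(key, k -> new ArrayList<>())
               .add(value);
    }

    // Emits and removes every window that ends at or before nowMillis,
    // producing one output string per key per window.
    public List<String> fireUpTo(long nowMillis) {
        List<String> out = new ArrayList<>();
        while (!windows.isEmpty() && windows.firstKey() + windowSizeMillis <= nowMillis) {
            Map.Entry<Long, Map<String, List<String>>> window = windows.pollFirstEntry();
            for (Map.Entry<String, List<String>> group : window.getValue().entrySet()) {
                out.add(group.getKey() + "@" + window.getKey() + "=" + group.getValue());
            }
        }
        return out;
    }
}
```

Each fired window is dropped from the map, so state stays bounded by the number of open windows. Whether unbounded state played any role in the slowdown reported above is not established in this thread.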

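The lag figures reported in this thread can be related to throughput with a back-of-the-envelope model. The sketch below assumes constant input and output rates and no initial backlog, which is a simplification and not something measured in the thread; `LagEstimate` and `lagSeconds` are hypothetical names.

```java
// Hypothetical helper for a rough estimate only; assumes constant rates.
public class LagEstimate {
    // Returns how many seconds the output trails the input after runtimeSeconds,
    // i.e. the age of the event currently being emitted.
    public static double lagSeconds(double inputRate, double outputRate, double runtimeSeconds) {
        if (outputRate >= inputRate) {
            return 0.0; // the consumer keeps up, so no backlog builds
        }
        // Events generated so far minus events emitted so far.
        double backlog = (inputRate - outputRate) * runtimeSeconds;
        // The backlog spans backlog / inputRate seconds of generation time.
        return backlog / inputRate;
    }
}
```

Under these assumptions the lag grows linearly, as `runtime * (1 - outputRate / inputRate)`. Being a minute behind after two minutes of runtime would then correspond to an output rate of about half the input rate, and "more than a minute" to less than half.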