flink-user mailing list archives

From Aljoscha Krettek <aljos...@apache.org>
Subject Re: Performance Issue
Date Wed, 09 Sep 2015 07:08:23 GMT
Ok, that's a special case, but the system still shouldn't behave that way.
The problem is that the grouped discretizer responsible for grouping the
elements into grouped windows keeps state for every key that it encounters.
And that state is never released, ever. That's the reason for the high
memory consumption and GC load.
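As a minimal sketch of that failure mode (plain Java, not actual Flink internals; class and method names are illustrative), an operator that keeps one state entry per key and never evicts will grow without bound when every event carries a fresh key:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of an operator that keeps per-key state and never releases it.
public class KeyedStateSketch {
    private final Map<String, List<String>> stateByKey = new HashMap<>();

    public void process(String key, String event) {
        stateByKey.computeIfAbsent(key, k -> new ArrayList<>()).add(event);
    }

    public int retainedKeys() {
        return stateByKey.size();
    }

    public static void main(String[] args) {
        KeyedStateSketch op = new KeyedStateSketch();
        // The deduplication pattern from this thread: every event is its own
        // key, so every event leaves one state entry behind forever.
        for (int i = 0; i < 100_000; i++) {
            op.process("event-" + i, "event-" + i);
        }
        System.out.println(op.retainedKeys() + " keys retained");
    }
}
```

With a constant key the map stays at size one; with unique keys it grows linearly with the input and is never freed, which matches the memory and GC symptoms described above.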

On Wed, 9 Sep 2015 at 07:01 Rico Bergmann <info@ricobergmann.de> wrote:

> Yes. The keys are constantly changing. Indeed, each unique event has its
> own key (the event itself). The purpose was to do event deduplication ...
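A hedged sketch of one way to keep such deduplication state bounded: remember a key only for a fixed time-to-live and evict expired entries (names are illustrative and this is not Flink API, just the underlying idea in plain Java):

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Illustrative deduplicator: remembers a key only for ttlMillis, so state
// stays bounded by the number of distinct keys seen per TTL window.
public class ExpiringDeduplicator {
    private final long ttlMillis;
    private final Map<String, Long> lastSeen = new HashMap<>();

    public ExpiringDeduplicator(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    /** Returns true if the key is new (or expired), false for a duplicate. */
    public boolean admit(String key, long nowMillis) {
        evictOlderThan(nowMillis - ttlMillis);
        if (lastSeen.containsKey(key)) {
            return false; // duplicate within the TTL window
        }
        lastSeen.put(key, nowMillis);
        return true;
    }

    private void evictOlderThan(long cutoff) {
        Iterator<Map.Entry<String, Long>> it = lastSeen.entrySet().iterator();
        while (it.hasNext()) {
            if (it.next().getValue() < cutoff) {
                it.remove();
            }
        }
    }

    public int retainedKeys() {
        return lastSeen.size();
    }
}
```

The trade-off is that duplicates arriving further apart than the TTL are no longer detected, but memory no longer grows with the total number of events ever seen.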
>
>
>
> Am 08.09.2015 um 20:05 schrieb Aljoscha Krettek <aljoscha@apache.org>:
>
> Hi Rico,
> I have a suspicion. What is the distribution of your keys? That is, are
> there many unique keys, do the keys keep evolving, i.e. is it always new
> and different keys?
>
> Cheers,
> Aljoscha
>
> On Tue, 8 Sep 2015 at 13:44 Rico Bergmann <info@ricobergmann.de> wrote:
>
>> I also see in the TM overview that the CPU load is still around 25%,
>> although there has been no input to the program for minutes. The CPU load
>> is degrading very slowly.
>>
>> The memory consumption is still fluctuating at a high level. It does not
>> decrease.
>>
>> In my test I generated test input for 1 minute. Now 10 minutes have passed
>> ...
>>
>> I think there must be something wrong with Flink...
>>
>>
>>
>> Am 08.09.2015 um 13:32 schrieb Rico Bergmann <info@ricobergmann.de>:
>>
>> The marksweep value is very high, the scavenge very low. If this helps ;-)
>>
>>
>>
>>
>> Am 08.09.2015 um 11:27 schrieb Robert Metzger <rmetzger@apache.org>:
>>
>> It is in the "Information" column: http://i.imgur.com/rzxxURR.png
>> In the screenshot, the two GCs only spend 84 and 25 ms.
>>
>> On Tue, Sep 8, 2015 at 10:34 AM, Rico Bergmann <info@ricobergmann.de>
>> wrote:
>>
>>> Where can I find this information? I can see the memory usage and CPU
>>> load. But where is the information on the GC?
>>>
>>>
>>>
>>> Am 08.09.2015 um 09:34 schrieb Robert Metzger <rmetzger@apache.org>:
>>>
>>> The web interface of Flink has a tab for the TaskManagers. There, you can
>>> also see how much time the JVM spent on garbage collection.
>>> Can you check whether the number of GC calls and the time spent go up
>>> after 30 minutes?
>>>
>>> On Tue, Sep 8, 2015 at 8:37 AM, Rico Bergmann <info@ricobergmann.de>
>>> wrote:
>>>
>>>> Hi!
>>>>
>>>> I also think it's a GC problem. In the KeySelector I don't instantiate
>>>> any objects; it's a simple toString method call.
>>>> In the mapWindow I create new objects. But I'm doing the same in other
>>>> map operators, too, and they don't slow down the execution. Only with
>>>> this construct is the execution slowed down.
>>>>
>>>> I watched the memory footprint of my program, once with the code
>>>> construct I wrote and once without. The memory characteristics were the
>>>> same. The CPU usage also ...
>>>>
>>>> I don't have an explanation. But I don't think it comes from my
>>>> operator functions ...
>>>>
>>>> Cheers Rico.
>>>>
>>>>
>>>>
>>>> Am 07.09.2015 um 22:43 schrieb Martin Neumann <mneumann@sics.se>:
>>>>
>>>> Hej,
>>>>
>>>> This sounds like it could be a garbage collection problem. Do you
>>>> instantiate any classes inside any of the operators (e.g. in the
>>>> KeySelector)? You can also try to run it locally and use something like
>>>> jstat to rule this out.
>>>>
>>>> cheers Martin
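Besides jstat, a local run can also report GC activity from inside the JVM via the standard `java.lang.management` API; a small sketch (per-collector cumulative counts and times, comparable to what the TaskManager view shows):

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Prints cumulative GC counts and times for the running JVM.
public class GcStats {
    public static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // getCollectionCount() may return -1 if the value is unavailable.
            total += Math.max(0, gc.getCollectionCount());
        }
        return total;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName()
                    + ": count=" + gc.getCollectionCount()
                    + ", timeMs=" + gc.getCollectionTime());
        }
    }
}
```

With the parallel collector, the bean names are typically "PS Scavenge" (young generation) and "PS MarkSweep" (old generation), which is where the marksweep/scavenge values mentioned earlier in the thread come from.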
>>>>
>>>> On Mon, Sep 7, 2015 at 12:00 PM, Rico Bergmann <info@ricobergmann.de>
>>>> wrote:
>>>>
>>>>> Hi!
>>>>>
>>>>> While working with grouping and windowing I encountered a strange
>>>>> behavior. I'm doing:
>>>>>
>>>>> dataStream.groupBy(KeySelector).window(Time.of(x,
>>>>> TimeUnit.SECONDS)).mapWindow(toString).flatten()
>>>>>
>>>>>
>>>>> When I run the program containing this snippet, it initially outputs
>>>>> data at a rate of around 150 events per sec. (that is roughly the input
>>>>> rate for the program). After about 10-30 minutes, the rate drops below
>>>>> 5 events per sec. This leads to event delivery offsets getting bigger
>>>>> and bigger ...
>>>>>
>>>>> Any explanation for this? I know you are reworking the streaming API.
>>>>> But it would be useful to know, why this happens ...
>>>>>
>>>>> Cheers. Rico.
>>>>>
>>>>
>>>>
>>>
>>
