flink-user mailing list archives

From Michael Pisula <michael.pis...@tngtech.com>
Subject Re: YARN Reserved Memory
Date Wed, 07 Dec 2016 11:25:00 GMT
Hi Stefan,

thanks for the fast feedback. Updating to a newer YARN version would
certainly benefit us in many different areas (the issues with HA mode
being the most important of them), but at the moment we are not able to
upgrade. If this is another case where our outdated YARN version causes
a problem, that would at least give us more arguments to prioritize the
upgrade ;-)

Cheers,

Michael


On 07.12.2016 11:33, Stefan Richter wrote:
> Hi,
>
> did you observe the problem only under YARN 2.4.0? IIRC, this version of YARN has some
> problems that can also lead to issues with Flink’s HA mode, and I would encourage you to
> upgrade YARN to 2.5 or higher. On a different note, there have been several improvements that
> we will release in Flink 1.1.4, although I am not entirely sure whether this is a known problem
> covered by the upcoming bugfix release. I will add Till, who has worked a lot in this direction,
> to the discussion.
>
> Best,
> Stefan
>
>> On 07.12.2016 at 09:19, Michael Pisula <michael.pisula@tngtech.com> wrote:
>>
>> Hi Guys,
>>
>> We are having a slight issue using Flink 1.1.3 (we also observed the
>> problem with 1.0.2) on YARN 2.4.0. Whenever a TaskManager restarts, YARN
>> seems to reserve memory during the restart and never frees it again. We
>> are using a CapacityScheduler with 2 queues, where the queue in which
>> our Flink YARN session runs has a guaranteed capacity of 75%. What we
>> see is that the amount of reserved memory is exactly the amount of
>> memory that was available in the queue after the TaskManager crashed.
>>
>> On our test system, further TaskManager restarts have been able to get
>> rid of the TaskManager again. I was not able to replicate this on our
>> production system. One difference is that in production I killed a
>> TaskManager with no occupied slots, while on the test system jobs were
>> restarted.
>>
>> Nothing enlightening in the logs, unfortunately.
>>
>> Is this something that anyone has experienced so far?
>>
>> Cheers,
>>
>> Michael
>>
>>
>> -- 
>> Michael Pisula * michael.pisula@tngtech.com * +49-174-3180084
>> TNG Technology Consulting GmbH, Betastr. 13a, 85774 Unterföhring
>> Geschäftsführer: Henrik Klagges, Christoph Stock, Dr. Robert Dahlke
>> Sitz: Unterföhring * Amtsgericht München * HRB 135082
>>
>>



