mesos-user mailing list archives

From Alex Rukletsov <a...@mesosphere.com>
Subject Re: Not getting resource offers for 20 min
Date Sun, 23 Aug 2015 19:51:40 GMT
This behaviour can be caused by
https://issues.apache.org/jira/browse/MESOS-3202.

On Thu, Aug 20, 2015 at 10:41 AM, Iulian Dragoș <iulian.dragos@typesafe.com>
wrote:

> Ok, I found it as well... I was confused about Max Share since it didn't
> tell me what resource was the dominant one... and since I don't know what's
> the "fair share" against which it is compared, it's hard to know if it was
> DRF or not... However, the good news is that once we upgraded Spark to 1.5
> preview and added `spark.mesos.role` (the other frameworks had a role),
> things started working as expected. Not sure if it's the role alone, or
> something in Spark 1.5 helped with this issue.
>
> thanks for your help!
> iulian
>
>
> On Wed, Aug 19, 2015 at 5:25 PM, Hans van den Bogert <hansbogert@gmail.com>
> wrote:
>
>> That’s quite strange, I’m running 0.21.0. But it’s called Max Share
>>
>> See attached image
>>
>>
>> On 19 Aug 2015, at 16:41, Iulian Dragoș <iulian.dragos@typesafe.com>
>> wrote:
>>
>> Thanks for your reply.
>>
>> I checked the UI, but I'm not sure where to find this info. I can see the
>> number of CPUs or memory, but nothing about its dominant resource or
>> share... (Mesos 0.21.1). I guess I can compute the share by looking at the
>> total cores and memory, though :)
>>
>> It'd be great if there was a more direct way to understand how the
>> allocator works, and why it skips a certain framework. I checked the
>> sources but there doesn't seem to be any logging in that part of the code...
>>
>> iulian
>>
>> On Wed, Aug 19, 2015 at 2:34 PM, Hans van den Bogert <
>> hansbogert@gmail.com> wrote:
>>
>>> Have you inspected the framework page/tab in the Mesos master web UI?
>>> Perhaps, as you already suspect, DRF is only handing out resources to
>>> frameworks which have a lower dominant share. So you could check if your
>>> Spark instance has a high dominant share due to the executors taking up
>>> a lot of memory.
>>>
>>> I’m having similar problems in an, albeit contrived, environment where
>>> there are 4 long-running Spark instances, where the first instance (only
>>> first by a small time value) gets offered all mesos-slaves and runs the
>>> executor. The next instances have a lower chance of getting the same amount
>>> of memory, but as their dominant share (memory) is lower, they more often
>>> get CPU resources compared to that first instance. Counterintuitively, the
>>> first instance finishes last.
>>>
>>> On 19 Aug 2015, at 14:07, Iulian Dragoș <iulian.dragos@typesafe.com>
>>> wrote:
>>>
>>> I am facing a problem with a framework not getting any resource offers
>>> for 15-20 minutes, while other frameworks (8-9 of them) continuously get
>>> offers.
>>>
>>> The framework is Spark (running in fine-grained mode), and is launched
>>> with Chronos. After a few tasks successfully executed, it stops getting
>>> offers, though looking at the master logs we see other frameworks getting
>>> offers every few seconds. For some reason, the Spark one isn't getting them
>>> for a very long period of time.
>>>
>>> Can such behaviour be explained by the DRF algorithm? How could I debug
>>> this?
>>>
>>> thanks,
>>> iulian
>>>
>>> --
>>>
>>> --
>>> Iulian Dragos
>>>
>>> ------
>>> Reactive Apps on the JVM
>>> www.typesafe.com
>>>
>>>
>>>
>>
>>
>
>
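
On the question in the quoted thread about computing the share by hand: below is a minimal Python sketch of the dominant share calculation. The cluster totals, framework names and allocations are made up for illustration, and the helper names `dominant_share` and `offer_order` are hypothetical. It assumes plain DRF and ignores roles and weights, which the real Mesos allocator also takes into account.

```python
def dominant_share(allocated, total):
    """Return (share, resource) for the largest allocated/total ratio."""
    shares = {
        name: allocated.get(name, 0.0) / total[name]
        for name in total
        if total[name] > 0
    }
    resource = max(shares, key=shares.get)
    return shares[resource], resource


def offer_order(frameworks, total):
    """Sort framework names by ascending dominant share (plain DRF)."""
    return sorted(
        frameworks,
        key=lambda name: dominant_share(frameworks[name], total)[0],
    )


# Hypothetical cluster totals and per-framework allocations.
total = {"cpus": 64.0, "mem": 256.0}
frameworks = {
    "spark": {"cpus": 8.0, "mem": 128.0},
    "chronos": {"cpus": 4.0, "mem": 8.0},
    "other": {"cpus": 16.0, "mem": 32.0},
}

for name in offer_order(frameworks, total):
    share, resource = dominant_share(frameworks[name], total)
    print(f"{name}: dominant share {share:.3f} ({resource})")
```

With these made-up numbers the framework that holds half of the cluster memory has the highest dominant share and is sorted last, so under plain DRF it would be the last one considered for new offers.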
