mesos-user mailing list archives

From Iulian Dragoș <iulian.dra...@typesafe.com>
Subject Re: Not getting resource offers for 20 min
Date Mon, 24 Aug 2015 10:33:42 GMT
On Sun, Aug 23, 2015 at 10:27 PM, Sam Bessalah <samkiller.oss@gmail.com>
wrote:

> Hi Iulian,
> Do you get the same problem when launching without Chronos?
> I had this issue happening a lot when scheduling jobs with Chronos.
>

Unfortunately I don't have access to the cluster anymore, but I think
Chronos wasn't the culprit. After updating Spark to 1.5 and setting a
framework role, offers started to come in (while still using Chronos).
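
A minimal sketch of passing the role at launch time, assuming the job is
started through `spark-submit`. The property name `spark.mesos.role` is the
one mentioned further down the thread; the role name `analytics`, the
application file `my_job.py`, and the helper `spark_submit_args` are
hypothetical placeholders.

```python
import shlex


def spark_submit_args(role, app):
    """Build a spark-submit argument list that sets the Mesos framework role.

    `role` and `app` are placeholders; use a role the Mesos master is
    actually configured with.
    """
    return ["spark-submit", "--conf", f"spark.mesos.role={role}", app]


# Hypothetical role and application, for illustration only.
cmd = spark_submit_args("analytics", "my_job.py")
print(" ".join(shlex.quote(part) for part in cmd))
```

The same property could presumably also be set in the Spark configuration
file instead of on the command line.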

iulian


>
>
> On Sun, Aug 23, 2015 at 9:51 PM, Alex Rukletsov <alex@mesosphere.com>
> wrote:
>
>> This behaviour can be caused by
>> https://issues.apache.org/jira/browse/MESOS-3202.
>>
>> On Thu, Aug 20, 2015 at 10:41 AM, Iulian Dragoș <
>> iulian.dragos@typesafe.com> wrote:
>>
>>> Ok, I found it as well... I was confused about Max Share since it didn't
>>> tell me what resource was the dominant one... and since I don't know what's
>>> the "fair share" against which it is compared, it's hard to know if it was
>>> DRF or not... However, the good news is that once we upgraded Spark to 1.5
>>> preview and added `spark.mesos.role` (the other frameworks had a role),
>>> things started working as expected. Not sure if it's the role alone, or
>>> something in Spark 1.5 helped with this issue.
>>>
>>> thanks for your help!
>>> iulian
>>>
>>>
>>> On Wed, Aug 19, 2015 at 5:25 PM, Hans van den Bogert <
>>> hansbogert@gmail.com> wrote:
>>>
>>>> That’s quite strange, I’m running 0.21.0. But it’s called Max Share
>>>>
>>>> See attached image
>>>>
>>>>
>>>> On 19 Aug 2015, at 16:41, Iulian Dragoș <iulian.dragos@typesafe.com>
>>>> wrote:
>>>>
>>>> Thanks for your reply.
>>>>
>>>> I checked the UI, but I'm not sure where to find this info. I can see
>>>> the number of CPUs or memory, but nothing about its dominant resource or
>>>> share... (Mesos 0.21.1). I guess I can compute the share by looking at the
>>>> total cores and memory, though :)
>>>>
>>>> It'd be great if there was a more direct way to understand how the
>>>> allocator works, and why it skips a certain framework. I checked the
>>>> sources but there doesn't seem to be any logging in that part of the code...
>>>>
>>>> iulian
>>>>
>>>> On Wed, Aug 19, 2015 at 2:34 PM, Hans van den Bogert <
>>>> hansbogert@gmail.com> wrote:
>>>>
>>>>> Have you inspected the framework page/tab in the Mesos master web UI?
>>>>> Perhaps, as you already suspect, DRF is only handing out resources to
>>>>> frameworks which have a lower dominant share. So you could check if
>>>>> your Spark instance has a high dominant share due to the executors
>>>>> taking up a lot of memory.
>>>>>
>>>>> I’m having similar problems in an, albeit contrived, environment where
>>>>> there are 4 long-running Spark instances, where the first instance (only
>>>>> first by a small time value) gets offered all mesos-slaves and runs the
>>>>> executor. The next instances have a lower chance of getting the same
>>>>> amount of memory, but as their dominant resource (memory) share is lower
>>>>> they more often get CPU resources compared to that first instance.
>>>>> Counterintuitively, the first instance finishes last.
>>>>>
>>>>> On 19 Aug 2015, at 14:07, Iulian Dragoș <iulian.dragos@typesafe.com>
>>>>> wrote:
>>>>>
>>>>> I am facing a problem with a framework not getting any resource offers
>>>>> for 15-20 minutes, while other frameworks (8-9 of them) continuously
>>>>> get offers.
>>>>>
>>>>> The framework is Spark (running in fine-grained mode), and is launched
>>>>> with Chronos. After a few tasks successfully executed, it stops getting
>>>>> offers, though looking at the master logs we see other frameworks getting
>>>>> offers every few seconds. For some reason, the Spark one isn't getting
>>>>> them for a very long period of time.
>>>>>
>>>>> Can such behaviour be explained by the DRF algorithm? How could I
>>>>> debug this?
>>>>>
>>>>> thanks,
>>>>> iulian
>>>>>
>>>>> --
>>>>>
>>>>> --
>>>>> Iulian Dragos
>>>>>
>>>>> ------
>>>>> Reactive Apps on the JVM
>>>>> www.typesafe.com
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>
>
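
On the question in the quoted thread of how to compute a framework's share
by hand: the sketch below shows the basic DRF idea under simplifying
assumptions. A framework's dominant share is its largest allocated/total
ratio across resources, and the framework with the lowest dominant share is
considered first. The cluster totals, the allocations, and the function
names are hypothetical, and the sketch ignores roles, weights, and offer
filters, which the real Mesos allocator also takes into account.

```python
def dominant_share(allocated, total):
    """Return (resource, share) for the resource with the largest
    allocated/total ratio, i.e. the framework's dominant resource."""
    shares = {
        name: allocated.get(name, 0.0) / capacity
        for name, capacity in total.items()
        if capacity > 0
    }
    name = max(shares, key=shares.get)
    return name, shares[name]


def next_framework(allocations, total):
    """Pick the framework with the lowest dominant share, which is the
    one plain DRF would offer resources to next."""
    return min(allocations, key=lambda fw: dominant_share(allocations[fw], total)[1])


# Hypothetical cluster: 64 CPUs and 256 GB of memory in total.
total = {"cpus": 64.0, "mem": 256.0}

# Hypothetical allocations: Spark executors hold a lot of memory.
allocations = {
    "spark": {"cpus": 4.0, "mem": 128.0},
    "other": {"cpus": 16.0, "mem": 32.0},
}

for framework, allocated in allocations.items():
    print(framework, dominant_share(allocated, total))
print("next:", next_framework(allocations, total))
```

With these made-up numbers Spark's dominant resource is memory at a share of
0.5, while the other framework's is CPU at 0.25, so the other framework is
served first even though Spark holds only a few CPUs.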


-- 

--
Iulian Dragos

------
Reactive Apps on the JVM
www.typesafe.com
