mesos-user mailing list archives

From Rahul Palamuttam <>
Subject Re: Mesos fine-grained multi-user mode failed to allocate tasks
Date Fri, 15 Jul 2016 03:50:28 GMT

We'll definitely take a look at Cook.
Right now we're observing that both fine-grained and coarse-grained jobs take quite a bit of
time to even be staged by Mesos.

We're sitting there waiting on the interpreter/shell for quite a few minutes.

> On Jul 14, 2016, at 7:49 PM, David Greenberg <> wrote:
> By true multitenancy, I mean preemption, so that if a new user connects to the cluster,
their capacity is actually reclaimed and reallocated in minutes or seconds instead of hours.

>> On Wed, Jul 13, 2016 at 7:11 PM Rahul Palamuttam <> wrote:
>> Thanks David.
>> We will definitely take a look at Cook.
>> I am curious what you mean by true multi-tenancy.
>> Under coarse-grained mode with dynamic allocation enabled, what I see in the Mesos
UI is that there are 3 tasks running by default (one on each of the nodes we have).
>> I also see the coarsegrainedexecutors being brought up.
>> Another point is that I always see a spark-submit command being launched; even if
I kill that command, it comes back up and the executors get reallocated on the worker nodes.
>> However, I am able to launch multiple spark shells and have jobs run concurrently
- which we were very happy with.
>> Unfortunately, I don't understand why mesos only shows 3 tasks running. I even see
the spike in thread count when launching my jobs, but the task count remains unchanged.
>> The Mesos logs do show jobs coming in.
>> The three tasks just sit there in the webui - running.
>> Is this what is expected?
>> Does running coarsegrained with dynamic allocation make mesos look at each running
executor as a different task?
>>> On Wed, Jul 13, 2016 at 4:34 PM, David Greenberg <>
>>> You could also check out Cook from twosigma. It's open source on github, and
offers true preemptive multitenancy with spark on Mesos, by intermediating the spark drivers
to optimize the cluster overall. 
>>>> On Wed, Jul 13, 2016 at 3:41 PM Rahul Palamuttam <>
>>>> Thank you Joseph.
>>>> We'll try to explore coarse grained mode with dynamic allocation. 
>>>>> On Wed, Jul 13, 2016 at 12:28 PM, Joseph Wu <>
>>>>> Looks like you're running Spark in "fine-grained" mode (deprecated).
>>>>> (The Spark website appears to be down right now, so here's the doc on
>>>>>> Note that while Spark tasks in fine-grained will relinquish cores
as they terminate, they will not relinquish memory, as the JVM does not give memory back to
the Operating System. Neither will executors terminate when they're idle.
>>>>> You can follow some of the recommendations Spark has in that document
for sharing resources, when using Mesos. 
>>>>>> On Wed, Jul 13, 2016 at 12:12 PM, Rahul Palamuttam <>
>>>>>> Hi,
>>>>>> Our team has been tackling multi-tenancy related issues with Mesos
for quite some time.
>>>>>> The problem is that tasks aren't being allocated properly when multiple
applications are trying to launch a job. If we launch application A, and soon after application
B, application B waits pretty much till the completion of application A for tasks to even
be staged in Mesos. Right now these applications are the spark-shell or the zeppelin interpreter.

>>>>>> Even a simple sc.parallelize(1 to 10000000).reduce(_ + _) launched in
two different spark-shells results in the issue we're observing. One of the counts waits (in
fact we don't even see the tasks being staged in mesos) until the current one finishes. This
is the biggest issue we have been experiencing and any help or advice would be greatly appreciated.
We want to be able to launch multiple jobs concurrently on our cluster and share resources.
>>>>>> Another issue we see is that the java heap-space on the mesos executor
backend process is not being cleaned up once a job has finished in the spark shell. 
>>>>>> I've attached a png file of the jvisualvm output showing that the
heapspace is still allocated on a worker node. If I force the GC from jvisualvm then nearly
all of that memory gets cleaned up. This may be because the spark-shell is still active -
but if we've waited long enough why doesn't GC just clean up the space? However, even after
forcing GC the mesos UI shows us that these resources are still being used.
>>>>>> There should be a way to bring down the memory utilization of the
executors once a task is finished. It shouldn't continue to have that memory allocated, even
if a spark-shell is active on the driver.
>>>>>> We have mesos configured to use fine-grained mode. 
>>>>>> The following are parameters we have set in our spark-defaults.conf
>>>>>> spark.eventLog.enabled           true
>>>>>> spark.eventLog.dir               hdfs://frontend-system:8090/directory
>>>>>> spark.local.dir                    /data/cluster-local/SPARK_TMP
>>>>>> spark.executor.memory            50g
>>>>>> spark.externalBlockStore.baseDir /data/cluster-local/SPARK_TMP
>>>>>> spark.executor.extraJavaOptions  -XX:MaxTenuringThreshold=0 
>>>>>> spark.executor.uri      hdfs://frontend-system:8090/spark/spark-1.6.0-bin-hadoop2.4.tgz
>>>>>> spark.mesos.coarse      false
>>>>>> Please let me know if there are any questions about our configuration.
>>>>>> Any advice or experience the mesos community can share pertaining
to issues with fine-grained mode would be greatly appreciated!
>>>>>> I would also like to sincerely apologize for my previous test message
on the mailing list.
>>>>>> It was an ill-conceived idea since we are in a bit of a time crunch
and I needed to get this message posted. I forgot I needed to send a reply to the user-subscribers
email for me to be listed, resulting in message-not-sent emails. I will not do that again.

>>>>>> Thanks,
>>>>>> Rahul Palamuttam
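
For reference, the switch to coarse-grained mode with dynamic allocation that Joseph
suggests in this thread would amount to something like the following changes to the
spark-defaults.conf quoted above. This is only a sketch: the property names are standard
Spark settings, but the values chosen for the idle timeout and the core cap are illustrative
assumptions and do not come from the thread. Spark's Mesos documentation also requires the
external shuffle service to be running on every agent node before dynamic allocation will work.

```properties
# Sketch only; values marked "assumed" are not from the original configuration.

# Run one long-lived executor per node instead of one Mesos task per Spark task.
spark.mesos.coarse                          true

# Let Spark request and release executors as the workload changes.
spark.dynamicAllocation.enabled             true

# Required by dynamic allocation: shuffle files are served by an external
# service so that idle executors can be removed safely.
spark.shuffle.service.enabled               true

# Release an executor after it has been idle this long (assumed value).
spark.dynamicAllocation.executorIdleTimeout 60s

# Cap the cores a single application may hold so that a second
# spark-shell can still receive offers (assumed value).
spark.cores.max                             16
```

Capping spark.cores.max is probably the setting most relevant to the symptom described
above, since without a cap a coarse-grained application may accept every core it is offered
and leave nothing for a second shell.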
