spark-issues mailing list archives

From "Utkarsh Maheshwari (JIRA)" <>
Subject [jira] [Commented] (SPARK-25678) SPIP: Adding support in Spark for HPC cluster manager (PBS Professional)
Date Tue, 23 Oct 2018 04:35:00 GMT


Utkarsh Maheshwari commented on SPARK-25678:

{quote}Would recommend to look at: [] as this
seems to be related to your approach towards making Spark enable pluggable scheduler implementations{quote}
Thanks for replying. I did go through that before; the work done there is what made
my integration possible. My idea, however, is about _adding PBS_ as another cluster manager
in Spark _using that work_, rather than changing Spark to fully support pluggable cluster managers.

> SPIP: Adding support in Spark for HPC cluster manager (PBS Professional)
> ------------------------------------------------------------------------
>                 Key: SPARK-25678
>                 URL:
>             Project: Spark
>          Issue Type: New Feature
>          Components: Scheduler
>    Affects Versions: 3.0.0
>            Reporter: Utkarsh Maheshwari
>            Priority: Major
> I sent an email on the dev mailing list but got no response, hence filing a JIRA ticket.
> PBS (Portable Batch System) Professional is an open source workload management system
for HPC clusters. Many organizations that use PBS to manage their clusters also use Spark for
Big Data, but they are forced to split the cluster into a Spark cluster and a PBS cluster, either
by physically dividing the cluster nodes into two groups or by starting the Spark Standalone cluster
manager's Master and Slaves as PBS jobs, which leads to underutilization of resources.
>  I am trying to add support in Spark to use PBS as a pluggable cluster manager. Going
through the Spark codebase and looking at Mesos and Kubernetes integration, I found that we
can get this working as follows:
>  - Extend `ExternalClusterManager`.
>  - Extend `CoarseGrainedSchedulerBackend`
>    - This class can start `Executors` as PBS jobs.
>    - The initial number of `Executors` is started in `onStart`.
>    - More `Executors` can be started as and when required using `doRequestTotalExecutors`.
>    - `Executors` can be killed using `doKillExecutors`.
>  - Extend `SparkApplication` to start `Driver` as a PBS job in cluster deploy mode.
>    - This extended class can submit the Spark application again as a PBS job with
deploy mode = client, so that the application driver is started on a node in the cluster.
>  I have a couple of questions:
>  - Does this seem like a good approach, or should we look at other options?
>  - What are the expectations from the initial prototype?
>  - If this works, would the Spark maintainers be open to merging this, or would they
prefer that it be maintained as a fork?
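
As a rough illustration of the backend lifecycle described in the quoted proposal (start the initial executors, grow to a requested total, kill executors by id), here is a minimal, self-contained Scala sketch. It does not use Spark's classes: `PbsClient`, `FakePbsClient` and `PbsBackendSketch` are hypothetical names, the method signatures are simplified stand-ins for the hooks named in the list, and job submission is faked in memory rather than calling PBS. It shows only the bookkeeping between executor ids and PBS job ids.

```scala
import scala.collection.mutable

// Hypothetical abstraction over PBS; a real client would talk to the PBS server.
trait PbsClient {
  def submit(jobName: String): String // returns the PBS job id
  def delete(jobId: String): Unit
}

// In-memory fake so the sketch runs without a PBS installation.
final class FakePbsClient extends PbsClient {
  private var counter = 0
  val running = mutable.LinkedHashSet.empty[String]

  def submit(jobName: String): String = {
    counter += 1
    val id = s"$counter.fake-pbs"
    running += id
    id
  }

  def delete(jobId: String): Unit = running -= jobId
}

// Simplified model of the proposed backend. Method names mirror the list
// above, but these are NOT Spark's signatures (assumption for illustration).
final class PbsBackendSketch(pbs: PbsClient, initialExecutors: Int) {
  // executor id -> PBS job id, in launch order
  private val jobs = mutable.LinkedHashMap.empty[String, String]
  private var nextExecutorId = 0

  private def launchExecutor(): Unit = {
    val execId = nextExecutorId.toString
    nextExecutorId += 1
    jobs(execId) = pbs.submit(s"spark-executor-$execId")
  }

  // Start the initial number of executors as PBS jobs.
  def onStart(): Unit = (1 to initialExecutors).foreach(_ => launchExecutor())

  // Launch more executors if the requested total exceeds the current count.
  def doRequestTotalExecutors(requestedTotal: Int): Boolean = {
    val missing = requestedTotal - jobs.size
    (1 to missing).foreach(_ => launchExecutor()) // empty range if missing <= 0
    true
  }

  // Delete the PBS jobs backing the given executors; unknown ids are ignored.
  def doKillExecutors(executorIds: Seq[String]): Boolean = {
    executorIds.foreach(id => jobs.remove(id).foreach(pbs.delete))
    true
  }

  def executorIds: Seq[String] = jobs.keys.toSeq
}

object PbsBackendDemo {
  def main(args: Array[String]): Unit = {
    val pbs = new FakePbsClient
    val backend = new PbsBackendSketch(pbs, initialExecutors = 2)
    backend.onStart()
    backend.doRequestTotalExecutors(4)
    backend.doKillExecutors(Seq("0"))
    println(s"executors: ${backend.executorIds.mkString(",")}; PBS jobs: ${pbs.running.size}")
  }
}
```

In an actual integration the fake client would be replaced by something that submits and deletes real PBS jobs, and the backend would extend Spark's own scheduler backend class rather than stand alone.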
