mesos-issues mailing list archives

From "chester kuo (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (MESOS-2262) Adding GPGPU resource into Mesos framework, so we can know if any GPGPU resource are available for master
Date Fri, 30 Jan 2015 12:17:34 GMT

     [ https://issues.apache.org/jira/browse/MESOS-2262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

chester kuo updated MESOS-2262:
-------------------------------
    Component/s: slave
                 framework
    Description: 
Extend Mesos to support heterogeneous resources, such as GPGPUs and FPGAs, as computing resources
in the data center. OpenCL will be the first target to add to Mesos (it is supported by all major GPU
vendors); I plan to support others, such as CUDA, in the future.

With this feature, the slave will perform resource discovery, including but not limited to:
(1) Heterogeneous computing protocol type: "OpenCL", "CUDA", "HSA"
(2) Computing global memory (MB)
(3) Computing runtime version, such as "1.2", "2.0"
(4) Number of compute units (double)
(5) Computing device type: GPGPU, CPU, accelerator device
(6) Number of computing devices (double)
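As a rough illustration of how the discovered attributes (1)-(6) might be advertised, the sketch below sums the scalar ones per protocol and renders them in the "name:value;name:value" text form that the slave's --resources flag accepts. It is a sketch under assumptions: the ComputeDevice struct, the toResourceString helper and resource names such as opencl_mem are hypothetical, not existing Mesos APIs. Non-scalar fields (runtime version, device type) would more naturally go into slave attributes (--attributes) than into resources, so they are not emitted here.

```cpp
#include <cassert>
#include <cctype>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record for one device found during slave-side discovery.
// Field names are illustrative only; they are not part of Mesos or OpenCL.
enum class DeviceType { GPGPU, CPU, Accelerator };

struct ComputeDevice {
  std::string protocol;        // (1) "OpenCL", "CUDA" or "HSA"
  double globalMemoryMB;       // (2) global memory in MB
  std::string runtimeVersion;  // (3) e.g. "1.2" or "2.0"
  double computeUnits;         // (4) number of compute units
  DeviceType type;             // (5) GPGPU, CPU or accelerator
};

// Lower-cases a protocol name so it can serve as a resource-name prefix.
std::string lower(std::string s) {
  for (char& c : s) {
    c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
  }
  return s;
}

// Sums the scalar attributes per protocol and renders them in the
// "name:value;name:value" form used by the slave's --resources flag.
// (6), the number of devices, is the per-protocol device count.
// Runtime version and device type are deliberately not emitted.
std::string toResourceString(const std::vector<ComputeDevice>& devices) {
  std::map<std::string, double> totals;  // std::map keeps output order stable
  for (const ComputeDevice& d : devices) {
    assert(d.globalMemoryMB >= 0 && d.computeUnits >= 0);
    const std::string prefix = lower(d.protocol);
    totals[prefix + "_devices"] += 1;
    totals[prefix + "_mem"] += d.globalMemoryMB;
    totals[prefix + "_compute_units"] += d.computeUnits;
  }
  std::ostringstream out;
  bool first = true;
  for (const auto& [name, value] : totals) {
    if (!first) out << ';';
    out << name << ':' << value;
    first = false;
  }
  return out.str();
}
```

For example, two OpenCL GPUs with 4096 MB and 20 compute units each, plus one OpenCL CPU device with 8192 MB and 8 compute units, would be rendered as "opencl_compute_units:48;opencl_devices:3;opencl_mem:16384".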


Heterogeneous resource isolation will be supported in the framework instead of on the slave side.
The main reason is the ecosystem: OpenCL, for example, operates on top of private device drivers
owned by the vendors, and only the runtime library (OpenCL) is a user-space application, so it is
hard for us to isolate these devices the way Linux cgroups isolate CPU and memory. As a result,
we may use the runtime library to do device isolation and memory allocation.
(PS: if anyone knows how to do this for GPGPU drivers, please drop me a note.)

Meanwhile, some runtime libraries (such as OpenCL) can also run on top of the CPU, so we need
to use the isolator API to report this once such a device is allocated.
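Since there is no cgroup-style kernel enforcement to rely on, framework-side isolation mostly comes down to bookkeeping: the framework decides which task may use which device and how much of its global memory, and a wrapper around the runtime library would then have to enforce that budget. The sketch below is a minimal illustration of that bookkeeping, assuming each task gets exclusive use of one whole device. DeviceSlot, DeviceAllocator and its methods are hypothetical names, and needsCpuAccounting only marks the point where the isolator would need to be told about an OpenCL device that actually runs on host CPUs.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical per-device state kept by the framework.
struct DeviceSlot {
  std::string id;           // illustrative device identifier, e.g. "gpu0"
  bool isCpuDevice;         // OpenCL device that executes on host CPUs
  double totalMemoryMB;     // device global memory
  double reservedMemoryMB;  // budget granted to the current owner
  std::string owner;        // task holding the device, empty if free
};

class DeviceAllocator {
 public:
  explicit DeviceAllocator(std::vector<DeviceSlot> slots)
      : slots_(std::move(slots)) {}

  // Gives taskId exclusive use of the first free device with at least
  // memoryMB of global memory; returns its id, or nullopt if none fits.
  std::optional<std::string> allocate(const std::string& taskId,
                                      double memoryMB) {
    assert(memoryMB >= 0);
    for (std::size_t i = 0; i < slots_.size(); ++i) {
      DeviceSlot& s = slots_[i];
      if (s.owner.empty() && s.totalMemoryMB >= memoryMB) {
        s.owner = taskId;
        s.reservedMemoryMB = memoryMB;
        byTask_[taskId] = i;
        return s.id;
      }
    }
    return std::nullopt;
  }

  // Memory budget a runtime-library wrapper would have to enforce itself;
  // 0 if the task holds no device.
  double memoryLimitMB(const std::string& taskId) const {
    auto it = byTask_.find(taskId);
    return it == byTask_.end() ? 0.0 : slots_[it->second].reservedMemoryMB;
  }

  // True if the task's device consumes host CPU, i.e. the isolator
  // should be notified about it.
  bool needsCpuAccounting(const std::string& taskId) const {
    auto it = byTask_.find(taskId);
    return it != byTask_.end() && slots_[it->second].isCpuDevice;
  }

  // Returns the task's device to the free pool.
  void release(const std::string& taskId) {
    auto it = byTask_.find(taskId);
    if (it == byTask_.end()) return;
    DeviceSlot& s = slots_[it->second];
    s.owner.clear();
    s.reservedMemoryMB = 0;
    byTask_.erase(it);
  }

 private:
  std::vector<DeviceSlot> slots_;
  std::map<std::string, std::size_t> byTask_;  // task id -> slot index
};
```

Handing out whole devices keeps the bookkeeping simple; sharing one GPU between several tasks would need per-task memory accounting on each device and is left out of this sketch.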




  was:
Try to add support for OpenCL/GPU resources in Mesos so that we can run OpenCL applications
across a Mesos cluster and utilize the system's GPU resources, including memory, CPU, etc.

Why choose OpenCL instead of CUDA? Because OpenCL is supported by several vendors, including
AMD, Intel, Samsung, Nvidia and Qualcomm, while CUDA is supported only by Nvidia.



    Environment: OpenCL-supported environments, such as OS X, Linux, Windows
     Issue Type: Task  (was: Improvement)
        Summary: Adding GPGPU resource into Mesos framework, so we can know if any GPGPU resource are available for master  (was: Add GPU resource into Mesos framework, so we can know if any OpenCL/GPU resource are available for task running.)

There is one point I would like people's input on. The Mesos resource protocol buffer message
already supports a number of resources (such as CPU, memory and disk). Should we continue to
use this resource message, or define a new resource message for heterogeneous devices to
avoid making the currently used resource message very long?

Any comments or suggestions?

 

> Adding GPGPU resource into Mesos framework, so we can know if any GPGPU resource are available for master
> ---------------------------------------------------------------------------------------------------------
>
>                 Key: MESOS-2262
>                 URL: https://issues.apache.org/jira/browse/MESOS-2262
>             Project: Mesos
>          Issue Type: Task
>          Components: framework, slave
>         Environment: OpenCL-supported environments, such as OS X, Linux, Windows
>            Reporter: chester kuo
>            Priority: Minor
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
