singa-dev mailing list archives

From "Ngin Yun Chuan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SINGA-406) [Rafiki] Add POS tagging task & add GPU support (0.0.7)
Date Mon, 19 Nov 2018 02:40:00 GMT

    [ https://issues.apache.org/jira/browse/SINGA-406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16691196#comment-16691196 ]

Ngin Yun Chuan commented on SINGA-406:
--------------------------------------

The ``nvidia/cuda:9.0-runtime-ubuntu16.04`` image seems to run workers correctly on my Mac,
which has no GPU. Combined with setting ``CUDA_VISIBLE_DEVICES`` dynamically during worker
deployment, this would let us stay with 1 worker image that works on both CPU-only machines
and machines with GPUs. Would there be any problems with this setup?
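
To illustrate what setting ``CUDA_VISIBLE_DEVICES`` dynamically could look like, here is a
rough sketch in Python. The function name ``build_worker_env`` and the ``WORKER_ID`` variable
are made up for this example and are not taken from Rafiki's code. It assumes the deployment
code already knows which GPU indices, if any, are assigned to a worker.

```python
def build_worker_env(gpu_ids, base_env=None):
    """Return the environment variables for one worker container.

    gpu_ids: GPU indices assigned to this worker; an empty list means CPU-only.
    base_env: other variables to pass through (hypothetical, for illustration).
    """
    env = dict(base_env or {})
    # An empty value hides every GPU from CUDA, so the same image runs on CPU.
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in gpu_ids)
    return env


cpu_env = build_worker_env([])
gpu_env = build_worker_env([0, 2], {"WORKER_ID": "w1"})
print(cpu_env)
print(gpu_env)
```

The resulting dictionary would then be passed as the container's environment when the worker
is deployed, so the image itself does not need to differ between CPU and GPU machines.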

If we have another worker image for CPU-only machines, e.g. ``rafiki_worker_cpu``, does it mean
that model developers who want to provide their own custom Docker image would need to extend
*both* worker Docker images to support model training on both CPU and GPU? Or should we drop
this configurable option?

If we let app developers configure the Docker container at runtime, does it mean that they
will now have to know about the models that would be trained on their dataset and understand
the dependencies of each model (which model developers might need to document)? If they are
allowed to provide any Docker container, they must extend Rafiki's worker image, build the
image themselves, and submit it to DockerHub, and they must account for the dependencies of
each model during training. I feel that doing it this way makes things complex for the app
developer. What do you think?

> [Rafiki] Add POS tagging task & add GPU support (0.0.7)
> -------------------------------------------------------
>
>                 Key: SINGA-406
>                 URL: https://issues.apache.org/jira/browse/SINGA-406
>             Project: Singa
>          Issue Type: New Feature
>            Reporter: Ngin Yun Chuan
>            Priority: Major
>
> Refer to https://github.com/nginyc/rafiki/pull/71 for details



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
