singa-dev mailing list archives

From Wang Wei <wang...@comp.nus.edu.sg>
Subject Re: is it possible to use MPI HPC cluster in singa?
Date Mon, 18 Jan 2016 09:18:44 GMT
Hi Li Li,

MPI support is scheduled for v0.3; see
https://issues.apache.org/jira/browse/SINGA-133.
Please stay tuned, or join us in implementing this ticket.

ZooKeeper is important for distributed training because it keeps the runtime
data needed for failure recovery.

Best,
Wei



On Mon, Jan 18, 2016 at 4:56 PM, Li Li <fancyerii@gmail.com> wrote:

> I have access to an HPC cluster with hundreds of nodes and thousands of
> cores. I found that many deep learning frameworks can train using
> multiple GPUs on a single node, but very few can train across
> multiple nodes using GPUs. Is there any framework that can train with MPI? I
> can't install long-running services like ZooKeeper or Spark on the cluster.
>
