singa-dev mailing list archives

From "wangwei (JIRA)" <>
Subject [jira] [Resolved] (SINGA-3) Use Zookeeper to check stopping (finish) time of the system
Date Thu, 28 May 2015 02:56:17 GMT


wangwei resolved SINGA-3.
    Resolution: Fixed

> Use Zookeeper to check stopping (finish) time of the system
> -----------------------------------------------------------
>                 Key: SINGA-3
>                 URL:
>             Project: Singa
>          Issue Type: New Feature
>         Environment: Linux, gcc>4.8
>            Reporter: wangwei
> To stop a process (node), we need to stop both its local workers and its local servers.
Worker threads exit when they finish all training steps. Server threads can exit only
when all workers connected to them have stopped.
> We use Zookeeper to detect the worker state. Specifically, the main thread of each process
first registers all local servers with Zookeeper. Then it registers each worker with a dedicated
server group, where the worker's parameters are maintained. When a worker finishes execution, it
deregisters from the server group (folder) in Zookeeper and reports its state to the main
thread. When all workers registered in a server group have finished, the callback function
registered for that server group sends a stop message to the server. Upon receiving this
message, the server reports its state to the main thread and stops. Once all local workers
and local servers have finished, the main thread exits.
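
The shutdown sequence described above can be sketched roughly as follows. This is only an illustration, not the SINGA implementation: it replaces Zookeeper with a small in-memory registry, and every name in it (GroupRegistry, watch_group, run_process, the group id "group-0") is hypothetical. It assumes one server and one server group per process.

```python
import threading
from collections import defaultdict


class GroupRegistry:
    """In-memory stand-in for the Zookeeper server-group folders (hypothetical)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._members = defaultdict(set)  # group id -> registered worker ids
        self._on_empty = {}               # group id -> callback fired once when empty

    def watch_group(self, group, callback):
        with self._lock:
            self._on_empty[group] = callback

    def register(self, group, worker):
        with self._lock:
            self._members[group].add(worker)

    def deregister(self, group, worker):
        callback = None
        with self._lock:
            self._members[group].discard(worker)
            if not self._members[group]:
                # Last worker left the group: take the callback so it fires once.
                callback = self._on_empty.pop(group, None)
        if callback is not None:
            callback()


def run_process(num_workers=3, steps=5, group="group-0"):
    registry = GroupRegistry()
    stop_server = threading.Event()  # plays the role of the "stop message"
    reports = []                     # states reported back to the main thread
    reports_lock = threading.Lock()

    def report(name):
        with reports_lock:
            reports.append(name)

    def server():
        stop_server.wait()           # exit only after the stop message arrives
        report("server")

    def worker(wid):
        for _ in range(steps):
            pass                     # placeholder for one training step
        registry.deregister(group, wid)
        report(f"worker-{wid}")

    # Main thread: register the server's callback first, then every worker,
    # before any thread starts, so the group cannot look empty prematurely.
    registry.watch_group(group, stop_server.set)
    for wid in range(num_workers):
        registry.register(group, wid)

    threads = [threading.Thread(target=server)]
    threads += [threading.Thread(target=worker, args=(w,)) for w in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()                     # main thread exits once all have finished
    return reports
```

Calling run_process(num_workers=3) returns the list of threads that reported back; the order of entries can vary between runs because the threads are scheduled independently. With a real Zookeeper ensemble, the registry would presumably be replaced by nodes under a per-group path plus a watch on that path, but that mapping is an assumption and is not stated in the issue.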

This message was sent by Atlassian JIRA
