singa-dev mailing list archives

From "ASF subversion and git services (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SINGA-8) Implement distributed Hogwild
Date Thu, 25 Jun 2015 13:45:05 GMT

    [ https://issues.apache.org/jira/browse/SINGA-8?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14601210#comment-14601210
] 

ASF subversion and git services commented on SINGA-8:
-----------------------------------------------------

Commit 4956d6a031de16811e4585b9c28b9ab29c33ab76 in incubator-singa's branch refs/heads/master
from wang wei
[ https://git-wip-us.apache.org/repos/asf?p=incubator-singa.git;h=4956d6a ]

SINGA-8 Implement distributed Hogwild

Fix a bug in parameter synchronization among server groups.
The inter-process dealer cannot send messages to another process if the endpoint uses a hostname,
e.g., "blob-pc".
Replaced the hostname with the host IP in the binding/connecting endpoints. However, the GetHostIP
method is specific to Linux.
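
As an illustration of the hostname-to-IP replacement described above, here is a minimal
sketch in Python. It is not the code from the commit. The function name endpoint_with_ip
and the "scheme://host:port" endpoint layout are assumptions made for this example. It uses
the standard library resolver rather than a Linux-specific call, which may or may not suit
the GetHostIP use case.

```python
import socket


def endpoint_with_ip(endpoint: str) -> str:
    """Return the endpoint with its host part replaced by an IPv4 address.

    Hypothetical helper: assumes endpoints look like "tcp://host:port".
    """
    scheme, rest = endpoint.split("://", 1)
    host, port = rest.rsplit(":", 1)
    # gethostbyname returns a dotted-quad string; a numeric address is
    # returned unchanged, a hostname is resolved by the system resolver.
    ip = socket.gethostbyname(host)
    return f"{scheme}://{ip}:{port}"


if __name__ == "__main__":
    print(endpoint_with_ip("tcp://127.0.0.1:5555"))
```

Note that resolving a machine's own hostname this way can return a loopback address on
some systems, depending on the hosts file, so the result should be checked before binding.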
Another issue is the synchronization frequency. Currently, the stub triggers one sync
reminder every time its poller expires. If the expiry time is large, the reminder is
seldom triggered. If it is small, many reminder messages are triggered.
TODO: tune the trigger.
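
One possible way to approach the TODO is to decouple the reminder rate from the poller
timeout, so that a reminder fires at most once per fixed interval no matter how often the
poller expires. The sketch below only illustrates that idea and is not the behaviour of the
committed code. The class name SyncReminder, its interval_s parameter, and the injected
clock are all hypothetical.

```python
import time


class SyncReminder:
    """Rate-limit sync reminders independently of the poller timeout."""

    def __init__(self, interval_s, clock=time.monotonic):
        self.interval_s = interval_s  # minimum spacing between reminders
        self.clock = clock            # injectable so the logic can be tested
        self.last = clock()           # time of the last reminder (or start)

    def due(self):
        """Call on every poller expiry; True means send a reminder now."""
        now = self.clock()
        if now - self.last >= self.interval_s:
            self.last = now
            return True
        return False
```

With a short poller timeout, due() is called often but returns True only once per
interval. With a poller timeout longer than the interval, reminders are still limited
by how often the poller expires.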


> Implement distributed Hogwild
> -----------------------------
>
>                 Key: SINGA-8
>                 URL: https://issues.apache.org/jira/browse/SINGA-8
>             Project: Singa
>          Issue Type: New Feature
>            Reporter: wangwei
>            Assignee: wangwei
>              Labels: distributed, features, hogwild
>
> Generally, both the Downpour framework of Google Brain [1] and Caffe's distributed
> Hogwild implementation are extensions of shared-memory Hogwild training. In this ticket,
> we refer to the second one.
> Specifically, each server group masters a subset of the parameters (i.e., Param objects) when
> synchronizing with other server groups. It aggregates all updates for its subset and sends
> (e.g., broadcasts) the updated parameters back to all other server groups. The synchronization
> is conducted asynchronously. The frequency can be fixed in the first implementations. Eventually,
> it should be tuned automatically to fully utilize the network bandwidth.
> [1] J. Dean, G. Corrado, R. Monga, K. Chen, M. Devin, Q. V. Le, M. Z. Mao, M. Ranzato,
> A. W. Senior, P. A. Tucker, K. Yang, and A. Y. Ng. Large scale
> distributed deep networks. In NIPS, pages 1232-1240, 2012.
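
The scheme in the issue description (each server group masters a subset of parameters,
aggregates updates for that subset, and broadcasts the result) can be sketched as a small
single-process simulation. This is an illustration under stated assumptions, not SINGA's
implementation: the names ServerGroup, master_of and sync_round are hypothetical, parameters
are plain floats, ownership is assigned by param_id modulo the number of groups, and the
round runs synchronously here although the ticket calls for asynchronous synchronization.

```python
from collections import defaultdict


def master_of(param_id, num_groups):
    """Assumed ownership rule: parameter p is mastered by group p % num_groups."""
    return param_id % num_groups


class ServerGroup:
    def __init__(self, group_id, num_groups, params):
        self.group_id = group_id
        self.num_groups = num_groups
        self.params = dict(params)         # local replica: param_id -> value
        self.pending = defaultdict(float)  # deltas accumulated since last sync

    def local_update(self, param_id, delta):
        """Apply an update locally and remember it for the next sync."""
        self.params[param_id] += delta
        self.pending[param_id] += delta

    def take_pending_for(self, master_id):
        """Remove and return the pending deltas owned by the given master."""
        out = {p: d for p, d in self.pending.items()
               if master_of(p, self.num_groups) == master_id}
        for p in out:
            del self.pending[p]
        return out


def sync_round(groups):
    """One synchronization round across all server groups."""
    n = len(groups)
    for master in groups:
        # The master's own deltas are already in its replica; just clear them.
        master.take_pending_for(master.group_id)
        # Aggregate the deltas the other groups accumulated for this subset.
        for other in groups:
            if other is not master:
                for p, d in other.take_pending_for(master.group_id).items():
                    master.params[p] += d
        # Broadcast the updated values of the mastered subset to every group.
        for p, v in master.params.items():
            if master_of(p, n) == master.group_id:
                for other in groups:
                    other.params[p] = v
```

For example, with two groups and parameters 0 and 1, updates made to parameter 0 in either
group are summed by group 0 and then copied to group 1, while group 1 does the same for
parameter 1.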



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
