singa-dev mailing list archives

From "ASF subversion and git services (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SINGA-57) Improve Distributed Hogwild
Date Tue, 15 Sep 2015 10:07:45 GMT

    [ https://issues.apache.org/jira/browse/SINGA-57?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14745178#comment-14745178 ]

ASF subversion and git services commented on SINGA-57:
------------------------------------------------------

Commit ed9e37369c69dd76078e8285bc33d6b04ba60e9f in incubator-singa's branch refs/heads/master
from [~flytosky]
[ https://git-wip-us.apache.org/repos/asf?p=incubator-singa.git;h=ed9e373 ]

SINGA-57 Improve Distributed Hogwild

The ClusterProto::sync_freq field controls how often server groups
synchronize with each other.
After updating a Param (slice), the server checks the number of updates
since the last sync and sends a sync request once that number reaches
sync_freq. It also checks the number of pending syncs (i.e., requests
that have not yet received responses) to avoid sending too many messages
to stopped servers (such messages would occupy memory in the send buffer).
The server responds to every sync request with the latest Param values.

Note: the distributed Hogwild framework currently does not support
multiple worker groups in one process (there is a bug). We recommend
replacing this cluster topology with in-memory Hogwild, i.e., launching
one worker group with multiple workers and one server group.
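The per-slice sync policy described in the commit message above can be sketched as a small model. This is a minimal, hypothetical Python sketch, not SINGA's C++ implementation: the class `ParamSlice`, the `max_pending` cap, the plain SGD step, and the averaging merge rule are all assumptions for illustration. Only `sync_freq` corresponds to `ClusterProto::sync_freq`.

```python
from dataclasses import dataclass, field


@dataclass
class ParamSlice:
    """Hypothetical model of one Param slice mastered by a server group."""
    values: list
    sync_freq: int            # plays the role of ClusterProto::sync_freq
    max_pending: int = 1      # hypothetical cap on unanswered sync requests
    updates_since_sync: int = 0
    pending_syncs: int = 0
    outbox: list = field(default_factory=list)  # stands in for the send buffer

    def apply_update(self, grad, lr=0.1):
        """Apply one update, then decide whether to sync with other groups."""
        # Plain SGD step (assumption; the real updater is configurable).
        self.values = [v - lr * g for v, g in zip(self.values, grad)]
        self.updates_since_sync += 1
        # Sync only after sync_freq updates AND while few requests are still
        # unanswered, so stopped peers cannot fill up the send buffer.
        if (self.updates_since_sync >= self.sync_freq
                and self.pending_syncs < self.max_pending):
            self.outbox.append(("sync_request", list(self.values)))
            self.pending_syncs += 1
            self.updates_since_sync = 0

    def handle_sync_request(self, remote_values):
        """Respond to every sync request with the latest local values.

        How the requester's values are used is left out of this sketch.
        """
        return ("sync_response", list(self.values))

    def handle_sync_response(self, remote_values):
        """A pending request was answered; merge the peer's values."""
        self.pending_syncs = max(0, self.pending_syncs - 1)
        # Placeholder merge rule (assumption): element-wise average.
        self.values = [(a + b) / 2 for a, b in zip(self.values, remote_values)]


if __name__ == "__main__":
    s = ParamSlice(values=[1.0, 1.0], sync_freq=3)
    for _ in range(6):
        s.apply_update([0.5, 0.5])
    print(len(s.outbox), s.pending_syncs)  # 1 1: the second sync is held back
    s.handle_sync_response([0.0, 0.0])
    s.apply_update([0.5, 0.5])
    print(len(s.outbox))  # 2: sent as soon as the pending request cleared
```

Note that while a request is pending, `updates_since_sync` keeps growing, so the held-back sync is sent on the first update after the response arrives rather than waiting another full `sync_freq` updates.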


> Improve Distributed Hogwild
> ---------------------------
>
>                 Key: SINGA-57
>                 URL: https://issues.apache.org/jira/browse/SINGA-57
>             Project: Singa
>          Issue Type: Improvement
>            Reporter: wangwei
>
> The SINGA-8 implementation of distributed Hogwild uses the stub thread to monitor the
> network bandwidth. When the network has >0 bandwidth, the stub sends a sync reminder message
> to a server, which triggers the server to sync one Param slice with other server groups.
> The code is messy because it has to monitor the network bandwidth and process the sync
> reminder messages. Another problem is that the rate of reminder messages is unpredictable,
> because they are generated only when the router times out. If the worker and server run so
> fast that the router rarely times out, sync reminder messages are never sent. Conversely,
> if the router times out frequently, too many reminder messages are generated.
> This ticket improves the implementation by fixing the frequency of synchronization between
> server groups: for each Param slice it masters, a server sends a sync message every
> sync_freq updates.
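To make the argument in the issue description concrete, here is a toy Python comparison of the two triggering policies. It is a simplified model under stated assumptions, not SINGA code: it assumes exactly one Param update per router poll, and the function names are hypothetical.

```python
def reminder_driven_syncs(timed_out, bandwidth_free):
    """Old (SINGA-8 style, simplified): one sync per router timeout that
    happens while the network has spare bandwidth."""
    return sum(1 for t, b in zip(timed_out, bandwidth_free) if t and b)


def count_driven_syncs(num_updates, sync_freq):
    """New (SINGA-57 style): one sync every sync_freq updates of a slice."""
    return num_updates // sync_freq


if __name__ == "__main__":
    n = 1000                   # polls, assumed to equal updates
    free = [True] * n          # bandwidth is always available
    fast = [False] * n         # workers/servers so fast the router never times out
    noisy = [True] * n         # router times out on every poll
    print(reminder_driven_syncs(fast, free), count_driven_syncs(n, 100))   # 0 10
    print(reminder_driven_syncs(noisy, free), count_driven_syncs(n, 100))  # 1000 10
```

Under the reminder-driven policy the sync count swings from none to one per update depending on router timing, whereas the count-driven policy yields the same number of syncs in both runs.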



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
