singa-dev mailing list archives

From "wangwei (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (SINGA-48) Fix a bug in trainer.cc that assigns the same NeuralNet instance to workers from diff groups
Date Wed, 19 Aug 2015 06:13:46 GMT

     [ https://issues.apache.org/jira/browse/SINGA-48?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

wangwei resolved SINGA-48.
--------------------------
    Resolution: Fixed
      Assignee: wangwei

> Fix a bug in trainer.cc that assigns the same NeuralNet instance to workers from diff groups
> --------------------------------------------------------------------------------------------
>
>                 Key: SINGA-48
>                 URL: https://issues.apache.org/jira/browse/SINGA-48
>             Project: Singa
>          Issue Type: Bug
>            Reporter: wangwei
>            Assignee: wangwei
>
> In SINGA, workers from the same group and in the same process share the same NeuralNet
> instance. Different worker groups should have different NeuralNet objects. However, the current
> Trainer::SetupWorkerServer function assigns the same NeuralNet instance to workers in different
> groups. Consequently, two workers may compute on the same layer instance, which leads
> to repeated calls to the ComputeFeature and ComputeGradient functions and causes run-time errors.
> Another issue is that if two workers from different groups reside in the same process,
> they share memory for layer parameters. Memory sharing is not a problem if the group
> size is 1. But if there is more than one worker in a group, the workers in that group must run
> synchronously. The synchronization is controlled by the parameter version. If memory sharing is
> enabled, workers from other groups may increase the parameter version, which leads to errors in
> synchronization. To fix this issue, SINGA needs to disable memory sharing among groups if the
> worker group size is greater than 1.
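
The two rules described in the issue can be sketched roughly as follows. This is only an illustration, not the actual patch to trainer.cc: the NeuralNet and Worker structs, the shares_param_memory flag, and the CanShareParamMemory and AssignNets functions are all hypothetical stand-ins, and the real Trainer::SetupWorkerServer is likely organised differently.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <vector>

// Hypothetical stand-in for SINGA's NeuralNet; only what the sketch needs.
struct NeuralNet {
  // True if layer parameters alias memory owned by another group's net.
  bool shares_param_memory = false;
};

// Hypothetical stand-in for a worker living in this process.
struct Worker {
  int group_id;
  int worker_id;
  std::shared_ptr<NeuralNet> net;
};

// Cross-group memory sharing is allowed only when each group has exactly one
// worker. Larger groups synchronize on the parameter version, which another
// group could otherwise advance.
bool CanShareParamMemory(int group_size) {
  assert(group_size > 0);
  return group_size == 1;
}

// Create one NeuralNet per worker group; workers in the same group share it.
std::vector<Worker> AssignNets(const std::vector<int>& group_ids,
                               int group_size) {
  std::map<int, std::shared_ptr<NeuralNet>> net_of_group;
  std::vector<Worker> workers;
  for (int gid : group_ids) {
    auto& net = net_of_group[gid];  // inserts an empty slot on first use
    if (!net) {
      net = std::make_shared<NeuralNet>();
      // The first group's net owns its parameter memory; later groups may
      // alias it only when the group size permits.
      net->shares_param_memory =
          net_of_group.size() > 1 && CanShareParamMemory(group_size);
    }
    for (int wid = 0; wid < group_size; ++wid) {
      workers.push_back({gid, wid, net});
    }
  }
  return workers;
}
```

With two groups of two workers each, the workers in a group end up with the same net, the two groups get distinct nets, and neither net is marked as sharing parameter memory. With a group size of 1, the second group's net is marked as sharing.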



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
