singa-dev mailing list archives

From "ASF subversion and git services (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SINGA-47) Fix a bug in data layers that leads to out-of-memory when group size is too large
Date Wed, 12 Aug 2015 09:49:46 GMT

    [ https://issues.apache.org/jira/browse/SINGA-47?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14693236#comment-14693236 ]

ASF subversion and git services commented on SINGA-47:
------------------------------------------------------

Commit 7a61a687c2ceb4fc7e05c2d3bbd9817e8ba59e3f in incubator-singa's branch refs/heads/master
from Wei Wang
[ https://git-wip-us.apache.org/repos/asf?p=incubator-singa.git;h=7a61a68 ]

SINGA-47 Fix a bug in data layers that leads to out-of-memory when group size is too large

The bug is fixed by closing the data source (e.g., lmdb or datashard) after reading a sample
record in the Setup function.
Each open data source holds its own cache, which can eat up all memory when there are many data layers.


> Fix a bug in data layers that leads to out-of-memory when group size is too large 
> ----------------------------------------------------------------------------------
>
>                 Key: SINGA-47
>                 URL: https://issues.apache.org/jira/browse/SINGA-47
>             Project: Singa
>          Issue Type: Bug
>            Reporter: wangwei
>
> The Setup function of a data layer opens the database (e.g., DataShard or LMDB) and reads
> a sample record. The sample record is needed to set the data shape of the upper layers. Every
> data layer's Setup function is called when SINGA creates the NeuralNet object. If the
> group size is 128 and partitioning is on dimension 0, then 128 data layers are created,
> each keeping its database open. Memory can be used up if the database object has a large
> cache (prefetch) size.
> Although every process holds the full NeuralNet object (i.e., all layers), each process
> has only a subset of the workers, which run over a subset of the (data) layers. Consequently,
> in one process only a small number of data layers call ComputeFeature to read data records.
> To fix the bug, we close the database after reading one sample record in the Setup function
> and re-open it in the ComputeFeature function. In this way, only a small number of database
> instances are open in each process.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
