singa-dev mailing list archives

From "ASF subversion and git services (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SINGA-82) Refactor input layers using data store abstraction
Date Wed, 07 Oct 2015 07:26:27 GMT

    [ https://issues.apache.org/jira/browse/SINGA-82?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14946428#comment-14946428
] 

ASF subversion and git services commented on SINGA-82:
------------------------------------------------------

Commit 5f010caabd7c09cd9fabee666d93a36377639270 in incubator-singa's branch refs/heads/master
from [~flytosky]
[ https://git-wip-us.apache.org/repos/asf?p=incubator-singa.git;h=5f010ca ]

SINGA-82 Refactor input layers using data store abstraction

* Add StoreLayer to read data from Store, e.g., KVFile, TextFile (will add support for HDFS
later).
* Implement subclasses of StoreLayer to parse tuples of different formats, e.g., SingleLabelImageRecord
or CSV line.
* Update examples to use the new input layers.
* Add unit tests.
* Add a function to the Layer class that returns a vector<AuxType> for auxiliary data
(e.g., label).

TODO
1. make AuxType a template argument of Layer class, and extend data() to return a vector of
Blob for multiple dense features.
2. separate layer classes into different files to make the structure of the source folder
clear.


> Refactor input layers using data store abstraction
> --------------------------------------------------
>
>                 Key: SINGA-82
>                 URL: https://issues.apache.org/jira/browse/SINGA-82
>             Project: Singa
>          Issue Type: Improvement
>            Reporter: wangwei
>            Assignee: wangwei
>
> 1. Separate the data storage from Layer. Currently, SINGA creates one layer to read data
from one storage, e.g., ShardData, CSV, LMDB. One problem is that only read operations are
provided. When users prepare the training data, they have to get familiar with the read/write
operations of each storage. Inspired by caffe::db::DB, we can provide a storage abstraction
with simple read/write interfaces. Users then call these operations to prepare their
training data. In particular, training data is stored as (string key, string value) tuples.
The base Store class declares:
> {code}
> // open the store for reading, writing or appending
> virtual bool Open(const string& source, Mode mode);
> // for reading tuples
> virtual bool Read(string* key, string* value) = 0;
> // for writing tuples
> virtual bool Write(const string& key, const string& value) = 0;
> {code}
> Each specific storage, e.g., CSV, LMDB, image folder or HDFS (will be supported soon),
inherits from Store and overrides these functions.
> Consequently, a single KVInputLayer (like the SequenceFile.Reader from Hadoop) can read
from different sources by configuring the *store* field (e.g., store=csv).
> With the Store class, we can implement a KVInputLayer to read batchsize tuples in its
ComputeFeature function. The tuple is parsed by a virtual function depending on the application
(or the format of the tuple). 
> {code}
> // parse the tuple as the k-th instance for one mini-batch
> virtual bool Parse(int k, const string& key, const string& tuple) = 0;
> {code}
> For example, a CSVKVInputLayer may parse the key into a line ID, and parse the label
and feature from the value field. An ImageKVInputLayer may parse a SingleLabelImageRecord
from the value field.
> 2. There will be a set of layers doing data preprocessing, e.g., normalization and image
augmentation.
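
The design in item 1 can be sketched as follows. This is only a minimal illustration under stated assumptions, not the code from the commit: MemoryStore, CSVInputLayer, LoadBatch, Demo and the Mode enumerator names are hypothetical, Open is made pure virtual for brevity, and the CSV value layout "label,f1,f2,..." is assumed. Only the Open/Read/Write and Parse signatures come from the description above.

```cpp
#include <cstddef>
#include <exception>
#include <sstream>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

using std::string;

// Reading, writing or appending; the enumerator names are assumptions.
enum Mode { kRead, kCreate, kAppend };

// Storage abstraction with the three operations from the issue description.
class Store {
 public:
  virtual ~Store() = default;
  virtual bool Open(const string& source, Mode mode) = 0;
  virtual bool Read(string* key, string* value) = 0;
  virtual bool Write(const string& key, const string& value) = 0;
};

// Hypothetical in-memory store standing in for CSV, LMDB, etc.
class MemoryStore : public Store {
 public:
  bool Open(const string& /*source*/, Mode mode) override {
    mode_ = mode;
    if (mode == kCreate) tuples_.clear();  // kAppend and kRead keep old data
    cursor_ = 0;
    return true;
  }
  bool Read(string* key, string* value) override {
    if (mode_ != kRead || cursor_ >= tuples_.size()) return false;
    *key = tuples_[cursor_].first;
    *value = tuples_[cursor_].second;
    ++cursor_;
    return true;
  }
  bool Write(const string& key, const string& value) override {
    if (mode_ == kRead) return false;
    tuples_.emplace_back(key, value);
    return true;
  }

 private:
  Mode mode_ = kRead;
  std::size_t cursor_ = 0;
  std::vector<std::pair<string, string>> tuples_;
};

// Stand-in for the proposed KVInputLayer: reads tuples, delegates parsing.
class KVInputLayer {
 public:
  virtual ~KVInputLayer() = default;
  // Reads up to batchsize tuples; stops at the end of the store or at the
  // first tuple that fails to parse. Returns the number of parsed tuples.
  int LoadBatch(Store* store, int batchsize) {
    string key, value;
    int k = 0;
    while (k < batchsize && store->Read(&key, &value) && Parse(k, key, value))
      ++k;
    return k;
  }

 protected:
  // Parse the tuple as the k-th instance of one mini-batch.
  virtual bool Parse(int k, const string& key, const string& tuple) = 0;
};

// Parses the key as a line ID and the value as "label,f1,f2,..." (assumed).
class CSVInputLayer : public KVInputLayer {
 public:
  explicit CSVInputLayer(int batchsize)
      : line_ids(batchsize), labels(batchsize), features(batchsize) {}
  std::vector<int> line_ids;
  std::vector<int> labels;
  std::vector<std::vector<float>> features;

 protected:
  bool Parse(int k, const string& key, const string& tuple) override {
    if (k < 0 || k >= static_cast<int>(labels.size())) return false;
    std::istringstream in(tuple);
    string field;
    if (!std::getline(in, field, ',')) return false;  // empty value
    try {
      line_ids[k] = std::stoi(key);
      labels[k] = std::stoi(field);
      features[k].clear();
      while (std::getline(in, field, ','))
        features[k].push_back(std::stof(field));
    } catch (const std::exception&) {
      return false;  // non-numeric key, label or feature
    }
    return true;
  }
};

// Prepares two CSV tuples through Write, then reads them back as one batch.
int Demo(CSVInputLayer* layer) {
  MemoryStore store;
  store.Open("unused", kCreate);
  store.Write("0", "1,0.5,0.25");
  store.Write("1", "0,2.0,4.0");
  store.Open("unused", kRead);
  return layer->LoadBatch(&store, 2);
}
```

The point of the split is that LoadBatch only sees the Store interface, so swapping MemoryStore for another backend would not touch the parsing code, and a different tuple format would only need a different Parse override.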



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
