singa-dev mailing list archives

From "wangwei (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SINGA-210) Enable checkpoint and resume for v1.0
Date Tue, 28 Jun 2016 04:26:57 GMT

     [ https://issues.apache.org/jira/browse/SINGA-210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

wangwei updated SINGA-210:
--------------------------
    Summary: Enable checkpoint and resume for v1.0  (was: Eanble checkpoint and resume for v1.0)

> Enable checkpoint and resume for v1.0
> -------------------------------------
>
>                 Key: SINGA-210
>                 URL: https://issues.apache.org/jira/browse/SINGA-210
>             Project: Singa
>          Issue Type: New Feature
>            Reporter: wangwei
>
> This ticket adds code for dumping the model parameters as checkpoint files,
> which can be used for fine-tuning and deployment.
> The model parameters should be separated from the model definition, i.e., net construction.
> After creating the neural net, users either randomly initialize the layer parameters or load
> them from checkpoint files. In other words, we do not add a pair of serializing and
> parsing functions to the Layer class.
> We need to decide the format of the checkpoint file and how to write and read it:
> 1. The checkpoint file consists of the model parameters, which could be serialized as
> key-value pairs, where the key is the parameter name and the value is a protobuf object including
> the shape and values. Optionally, there could be a text file with the parameter meta
> info, e.g., name and shape, which would let users inspect the model parameters without
> parsing the binary checkpoint file.
> 2. The binary checkpoint file can be serialized using the Writer (SINGA-202) and loaded
> into memory using the Reader (SINGA-202).
> 3. A checkpoint utility class should be implemented for 1 and 2. Compatibility with Caffe
> checkpoint files may also be considered, to reuse models from the Caffe model zoo:
> http://caffe.berkeleyvision.org/model_zoo.html
> {code}
> class Checkpoint {
>   // <prefix>.model is the binary file of parameter key-value pairs;
>   // <prefix>.meta is the text file, one line per parameter.
>   Checkpoint(prefix, mode=[R|W]);
>   Read();  // read .model
>   ReadMeta();  // read .meta only
>   Get(key);  // return the value protobuf obj.
>   GetMeta(key);
>   Read(key);
>   Write(key, value);  // write to both .model and .meta files.
> };
> {code}
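
To make the proposal above concrete, here is a minimal C++ sketch of such a utility class. It is an illustration only, not the SINGA implementation: `ParamValue` is a hypothetical stand-in for the protobuf value object, the length-prefixed record layout (native byte order) is an assumption made for this example, and plain file streams are used in place of the Writer/Reader from SINGA-202. The per-key `Read(key)` from the proposal is left out, and the sketch assumes parameter names contain no whitespace.

```cpp
#include <cassert>
#include <cstdint>
#include <fstream>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for the protobuf value object (shape and values).
struct ParamValue {
  std::vector<std::uint64_t> shape;
  std::vector<float> data;
};

class Checkpoint {
 public:
  enum Mode { kRead, kWrite };

  Checkpoint(const std::string& prefix, Mode mode)
      : prefix_(prefix), mode_(mode) {
    if (mode_ != kWrite) return;
    model_out_.open(prefix_ + ".model", std::ios::binary | std::ios::trunc);
    meta_out_.open(prefix_ + ".meta", std::ios::trunc);
    if (!model_out_ || !meta_out_)
      throw std::runtime_error("cannot open checkpoint files for writing");
  }

  // Append one record to <prefix>.model and one text line to <prefix>.meta.
  void Write(const std::string& key, const ParamValue& v) {
    assert(mode_ == kWrite && "Write requires write mode");
    WriteU64(key.size());
    model_out_.write(key.data(), static_cast<std::streamsize>(key.size()));
    WriteU64(v.shape.size());
    for (std::uint64_t d : v.shape) WriteU64(d);
    WriteU64(v.data.size());
    model_out_.write(reinterpret_cast<const char*>(v.data.data()),
                     static_cast<std::streamsize>(v.data.size() * sizeof(float)));
    meta_out_ << key;
    for (std::uint64_t d : v.shape) meta_out_ << ' ' << d;
    meta_out_ << '\n';
  }

  // Load every record of <prefix>.model into memory.
  void Read() {
    std::ifstream in(prefix_ + ".model", std::ios::binary);
    if (!in) throw std::runtime_error("cannot open " + prefix_ + ".model");
    std::uint64_t klen = 0;
    while (ReadU64(in, &klen)) {
      std::string key(klen, '\0');
      in.read(&key[0], static_cast<std::streamsize>(klen));
      ParamValue v;
      std::uint64_t n = 0;
      ReadU64(in, &n);
      v.shape.resize(n);
      for (std::uint64_t& d : v.shape) ReadU64(in, &d);
      n = 0;
      ReadU64(in, &n);
      v.data.resize(n);
      in.read(reinterpret_cast<char*>(v.data.data()),
              static_cast<std::streamsize>(n * sizeof(float)));
      if (!in) throw std::runtime_error("truncated checkpoint file");
      params_[key] = v;
    }
  }

  // Load only <prefix>.meta: one "name dim0 dim1 ..." line per parameter.
  void ReadMeta() {
    std::ifstream in(prefix_ + ".meta");
    if (!in) throw std::runtime_error("cannot open " + prefix_ + ".meta");
    std::string line;
    while (std::getline(in, line)) {
      std::istringstream ss(line);
      std::string key;
      if (!(ss >> key)) continue;  // skip blank lines
      std::vector<std::uint64_t> shape;
      std::uint64_t d = 0;
      while (ss >> d) shape.push_back(d);
      meta_[key] = shape;
    }
  }

  // Both getters throw std::out_of_range for unknown keys.
  const ParamValue& Get(const std::string& key) const { return params_.at(key); }
  const std::vector<std::uint64_t>& GetMeta(const std::string& key) const {
    return meta_.at(key);
  }

 private:
  void WriteU64(std::uint64_t x) {
    model_out_.write(reinterpret_cast<const char*>(&x), sizeof(x));
  }
  static bool ReadU64(std::istream& in, std::uint64_t* x) {
    return static_cast<bool>(in.read(reinterpret_cast<char*>(x), sizeof(*x)));
  }

  std::string prefix_;
  Mode mode_;
  std::ofstream model_out_, meta_out_;
  std::map<std::string, ParamValue> params_;
  std::map<std::string, std::vector<std::uint64_t>> meta_;
};
```

In this sketch, a writer is expected to go out of scope (closing both files) before a reader is opened on the same prefix. Keeping the shape in the .meta file means `ReadMeta()` can list parameters without touching the binary file, which is the use case the ticket describes for the optional text file.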



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
