singa-dev mailing list archives

From GitBox <>
Subject [GitHub] [singa] joddiy edited a comment on issue #696: Refactor autograd module
Date Thu, 14 May 2020 10:41:53 GMT

joddiy edited a comment on issue #696:

   > Shall we go with the following APIs?
   > @joddiy @dcslin @XJDKC
   > They should be compatible with the current APIs.
   > ```python
   > class Module:
   >     def compile(self, inputs, is_train, use_graph, graph_alg):
   >         set train, graph etc config
   >         turn off graph
   >         if inputs are not filled, print warnings and fill inputs according to data
   >         self.forward(*inputs)
   >      def load(self, ckp_path, include_state=False):
   >        load onnx model and copy the params to each layer; 
   >        generate warnings for mismatched layers/params.
   >        restore the states and return it as a dict
   >      def save(self, ckp_path, state={}):
   >        save the model as onnx format
   >        save the states
   >      def forward(self, x):    # turn on graph if necessary
   >         pass
   >      def train_one_batch(self, x, y):  # turn on graph if necessary
   >         pass   
   >      @deprecated 
   >      def loss(self, ):
   >         pass
   >       @deprecated 
   >       def optim(self,):
   >           pass      
   > class Layer:
   >     def __init__(name=None):
   >       self.init = False
   >     def __call__(self, x):
   >        if self.init == False:
   >            init layer states
   >        else:
   >           # do the forward propagation 
   > class MyLayer(Layer):
   >      def __init__(self):
   >           self.layer1 = layer.Conv2d(nb_kernels = 32, kernel=3, stride=1, padding=0,
   >           self.layer2 = layer.MaxPool2d(kernel=3, stride=2)
   >       def forward(self, x):
   >           return self.layer2(self.layer1(x))
   > class MyModule(Module):
   >      def __init__(self):
   >            self.blk1 = MyLayer()
   >            self.blk2 = MyLayer()
   >            self.optim = SGD()
   >            self.loss = CrossEntropyLoss()
   >       def forward(self, x):
   >            return self.blk2(self.blk1(x))    
   >       def train_one_batch(self, x, y): 
   >            y_ = self.forward(x)
   >            l = self.loss(y_, y)
   >            self.optim.backward_and_update(l)
   >            return l
   > x = Placeholder((2, 3), device = gpu, dtype=singa.float) # alias of Tensor
   > # fill x with values
   > m = MyModule()
   > # compatible with existing code which does not have the following two statements.
   > m.compile([x], is_train=True, use_graph=True, graph_alg='sequence')
   > for pname, ptensor in m.get_params():
   >     ptensor.uniform(-1, 1)   # not necessary if each layer's param init methods are
   > y = Placeholder((2,), device = gpu)
   > for npx, npy in data:
   >    x.copy_from(npx)
   >    y.copy_from(npy)
   >    m.train_one_batch(x, y)  # build the graph in the first iter.  For the old code, the params are initialized here.
   > m.save('mymodel', state={'epoch': data.size(), 'sgd': m.optim})
   > ```
   This approach still postpones the operation init until the training phase, right? When the user has a batch of samples, they call `train_one_batch`, which calls `forward`, which in turn calls:
   def __call__(self, x):
       if self.init == False:
           init layer states
   It is still strange to postpone initializing the graph until the user has the data.
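
To make the concern concrete, here is a minimal sketch of deferred initialization. The `LazyLinear` class and its attributes are hypothetical and are not SINGA's API; the point is only that the parameters cannot be created before the first batch arrives, because the input width is unknown until then.

```python
import random


class LazyLinear:
    """Hypothetical layer whose weights are created on the first call."""

    def __init__(self, out_features):
        self.out_features = out_features
        self.init = False
        self.weight = None

    def __call__(self, x):
        # x is a list of rows; the input width is only known at this point.
        if not self.init:
            in_features = len(x[0])
            self.weight = [
                [random.uniform(-1, 1) for _ in range(self.out_features)]
                for _ in range(in_features)
            ]
            self.init = True
        # Plain matrix multiply: (batch, in) x (in, out) -> (batch, out).
        return [
            [sum(row[i] * self.weight[i][j] for i in range(len(row)))
             for j in range(self.out_features)]
            for row in x
        ]


layer = LazyLinear(out_features=4)
weight_before = layer.weight          # still None: nothing allocated yet
out = layer([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
```

Until the first call, `layer.weight` is `None`, so nothing that depends on the parameters (a graph, a checkpoint load, a custom initializer) can run before data is available.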
   In my opinion, the current problems are:
   1. We don't have the shape of the input, so we use a Placeholder as the input.
   2. Even if we have the shape of the input data, we cannot compute the shapes of all intermediate tensors, since we cannot call `forward` with a Placeholder. We could initialize random data instead, but that may cause errors.
   So the key point is that graph construction is bound to the `forward` function: the graph is constructed only when we call `forward`, but to call `forward` we must have the real data.
   So I'm thinking about separating graph construction from the `forward` function. We define classes called `Graph` and `Node`: the `Graph` stores the relationships between `Node`s, and each `Node` stores an `Operation` as well as its input and output.
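
One possible shape for these two classes, as a rough sketch only. The field names (`inputs`, `outputs`, `shape`) and the `nodes()` helper are assumptions made for illustration and do not come from SINGA.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional, Tuple


@dataclass(eq=False)  # identity-based equality, so nodes can live in a set
class Node:
    """One vertex of the graph: an operation plus its links and output shape."""
    operation: Any = None                      # None marks a placeholder input
    inputs: List["Node"] = field(default_factory=list)
    outputs: List["Node"] = field(default_factory=list)
    shape: Optional[Tuple[int, ...]] = None


@dataclass
class Graph:
    """Stores the relationship between nodes, from the input to the output."""
    input_node: Node
    output_node: Node

    def nodes(self):
        # Walk back from the output and return the nodes in execution order.
        order, seen = [], set()

        def visit(node):
            if node in seen:
                return
            seen.add(node)
            for parent in node.inputs:
                visit(parent)
            order.append(node)

        visit(self.output_node)
        return order


# Strings stand in for real operations in this illustration.
x = Node(shape=(2, 3))
h = Node(operation="relu", inputs=[x], shape=(2, 3))
x.outputs.append(h)
y = Node(operation="softmax", inputs=[h], shape=(2, 3))
h.outputs.append(y)

graph = Graph(x, y)
names = [n.operation for n in graph.nodes()]
```

Because every `Node` carries its own shape and links, the whole structure can exist before any tensor holds real values.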
   In the `__call__` function of an `Operation`, we don't call the `forward` function; instead, we create a `Node`, store the operation itself in this `Node`, set its input and output, and return the newly created `Node`, as in the following code:
   class Operation(object):
       def __init__(self):
           pass
       def __call__(self, previous_node):  # multiple inputs are handled similarly
           # create a Node
           # link the current node with the previous node
           # run infer_shape and set the input and output shapes of the current and previous nodes
           current_node = Node()
           current_node.input.node = previous_node
           current_node.operation = self
           current_node.output.shape = self.infer_shape()
           previous_node.output.node = current_node
           return current_node
       def forward(self):
           pass
       def backward(self):
           pass
       def infer_shape(self):
           pass
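
A runnable sketch of the same idea, under stated assumptions: the simplified `Node` with `prev`/`next` links, the `Conv2d` subclass, and the (N, C, H, W) input shape are all hypothetical stand-ins rather than SINGA classes. The shape rule is the usual convolution arithmetic, `(size + 2 * padding - kernel) // stride + 1`.

```python
class Node:
    """Hypothetical graph node: an operation, its output shape, and its links."""

    def __init__(self, operation=None, shape=None):
        self.operation = operation   # None for a placeholder input
        self.shape = shape
        self.prev = None
        self.next = None


class Operation:
    def __call__(self, previous_node):
        # Record the operation in a new Node instead of running forward().
        node = Node(operation=self, shape=self.infer_shape(previous_node.shape))
        node.prev = previous_node
        previous_node.next = node
        return node

    def infer_shape(self, input_shape):
        raise NotImplementedError


class Conv2d(Operation):
    def __init__(self, in_channels, out_channels, kernel, stride=1, padding=0):
        self.in_channels = in_channels
        self.out_channels = out_channels
        self.kernel = kernel
        self.stride = stride
        self.padding = padding

    def infer_shape(self, input_shape):
        # Standard convolution arithmetic on an (N, C, H, W) shape.
        n, c, h, w = input_shape
        if c != self.in_channels:
            raise ValueError(f"expected {self.in_channels} channels, got {c}")
        out_h = (h + 2 * self.padding - self.kernel) // self.stride + 1
        out_w = (w + 2 * self.padding - self.kernel) // self.stride + 1
        return (n, self.out_channels, out_h, out_w)


x = Node(shape=(2, 1, 28, 28))       # placeholder: a shape, but no data
h = Conv2d(1, 20, 5)(x)
y = Conv2d(20, 50, 5)(h)
```

Every intermediate shape is known after these three lines even though no tensor was ever filled, and a channel mismatch is reported at construction time instead of in the first training iteration.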
   With this, the following code actually constructs a `Graph` of linked `Node`s:
   class MyModule(Module):
       def __init__(self):
           super(MyModule, self).__init__()
           self.conv1 = autograd.Conv2d(1, 20, 5, padding=0)
           self.conv2 = autograd.Conv2d(20, 50, 5, padding=0)
           self.sgd = opt.SGD(lr=0.01)
       def construct_graph(self, x):
           # x is a placeholder
           # create the Graph linked with Node
           y = self.conv1(x)
           y = self.conv2(y)
           self.graph = Graph(x, y)
       def train(self, x, y):
           y_ = self.graph.forward(x)
           l = self.loss(y_, y)
           return l
       def loss(self, out, y):
           return autograd.softmax_cross_entropy(out, y)
       def optim(self, loss):
           pass
   model = MyModule()
   x = Placeholder((2, 3), device=gpu, dtype=singa.float)  # alias of Tensor
   model.construct_graph(x)  # build the graph
   y = Placeholder((2,), device=gpu)
   for npx, npy in data:
       model.train(x, y)  # directly train
   model.save('mymodel', state={'epoch': data.size(), 'sgd': model.sgd})
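
The split between building and running can be sketched end to end in plain Python. Everything here is an illustrative assumption: the toy `Scale` and `RowSum` operations, the list-of-lists data, and the way `Graph` collects operations by walking `prev` links are not SINGA code.

```python
class Node:
    def __init__(self, operation=None, shape=None, prev=None):
        self.operation = operation   # None for the placeholder input
        self.shape = shape
        self.prev = prev


class Operation:
    def __call__(self, previous_node):
        # Construction step: only shapes flow through, no data is touched.
        return Node(self, self.infer_shape(previous_node.shape), previous_node)


class Scale(Operation):
    def __init__(self, factor):
        self.factor = factor

    def infer_shape(self, input_shape):
        return input_shape

    def forward(self, x):
        return [[v * self.factor for v in row] for row in x]


class RowSum(Operation):
    def infer_shape(self, input_shape):
        return (input_shape[0],)

    def forward(self, x):
        return [sum(row) for row in x]


class Graph:
    def __init__(self, input_node, output_node):
        # Walk from the output back to the input, then reverse into run order.
        self.ops = []
        node = output_node
        while node is not input_node:
            self.ops.append(node.operation)
            node = node.prev
        self.ops.reverse()
        self.input_shape = input_node.shape
        self.output_shape = output_node.shape

    def forward(self, data):
        # Execution step: real data is needed only here.
        if (len(data), len(data[0])) != self.input_shape:
            raise ValueError("data does not match the placeholder shape")
        for op in self.ops:
            data = op.forward(data)
        return data


x = Node(shape=(2, 3))                 # placeholder: shape only, no values
y = RowSum()(Scale(2.0)(x))
graph = Graph(x, y)                    # built before any data exists
out = graph.forward([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
```

The graph and its output shape are fixed once, up front, and each training iteration then only feeds data through `graph.forward`.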
