From dev-return-5807-archive-asf-public=cust-asf.ponee.io@singa.apache.org  Tue Jun 16 11:12:47 2020
Return-Path: <dev-return-5807-archive-asf-public=cust-asf.ponee.io@singa.apache.org>
X-Original-To: archive-asf-public@cust-asf.ponee.io
Delivered-To: archive-asf-public@cust-asf.ponee.io
Received: from mail.apache.org (hermes.apache.org [207.244.88.153])
	by mx-eu-01.ponee.io (Postfix) with SMTP id 3B10F180621
	for <archive-asf-public@cust-asf.ponee.io>; Tue, 16 Jun 2020 13:12:47 +0200 (CEST)
Received: (qmail 48560 invoked by uid 500); 16 Jun 2020 11:12:46 -0000
Mailing-List: contact dev-help@singa.apache.org; run by ezmlm
Precedence: bulk
List-Help: <mailto:dev-help@singa.apache.org>
List-Unsubscribe: <mailto:dev-unsubscribe@singa.apache.org>
List-Post: <mailto:dev@singa.apache.org>
List-Id: <dev.singa.apache.org>
Reply-To: dev@singa.apache.org
Delivered-To: mailing list dev@singa.apache.org
Received: (qmail 48546 invoked by uid 99); 16 Jun 2020 11:12:46 -0000
Received: from ec2-52-202-80-70.compute-1.amazonaws.com (HELO gitbox.apache.org) (52.202.80.70)
    by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 16 Jun 2020 11:12:46 +0000
From: GitBox <git@apache.org>
To: dev@singa.apache.org
Subject: [GitHub] [singa] chrishkchris edited a comment on pull request #730: Support training RNN with computation graph
Message-ID: <159230596651.8807.11902962792350100766.asfpy@gitbox.apache.org>
Date: Tue, 16 Jun 2020 11:12:46 -0000
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
In-Reply-To: <singa.730.MDExOlB1bGxSZXF1ZXN0NDMzNzgxNDAw.gitbox@gitbox.apache.org>
References: <singa.730.MDExOlB1bGxSZXF1ZXN0NDMzNzgxNDAw.gitbox@gitbox.apache.org>


chrishkchris edited a comment on pull request #730:
URL: https://github.com/apache/singa/pull/730#issuecomment-644694297


   > This pr is almost done. Now the constructed graph is correct and can be executed normally.
   > There is only one issue. After training some iterations, the loss value will become NaN. This phenomenon will appear no matter whether the graph is enabled or not. Any ideas about this problem? @dcslin @chrishkchris
   > 
   > ```shell
   > root@ip-172-31-6-19:/home/ubuntu/Program/singa/examples/qabot git:(lstm-graph*) # python train.py -g
   > successfully loaded word2vec model and corpus
   > successfully generated train, eval, test data
   > epoch 0, time used 7 sec, top1 hits: 0.000000, loss:  [6.2321043]
   > epoch 1, time used 7 sec, top1 hits: 0.000000, loss:  [6.1910734]
   > epoch 2, time used 7 sec, top1 hits: 0.010000, loss:  [6.1914635]
   > epoch 3, time used 7 sec, top1 hits: 1.000000, loss:  [nan]
   > epoch 4, time used 7 sec, top1 hits: 1.000000, loss:  [nan]
   > epoch 5, time used 7 sec, top1 hits: 1.000000, loss:  [nan]
   > epoch 6, time used 7 sec, top1 hits: 1.000000, loss:  [nan]
   > epoch 7, time used 7 sec, top1 hits: 1.000000, loss:  [nan]
   > epoch 8, time used 7 sec, top1 hits: 1.000000, loss:  [nan]
   > epoch 9, time used 7 sec, top1 hits: 1.000000, loss:  [nan]
   > training top1 hits rate:  1.0
   > ```
   
   @XJDKC 
   
   The LSTM still seems to have a problem. @dcslin is checking the LSTM with a simpler binary semantic analysis example at https://github.com/apache/singa/pull/733.
   
   For example, it may be worth checking the following:
   
   The cuDNN documentation at https://docs.nvidia.com/deeplearning/sdk/cudnn-api/index.html#cudnnRNNForwardTraining
   says: "strides in xDesc should be set as follows: `strideA[0]=inputSize, strideA[1]=1, strideA[2]=1`"
   
   In our code (https://github.com/apache/singa/blob/dev/src/model/operation/rnn.cc#L174), however, the strides are set as:
   ```
     strideA[0] = dimA[2] * dimA[1];
     strideA[1] = dimA[2];
     strideA[2] = 1;
   ```
   If we feed the data with the wrong layout, the model may still memorize the data pattern but cannot actually learn anything.
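
One quick way to compare the two stride formulas is to evaluate both for some made-up sizes. This is only a sketch: the function names and the sizes below are hypothetical, and it assumes `dimA` is laid out as `{batchSize, inputSize, 1}`, which has not been verified against rnn.cc. Under that assumption the two formulas give the same strides, and they only disagree when `dimA[2]` is not 1, so the value of `dimA[2]` is probably the first thing to check.

```python
def strides_as_in_rnn_cc(dim_a):
    """Strides computed the same way as the snippet quoted from rnn.cc."""
    return [dim_a[2] * dim_a[1], dim_a[2], 1]


def strides_as_documented(input_size):
    """Strides quoted from the cuDNN documentation for xDesc."""
    return [input_size, 1, 1]


# Hypothetical sizes, for illustration only.
batch_size, input_size = 4, 8

# Assumed layout: dimA = {batchSize, inputSize, 1}.
dim_a = [batch_size, input_size, 1]
print("dimA[2] == 1:", strides_as_in_rnn_cc(dim_a),
      "vs documented", strides_as_documented(input_size))

# Counter-case: a last dimension other than 1 makes the formulas disagree.
dim_b = [batch_size, input_size, 2]
print("dimA[2] == 2:", strides_as_in_rnn_cc(dim_b),
      "vs documented", strides_as_documented(input_size))
```

If the strides do match for the real `dimA`, the NaN loss likely comes from somewhere other than this descriptor setup.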
   
   When you finish all your other tasks, could you also help check rnn.cc? :)

