From: GitBox
To: dev@singa.apache.org
Subject: [GitHub] [singa] chrishkchris edited a comment on pull request #697: New Model Layer Operator API
Date: Tue, 02 Jun 2020 08:51:25 -0000

chrishkchris edited a comment on pull request #697:
URL: https://github.com/apache/singa/pull/697#issuecomment-637375715

I am using this PR to train Xceptionnet in order to use the save_state function, but I encountered something strange:

(i) The training and evaluation were both okay in https://github.com/apache/singa/pull/651

```
(singa) dcsysh@panda7:~/singa/examples/autograd$ python3 train.py xceptionnet ci
Starting Epoch 0:
Training loss = 11198.645508, training accuracy = 0.214420
Evaluation accuracy = 0.309000, Elapsed Time = 606.547117s
Starting Epoch 1:
Training loss = 6354.611328, training accuracy = 0.381020
Evaluation accuracy = 0.457300, Elapsed Time = 612.817129s
```

(ii)
This time I think the training is okay, but something is wrong in the evaluation: the training accuracy rises from 0.131 to 0.306 over five epochs, while the evaluation accuracy stays at about 0.0999, roughly chance level for the 10 CIFAR-10 classes.

```
root@e8a757397ca3:~/dcsysh/singa/examples/cnn# mpiexec -np 8 python3 train_mpi.py xceptionnet cifar10 --bs 16 --lr 0.04 --epoch 30
Starting Epoch 0:
Training loss = 11614.897461, training accuracy = 0.131190
Evaluation accuracy = 0.099860, Elapsed Time = 98.705291s
Starting Epoch 1:
Training loss = 6932.552246, training accuracy = 0.157552
Evaluation accuracy = 0.099860, Elapsed Time = 98.400360s
Starting Epoch 2:
Training loss = 6565.343262, training accuracy = 0.195853
Evaluation accuracy = 0.099960, Elapsed Time = 99.807898s
Starting Epoch 3:
Training loss = 6173.305176, training accuracy = 0.254467
Evaluation accuracy = 0.099960, Elapsed Time = 99.759293s
Starting Epoch 4:
Training loss = 5841.223633, training accuracy = 0.306430
Evaluation accuracy = 0.099960, Elapsed Time = 99.962356s
```
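As a quick sanity check on the `train_mpi.py` output above, here is a minimal sketch that parses the pasted log and tests whether the evaluation accuracy is pinned near chance level (1/10 for the 10 CIFAR-10 classes) while the training accuracy keeps rising. It is not part of the SINGA examples: the helper names `parse_accuracies` and `eval_stuck_at_chance` and the 0.005 tolerance are assumptions made only for illustration.

```python
import re

# Log text copied from the mpiexec run above (epochs 0 to 4).
LOG = """\
Starting Epoch 0:
Training loss = 11614.897461, training accuracy = 0.131190
Evaluation accuracy = 0.099860, Elapsed Time = 98.705291s
Starting Epoch 1:
Training loss = 6932.552246, training accuracy = 0.157552
Evaluation accuracy = 0.099860, Elapsed Time = 98.400360s
Starting Epoch 2:
Training loss = 6565.343262, training accuracy = 0.195853
Evaluation accuracy = 0.099960, Elapsed Time = 99.807898s
Starting Epoch 3:
Training loss = 6173.305176, training accuracy = 0.254467
Evaluation accuracy = 0.099960, Elapsed Time = 99.759293s
Starting Epoch 4:
Training loss = 5841.223633, training accuracy = 0.306430
Evaluation accuracy = 0.099960, Elapsed Time = 99.962356s
"""

# Patterns match the log format shown above (hypothetical helper code).
TRAIN_RE = re.compile(r"training accuracy = ([0-9.]+)")
EVAL_RE = re.compile(r"Evaluation accuracy = ([0-9.]+)")


def parse_accuracies(log):
    """Return (training accuracies, evaluation accuracies), one value per epoch."""
    train = [float(x) for x in TRAIN_RE.findall(log)]
    evals = [float(x) for x in EVAL_RE.findall(log)]
    return train, evals


def eval_stuck_at_chance(evals, num_classes=10, tol=0.005):
    """True if every evaluation accuracy lies within tol of 1/num_classes."""
    chance = 1.0 / num_classes
    return all(abs(a - chance) < tol for a in evals)


train_acc, eval_acc = parse_accuracies(LOG)
rising = all(b > a for a, b in zip(train_acc, train_acc[1:]))
print("training accuracy rising:", rising)
print("evaluation stuck near chance:", eval_stuck_at_chance(eval_acc))
```

Both checks hold for run (ii), whereas the evaluation accuracies from run (i), 0.309 and 0.4573, are well above chance. That pattern suggests the problem is in the evaluation path rather than in the training itself, though the log alone does not identify the cause.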