From: GitBox
To: dev@singa.apache.org
Subject: [GitHub] [singa] nudles commented on pull request #722: cudnn lstm
Date: Sun, 07 Jun 2020 02:04:48 -0000
Message-ID: <159149548883.17834.15453864129919736298.asfpy@gitbox.apache.org>

nudles commented on pull request #722:
URL: https://github.com/apache/singa/pull/722#issuecomment-640144833

> hi @nudles, sorry for the delay. There are still issues regarding the demo model; the updates are:
>
> 0. Added gensim as the word2vec converter. It is not clear how the pooling part of the model is designed in the paper, but the reference model contracts the LSTM output tensor by taking the mean over the sequence axis, which could be done with `autograd.reduce_mean()`.
>
> 1. The loss function `L = max{0, M − cosine(q, a+) + cosine(q, a−)}` requires two forward passes followed by one backward pass, which is not supported by SINGA.
> Tried to concatenate a+ and a- into a `{bs*2, seq, embed}` tensor and make the model accept input like `(q, a+, a-)`. The testing phase is then confusing because there is no label for the answer: during testing we only need to compute cosine(q, a); there is no cosine(q, a+) or cosine(q, a-).
>
> 2. Tried to implement a simplified version that substitutes the loss function `L = max{0, M − cosine(q, a+) + cosine(q, a−)}` with MSE loss, so that the model could be trained with data in the format of `, `. But there is a convergence problem.
>
> 3. As advised by @joddiy, we could train the model with data in the format where one batch has two samples `` and `` ordered alternately. We then modified the loss function to compute the loss for every batch of 2 samples (batch_index 0: `pos_sim`, batch_index 1: `neg_sim`). Still checking.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org
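[Editorial sketch] The alternating-sample scheme described in point 3 above can be illustrated outside of SINGA. The following is a minimal NumPy sketch, not SINGA code; the helper names `cosine` and `pairwise_hinge_loss` are hypothetical, and the margin value is an assumed placeholder:

```python
import numpy as np

def cosine(u, v):
    # Cosine similarity along the last axis.
    num = (u * v).sum(axis=-1)
    den = np.linalg.norm(u, axis=-1) * np.linalg.norm(v, axis=-1)
    return num / den

def pairwise_hinge_loss(q, a, margin=0.2):
    """L = mean(max(0, M - cosine(q, a+) + cosine(q, a-))).

    q, a: arrays of shape (batch, embed), where even batch indices
    hold positive pairs and odd batch indices hold the corresponding
    negative pairs, ordered alternately as in point 3.
    """
    sims = cosine(q, a)       # shape (batch,)
    pos_sim = sims[0::2]      # batch_index 0, 2, 4, ...
    neg_sim = sims[1::2]      # batch_index 1, 3, 5, ...
    return np.maximum(0.0, margin - pos_sim + neg_sim).mean()
```

With this layout a single forward pass over the interleaved batch yields both similarities, so the two-forward-pass limitation mentioned in point 1 is avoided.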