singa-dev mailing list archives

From GitBox <...@apache.org>
Subject [GitHub] [singa] dcslin commented on pull request #722: cudnn lstm
Date Sat, 06 Jun 2020 18:43:15 GMT

dcslin commented on pull request #722:
URL: https://github.com/apache/singa/pull/722#issuecomment-640101950


   hi @nudles, sorry for the delay. There are still issues regarding the demo model; the updates
are:
   0. Added gensim as the word2vec converter. It is not clear how the pooling part of the model
is designed in the paper, but the reference model contracts the LSTM output tensor by taking the
mean over the sequence axis, which can be done with `autograd.reduce_mean()` (see the pooling
sketch after this list).
   1. The loss function `L = max{0, M − cosine(q, a+) + cosine(q, a−)}` requires two forward
passes followed by one backward pass, which is not supported by singa. Tried concatenating a+ and
a- into a `(bs*2, seq, embed)` tensor and making the model accept input like `(q, a+, a-)`, but
then the testing phase is confusing because there is no label for the answer (see the
ranking-loss sketch below).
   2. Tried to implement a simplified version that substitutes the loss function `L = max{0,
M − cosine(q, a+) + cosine(q, a−)}` with an MSE loss, so that the model can be trained with
data in the format `<q, a+, 1>, <q, a-, 0>`, but there is a convergence problem (see the MSE
sketch below).
   3. As advised by @joddiy, we could train the model with a data format where one batch holds
two samples, `<q,a+>` and `<q,a->`, ordered alternately; we then modified the loss function to
compute the loss for every batch of 2 samples (batch index 0: `pos_sim`, batch index 1:
`neg_sim`). Still checking (see the alternating-batch sketch below).
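
   For reference, a minimal NumPy sketch of the pooling step in item 0. This is illustrative
only, not singa code; the shapes are made up, and in singa the same contraction would go through
`autograd.reduce_mean()` on the sequence axis:

```python
import numpy as np

# Assumed shapes for illustration: batch 4, sequence 10, hidden 8.
bs, seq, hidden = 4, 10, 8
lstm_out = np.random.randn(bs, seq, hidden).astype(np.float32)

# Contract the LSTM output by taking the mean over the sequence axis,
# one fixed-size vector per sample: (bs, seq, hidden) -> (bs, hidden).
pooled = lstm_out.mean(axis=1)
assert pooled.shape == (bs, hidden)
```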
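
   A sketch of the ranking loss from item 1, again in plain NumPy rather than singa; the margin
value `M = 0.2` and the batch/hidden sizes are assumptions. It makes explicit why one training
step needs q scored against both a+ and a-:

```python
import numpy as np

def cosine(u, v, eps=1e-8):
    # Row-wise cosine similarity for (bs, hidden) matrices.
    num = (u * v).sum(axis=1)
    den = np.linalg.norm(u, axis=1) * np.linalg.norm(v, axis=1) + eps
    return num / den

M = 0.2  # margin; the actual value used is an assumption here

def ranking_loss(q, a_pos, a_neg):
    # L = max{0, M - cosine(q, a+) + cosine(q, a-)}, averaged over the batch.
    return np.maximum(0.0, M - cosine(q, a_pos) + cosine(q, a_neg)).mean()

# Hypothetical encoded vectors (bs=4, hidden=8):
rng = np.random.default_rng(0)
q, a_pos, a_neg = (rng.standard_normal((4, 8)) for _ in range(3))
print(ranking_loss(q, a_pos, a_neg))
```

   The workaround in item 1 then concatenates the two answers along the batch axis, e.g.
`np.concatenate([a_pos, a_neg], axis=0)`, so that a single `(bs*2, seq, embed)` forward pass
encodes both.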
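
   The simplified variant from item 2 regresses the similarity toward a 0/1 label; a sketch with
hypothetical numbers:

```python
import numpy as np

def mse_variant_loss(sim, label):
    # Regress cosine(q, a) toward the label: 1 for <q, a+, 1> rows,
    # 0 for <q, a-, 0> rows.
    return ((sim - label) ** 2).mean()

# Hypothetical batch of similarities and labels:
sim = np.array([0.9, 0.1, 0.7, 0.3], dtype=np.float32)
label = np.array([1.0, 0.0, 1.0, 0.0], dtype=np.float32)
print(mse_variant_loss(sim, label))  # small when sim tracks the labels
```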
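
   And the alternating-batch layout from item 3, where rows alternate `<q,a+>`, `<q,a->`, so
within every pair of rows index 0 yields `pos_sim` and index 1 yields `neg_sim`; the margin
value is again an assumption:

```python
import numpy as np

M = 0.2  # margin, assumed value

def alternating_batch_loss(sim):
    # Rows alternate <q, a+>, <q, a->; strided slicing splits them.
    pos_sim = sim[0::2]  # even rows: similarities of <q, a+>
    neg_sim = sim[1::2]  # odd rows: similarities of <q, a->
    return np.maximum(0.0, M - pos_sim + neg_sim).mean()

# Hypothetical similarities for two question/answer pairs (4 rows):
sim = np.array([0.8, 0.2, 0.6, 0.5], dtype=np.float32)
print(alternating_batch_loss(sim))
```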
   

