nudles commented on pull request #722:
URL: https://github.com/apache/singa/pull/722#issuecomment-640144833
>
>
> Hi @nudles, sorry for the delay. There are still issues with the demo model.
> Updates:
> 0. Added gensim as the word2vec converter. It is not clear from the paper how the pooling
> part of the model is designed, but the reference model reduces the LSTM output
> tensor by taking the mean over the sequence axis, which could be done with `autograd.reduce_mean()`.
>
> 1. The loss function `L = max{0, M − cosine(q, a+) + cosine(q, a−)}` requires
> two forward passes followed by one backward pass, which is not supported by SINGA. Tried to
> concatenate a+ and a− into a `{bs2, seq, embed}` tensor and make the model accept input like
> `(q, a+, a−)`, but this is confusing in the testing phase because there is no label for the answer:
> during testing we only need to compute cosine(q, a); there is no cosine(q, a+) or
> cosine(q, a−).
>
> 2. Tried to implement a simplified version that replaces the loss function
> `L = max{0, M − cosine(q, a+) + cosine(q, a−)}` with `mseloss`, so that the model can be trained
> with data in the format `<q, a+, 1>, <q, a−, 0>`, but there is a convergence
> problem.
>
> 3. As advised by @joddiy, we could train the model with data in this format: one batch has
> two samples, `<q, a+>` and `<q, a−>`, in alternating order. We then modified the
> loss function to compute the loss for every batch of 2 samples (batch_index 0: `pos_sim`,
> batch_index 1: `neg_sim`). Still checking.
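
The pooling in item 0 and the alternating-batch loss in item 3 can be sketched as below. This is a minimal illustration in plain NumPy, not SINGA code and not the implementation in this pull request. The helper names (`mean_pool`, `cosine`, `ranking_loss`), the margin value 0.2, and the averaging over pairs are all assumptions made for the example.

```python
import numpy as np


def mean_pool(lstm_out):
    """Average LSTM outputs over the sequence axis.

    (batch, seq, hidden) -> (batch, hidden)
    """
    return lstm_out.mean(axis=1)


def cosine(u, v, eps=1e-8):
    """Row-wise cosine similarity between two (batch, hidden) arrays."""
    num = (u * v).sum(axis=1)
    den = np.linalg.norm(u, axis=1) * np.linalg.norm(v, axis=1)
    return num / (den + eps)


def ranking_loss(q_vec, a_vec, margin=0.2):
    """Hinge loss max{0, M - cosine(q, a+) + cosine(q, a-)} on an interleaved batch.

    Even rows (0, 2, ...) are assumed to hold <q, a+> pairs and odd rows
    (1, 3, ...) the matching <q, a-> pairs. The margin default is a
    hypothetical value, and the mean over pairs is an assumption.
    """
    sim = cosine(q_vec, a_vec)                # one forward pass for all rows
    pos_sim, neg_sim = sim[0::2], sim[1::2]   # split by batch index parity
    return np.maximum(0.0, margin - pos_sim + neg_sim).mean()


# Toy batch of 2 samples: the same question paired with a+ and then a-.
rng = np.random.default_rng(0)
q_seq = rng.normal(size=(1, 5, 4))            # stand-in LSTM output for q
q_out = np.concatenate([q_seq, q_seq], axis=0)
a_out = rng.normal(size=(2, 5, 4))            # row 0: a+, row 1: a-
loss = ranking_loss(mean_pool(q_out), mean_pool(a_out))
```

With this layout a single forward pass yields both similarities, and at test time only `cosine(mean_pool(q_out), mean_pool(a_out))` is needed, so no label for the answer is required.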

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
