opennlp-dev mailing list archives

From Tommaso Teofili <tommaso.teof...@gmail.com>
Subject Re: Document Categorizer based on Glove + LSTM (powered by DL4J)
Date Wed, 05 Jul 2017 07:19:00 GMT
Thanks, Thamme, for bringing this to the list!


On Wed, 5 Jul 2017 at 03:49, Thamme Gowda <tgowdan@gmail.com> wrote:

> Hello OpenNLP Devs,
>
> I am working on text classification using word embeddings like
> GloVe/Word2Vec and LSTM networks.
> It would be interesting to see whether we can use this as a document
> categorizer, especially for sentiment analysis, in OpenNLP.
>
> I have already raised a PR to the sandbox repo -
> https://github.com/apache/opennlp-sandbox/pull/3
>
> This is the first version, and I hope to receive feedback from the dev
> community to make it work for everyone.
>
> Here are the design choices I have made for the initial version:
>
>    - Using pre-trained GloVe vectors - I felt the GloVe vector format is
>    clean and easily customizable in terms of dimensions and vocabulary
>    size (also, I have been reading a lot about them from the Stanford NLP
>    group).
>       - Training GloVe vectors isn't hard either; we can do it using the
>       original C library as well as by using DL4J.
>    - Using DL4J's multi-layer networks with LSTM instead of reinventing
>    this stuff again on the JVM for OpenNLP
>
>
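
For readers unfamiliar with the GloVe text format mentioned above: each line holds a token followed by its space-separated vector components. A minimal sketch of reading that format with plain Java might look like the following. The class and method names (GloveTextReader, read) are made up for illustration and are not taken from the PR, and the sample vectors are toy values.

```java
import java.io.BufferedReader;
import java.io.Reader;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of the PR or of OpenNLP.
class GloveTextReader {

    // Reads "token v1 v2 ... vN" lines into a token -> vector map.
    static Map<String, float[]> read(Reader source) {
        Map<String, float[]> vectors = new HashMap<>();
        new BufferedReader(source).lines().forEach(line -> {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                return; // skip blank lines
            }
            String[] parts = trimmed.split("\\s+");
            float[] vector = new float[parts.length - 1];
            for (int i = 1; i < parts.length; i++) {
                vector[i - 1] = Float.parseFloat(parts[i]);
            }
            vectors.put(parts[0], vector);
        });
        return vectors;
    }

    public static void main(String[] args) {
        // Toy three-dimensional vectors, only to show the line layout.
        String sample = "good 0.1 0.2 0.3\nbad -0.1 -0.2 -0.3\n";
        Map<String, float[]> vectors = read(new StringReader(sample));
        System.out.println(vectors.size() + " tokens, dimension "
                + vectors.get("good").length);
    }
}
```

The dimension is simply the number of values on a line, which is what makes the format easy to adapt to different vector sizes and vocabularies.
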
> Please share your feedback here or on the github page
> https://github.com/apache/opennlp-sandbox/pull/3 .
>
>
I think the approach outlined here sounds good, and we could incorporate
the PR as soon as it implements the Doccat API.
Then we can see whether and how it makes sense to adjust it to use other
types of embeddings (e.g. paragraph vectors) and/or different network
setups (e.g. more hidden layers, a bidirectional LSTM, etc.).
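
To illustrate what wrapping a network behind a Doccat-style API could look like, here is a rough sketch. It does not use the real OpenNLP or DL4J classes: the method names only loosely mirror the document categorizer's shape (tokens in, one score per category out), and the network is stood in for by a plain function, which is purely an assumption for illustration. NetworkBackedCategorizer is a hypothetical name.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Hypothetical adapter; the "network" is any function from tokens to
// per-category scores, standing in for a trained embedding + LSTM model.
class NetworkBackedCategorizer {

    private final List<String> categories;
    private final Function<String[], double[]> network;

    NetworkBackedCategorizer(List<String> categories,
                             Function<String[], double[]> network) {
        this.categories = categories;
        this.network = network;
    }

    // Returns one score per category, in the order of the category list.
    double[] categorize(String[] tokens) {
        double[] scores = network.apply(tokens);
        if (scores.length != categories.size()) {
            throw new IllegalStateException("expected " + categories.size()
                    + " scores but got " + scores.length);
        }
        return scores;
    }

    // Picks the category with the highest score.
    String getBestCategory(double[] outcome) {
        int best = 0;
        for (int i = 1; i < outcome.length; i++) {
            if (outcome[i] > outcome[best]) {
                best = i;
            }
        }
        return categories.get(best);
    }

    String getCategory(int index) { return categories.get(index); }

    int getIndex(String category) { return categories.indexOf(category); }

    int getNumberOfCategories() { return categories.size(); }

    public static void main(String[] args) {
        // Toy scorer instead of a real model: favors "positive" when the
        // token "great" is present.
        Function<String[], double[]> toy = tokens ->
                Arrays.asList(tokens).contains("great")
                        ? new double[] {0.9, 0.1}
                        : new double[] {0.2, 0.8};
        NetworkBackedCategorizer categorizer = new NetworkBackedCategorizer(
                Arrays.asList("positive", "negative"), toy);
        double[] scores = categorizer.categorize(
                new String[] {"a", "great", "movie"});
        System.out.println(categorizer.getBestCategory(scores));
    }
}
```

Keeping the model behind such a narrow boundary is what would let the embeddings or the network setup be swapped later without touching callers.
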

Looking forward to seeing this move forward,
Regards,
Tommaso


>
> Thanks,
> TG
>
>
> --
> *Thamme Gowda *
> @thammegowda <https://twitter.com/thammegowda> |
> http://scf.usc.edu/~tnarayan/
> ~Sent via somebody's Webmail server
>
