opennlp-dev mailing list archives

From "Manoj B. Narayanan" <>
Subject Features to tokenizer
Date Tue, 26 Sep 2017 04:05:13 GMT

I was wondering if there is a way to provide features to the
tokenizer. Sometimes, the tokenization decision depends on certain factors.

For example, the word 'semi-supervised' should not be split, while
'august-september' should be split into separate tokens.

Is there any way to add custom features to the learnable
tokenizer, similar to what is possible for NER?
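A minimal standalone sketch of the kind of feature being asked about. It assumes a maximum-entropy style tokenizer that decides, for each candidate split position inside a token, whether to split, based on a list of string features. It does not use the OpenNLP API: the class `HyphenFeatureSketch`, the lexicons `KEEP_WHOLE` and `MONTHS`, and the feature names are hypothetical. The idea is that a lexicon hit becomes one more feature that a trained model can weigh, rather than a hard rule.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class HyphenFeatureSketch {

    // Hypothetical lexicon of hyphenated words that should stay as one token.
    static final Set<String> KEEP_WHOLE = Set.of("semi-supervised");

    // Hypothetical lexicon: a hyphen between two of these suggests a range to split.
    static final Set<String> MONTHS = Set.of(
            "january", "february", "march", "april", "may", "june",
            "july", "august", "september", "october", "november", "december");

    /**
     * Builds feature strings for the candidate split position i inside token.
     * A real model would combine these with its default character features.
     */
    static List<String> context(String token, int i) {
        List<String> feats = new ArrayList<>();
        char c = token.charAt(i);
        feats.add("char=" + c);
        if (c == '-') {
            String left = token.substring(0, i).toLowerCase();
            String right = token.substring(i + 1).toLowerCase();
            feats.add("left=" + left);
            feats.add("right=" + right);
            // Custom feature 1: the whole hyphenated word is in the keep-whole lexicon.
            if (KEEP_WHOLE.contains(token.toLowerCase())) {
                feats.add("lexicon=keep");
            }
            // Custom feature 2: both sides of the hyphen are month names.
            if (MONTHS.contains(left) && MONTHS.contains(right)) {
                feats.add("range=month");
            }
        }
        return feats;
    }

    public static void main(String[] args) {
        // The hyphen is at index 4 in "semi-supervised" and index 6 in "august-september".
        System.out.println(context("semi-supervised", 4));
        System.out.println(context("august-september", 6));
    }
}
```

In OpenNLP itself, the natural place for such features would presumably be the tokenizer's context generator (the `TokenContextGenerator` interface, supplied through `TokenizerFactory`); that wiring is not shown here, and the model would still need training data annotated with the desired splits.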


