opennlp-users mailing list archives

From Jörn Kottmann <>
Subject Re: abbreviation diccionary format
Date Thu, 19 Apr 2012 17:11:44 GMT
On 04/19/2012 06:20 PM, Joan Codina wrote:
> Then, with the sentences that have all tokens separated by spaces, I need to
> merge the words, adding <space>, but I don't know how to do that with
> the DictionaryDetokenizer:
> ./opennlp DictionaryDetokenizer ../models/en-detokenizer.xml <../models/CoNLL2009-ST-English-train.sent
> It merges the sentences but does not add the <space>.

It should insert <SPLIT> tags at certain spaces, so that the tokenizer can learn
that the text has to be split there. The input should be one sentence per line.
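To illustrate the target format: in tokenizer training data each line is one sentence, and <SPLIT> marks a boundary between two tokens that were not separated by whitespace in the original text. The sketch below is NOT the DictionaryDetokenizer. It is a minimal stand-in, assuming whitespace-separated tokens on input and two hypothetical, much simplified attach rules (a single punctuation token joins the previous token, and a token following "(" joins it). The real detokenizer takes these decisions from the dictionary file instead. The function name to_split_format is hypothetical.

```shell
#!/bin/sh
# Hypothetical sketch: convert one-sentence-per-line, space-separated tokens
# into tokenizer training format, writing <SPLIT> where tokens should touch.
# The attach rules are simplified assumptions, not the detokenizer dictionary.
to_split_format() {
  awk '{
    out = $1
    for (i = 2; i <= NF; i++) {
      # Assumed rule: single punctuation tokens attach to the previous token,
      # and the token after an opening parenthesis attaches to it.
      if ($i ~ /^[.,;:!?)]$/ || $(i-1) == "(")
        out = out "<SPLIT>" $i
      else
        out = out " " $i
    }
    print out
  }'
}
```

For example, piping the line "Pierre Vinken , 61 years old , will join the board ( as director ) Nov. 29 ." through to_split_format gives "Pierre Vinken<SPLIT>, 61 years old<SPLIT>, will join the board (<SPLIT>as director<SPLIT>) Nov. 29<SPLIT>.", which is the kind of line the detokenizer output should contain.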

What output do you get?

