opennlp-users mailing list archives

From Jörn Kottmann <kottm...@gmail.com>
Subject Re: abbreviation diccionary format
Date Thu, 19 Apr 2012 17:11:44 GMT
On 04/19/2012 06:20 PM, Joan Codina wrote:
>
>
> Then, with the sentences that have all tokens separated by spaces, I need to 
> merge the words, adding <space>, but I don't know how to do it with 
> the DictionaryDetokenizer:
> ./opennlp DictionaryDetokenizer ../models/en-detokenizer.xml 
> <../models/CoNLL2009-ST-English-train.sent
>
> as it merges the sentences but does not add the <space>.

It should insert <SPLIT> tags where it merges tokens, so the tokenizer can
learn that there is something to split at those points. Input should be one
sentence per line.
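
To illustrate the kind of output I mean, here is a rough Python sketch. It is
not the OpenNLP DictionaryDetokenizer and does not read en-detokenizer.xml;
the ATTACH_LEFT and ATTACH_RIGHT sets and the function name are made up for
this example. It only shows the idea: wherever two tokens are merged, the
removed space is replaced by <SPLIT>.

```python
# Illustrative sketch only: NOT OpenNLP's DictionaryDetokenizer.
# ATTACH_LEFT / ATTACH_RIGHT are hypothetical stand-ins for the merge rules
# that a detokenizer dictionary would normally provide.

SPLIT = "<SPLIT>"
ATTACH_LEFT = {".", ",", ";", ":", "!", "?", ")"}  # token joins the previous token
ATTACH_RIGHT = {"("}                               # token joins the next token


def detokenize_with_split(line):
    """Merge whitespace-separated tokens, marking each removed space with <SPLIT>."""
    tokens = line.split()
    out = []
    for i, tok in enumerate(tokens):
        if i > 0:
            prev = tokens[i - 1]
            merged = tok in ATTACH_LEFT or prev in ATTACH_RIGHT
            # A merged boundary gets the marker, any other boundary keeps its space.
            out.append(SPLIT if merged else " ")
        out.append(tok)
    return "".join(out)


if __name__ == "__main__":
    # One tokenized sentence per line, as in the input file.
    sentences = [
        "Pierre Vinken , 61 years old , will join the board .",
        "He said ( quietly ) , yes .",
    ]
    for sentence in sentences:
        print(detokenize_with_split(sentence))
```

Each printed line is then in the shape the tokenizer trainer expects: plain
spaces where the tokens were already separated, and <SPLIT> where they were
joined.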

What output do you get?

Jörn
