opennlp-dev mailing list archives

From "" <>
Subject Re: SentenceDetectorME.train API change
Date Thu, 21 Jul 2011 15:04:48 GMT
On Thu, Jul 21, 2011 at 11:38 AM, Jörn Kottmann <> wrote:

> On 7/21/11 4:13 PM, wrote:
>> Should I just change the parameters order?
>> For some reason the API was using a Dictionary to represent the
>> abbreviation dictionary, but it was never used in the default context
>> generator.
>> Initially I was thinking about using this Dictionary implementation, but
>> according to DefaultSDContextGenerator an abbreviation dictionary should
>> implement Set<String>, and since Dictionary was already implementing
>> Iterable<StringList> it can't also implement Set<String>.
>> Another option would be to remove the new AbbreviationDictionary class and
>> try to use Dictionary instead, maybe adding a method "asStringSet()" that
>> creates a Set<String> from the Dictionary so we can pass it to the context
>> generator.
>> What do you think?
> The Dictionary is similar to the new Abbreviation Dictionary, but
> additionally supports storing entries which consist of multiple tokens.
> Do we have multi token abbreviations? If yes, we should use Dictionary.
> Otherwise we could still use it, then the tokenizer could have a small util
> method to turn a Dictionary into a Set<String>.
> Reusing the Dictionary makes a few things easier because we do not have
> to duplicate them.
> We can also change the DefaultSDContextGenerator, if that is more
> convenient.

OK, I'll revert my changes and do it using Dictionary.
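
A rough sketch of the kind of conversion discussed above, for illustration only. It does not use the real opennlp Dictionary class: an Iterable<List<String>> stands in for Dictionary's Iterable<StringList>, and the class name DictionaryToSetSketch is made up. Only asStringSet comes from the thread. Skipping multi-token entries is an assumption; the actual utility might join the tokens or reject them instead.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-alone sketch; not part of the opennlp code base.
final class DictionaryToSetSketch {

    // Each entry is a list of tokens, standing in for a StringList.
    // Only single-token entries are copied into the set (an assumption).
    static Set<String> asStringSet(Iterable<List<String>> entries) {
        Set<String> result = new HashSet<>();
        for (List<String> entry : entries) {
            if (entry.size() == 1) {
                result.add(entry.get(0));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<String>> entries = Arrays.asList(
                Arrays.asList("Dr."),
                Arrays.asList("e.g."),
                Arrays.asList("New", "York")); // multi-token entry, skipped

        Set<String> abbreviations = asStringSet(entries);
        System.out.println(abbreviations.contains("Dr."));
        System.out.println(abbreviations.size());
    }
}
```

The resulting Set<String> is the shape the context generator is said to expect, so the Dictionary itself would not need to implement Set<String>.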
I have never had to do this in SVN. Can you point me to how to do it? I
searched the web for instructions, but I'm not feeling confident. I'm using
I need to revert the changes related to issues 225 and 234.

