opennlp-dev mailing list archives

From Tommaso Teofili <tommaso.teof...@gmail.com>
Subject Re: SentenceDetectorME.train API change
Date Thu, 21 Jul 2011 15:04:13 GMT
2011/7/21 Jörn Kottmann <kottmann@gmail.com>

> On 7/21/11 4:13 PM, william.colen@gmail.com wrote:
>
>> Should I just change the parameter order?
>>
>> For some reason the API was using a Dictionary to represent the
>> abbreviation dictionary, but it was never used in the default context
>> generator. Initially I was thinking about using this Dictionary
>> implementation, but according to DefaultSDContextGenerator an abbreviation
>> dictionary should implement Set<String>, and since Dictionary already
>> implements Iterable<StringList> it can't also implement Set<String>.
>>
>> Another option would be to remove the new AbbreviationDictionary class and
>> try to use Dictionary instead, maybe adding a method "asStringSet()" that
>> creates a Set<String> from the Dictionary so we can pass it to the context
>> generator.
>>
>> What do you think?
>>
>
> The Dictionary is similar to the new Abbreviation Dictionary, but
> additionally supports storing entries which consist of multiple tokens.
>
> Do we have multi-token abbreviations? If yes, we should use Dictionary.
> Otherwise we could still use it; the tokenizer could then have a small util
> method to turn a Dictionary into a Set<String>.
>
> Reusing the Dictionary makes a few things easier because we do not have
> to duplicate them.
>
> We can also change the DefaultSDContextGenerator, if that is more
> convenient.
>
>
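For illustration, the "small util method" discussed in the quoted text could
look roughly like the sketch below. It is only a sketch under assumptions:
DictionarySets and toStringSet are hypothetical names, and entries are modelled
as plain List<String> token lists rather than OpenNLP's own StringList, so the
example compiles on its own. Skipping multi-token entries is also an assumption,
since the thread leaves open whether such abbreviations exist.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of OpenNLP. Each dictionary entry is
// modelled as a list of tokens (a stand-in for OpenNLP's StringList).
final class DictionarySets {

    private DictionarySets() {
        // static utility, no instances
    }

    // Collects every single-token entry into a Set<String>.
    // Assumption: multi-token entries cannot be represented as one
    // String here, so they are skipped rather than joined.
    static Set<String> toStringSet(Iterable<List<String>> entries) {
        Set<String> result = new LinkedHashSet<>();
        for (List<String> tokens : entries) {
            if (tokens.size() == 1) {
                result.add(tokens.get(0));
            }
        }
        return result;
    }
}
```

The resulting set could then be handed to whatever expects a Set<String>, such
as the context generator mentioned above.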
I don't know this part of the code well, but I wonder whether a small
refactoring could help: make Dictionary an interface with two implementing
classes, e.g. SimpleDictionary and AbbreviationsDictionary.
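
A rough sketch of what such a refactoring might look like. Everything in it is
hypothetical: the method names (contains, size, put, asStringSet) and the use of
List<String> for token sequences are assumptions made for illustration, not the
existing OpenNLP API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical common interface; the method set is an assumption.
interface Dictionary {
    boolean contains(List<String> tokens);
    int size();
}

// General-purpose variant: entries may consist of several tokens.
final class SimpleDictionary implements Dictionary {
    private final Set<List<String>> entries = new HashSet<>();

    void put(String... tokens) {
        // Copy the tokens so later changes to the array do not affect the entry.
        entries.add(new ArrayList<>(Arrays.asList(tokens)));
    }

    @Override
    public boolean contains(List<String> tokens) {
        return entries.contains(tokens);
    }

    @Override
    public int size() {
        return entries.size();
    }
}

// Abbreviation variant: single-token entries only, so it can expose
// its content directly as a Set<String>.
final class AbbreviationsDictionary implements Dictionary {
    private final Set<String> abbreviations = new HashSet<>();

    void put(String abbreviation) {
        abbreviations.add(abbreviation);
    }

    @Override
    public boolean contains(List<String> tokens) {
        return tokens.size() == 1 && abbreviations.contains(tokens.get(0));
    }

    @Override
    public int size() {
        return abbreviations.size();
    }

    // Read-only view suitable for code that expects a Set<String>.
    Set<String> asStringSet() {
        return Collections.unmodifiableSet(abbreviations);
    }
}
```

With this shape, code that only needs lookups can depend on the interface,
while the sentence detector side can ask the abbreviation variant for its
Set<String> view.
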
My 0.0002 cents.
Tommaso
