opennlp-dev mailing list archives

From "william.colen@gmail.com" <william.co...@gmail.com>
Subject Re: SentenceDetectorME.train API change
Date Thu, 21 Jul 2011 15:04:48 GMT
On Thu, Jul 21, 2011 at 11:38 AM, Jörn Kottmann <kottmann@gmail.com> wrote:

> On 7/21/11 4:13 PM, william.colen@gmail.com wrote:
>
>> Should I just change the parameters order?
>>
>> For some reason the API was using a Dictionary to represent the
>> abbreviation dictionary, but it was never used in the default context
>> generator. Initially I was thinking about using this Dictionary
>> implementation, but according to DefaultSDContextGenerator an abbreviation
>> dictionary should implement Set<String>, and since Dictionary already
>> implements Iterable<StringList>, it can't also implement Set<String>.
>>
>> Another option would be to remove the new AbbreviationDictionary class
>> and use Dictionary instead, perhaps adding a method "asStringSet()" that
>> creates a Set<String> from the Dictionary, which we could then pass to the
>> context generator.
>>
>> What do you think?
>>
>
> The Dictionary is similar to the new AbbreviationDictionary, but
> additionally supports storing entries that consist of multiple tokens.
>
> Do we have multi-token abbreviations? If yes, we should use Dictionary.
> Otherwise we could still use it, and the tokenizer could have a small util
> method to turn a Dictionary into a Set<String>.
>
> Reusing the Dictionary makes a few things easier because we do not have
> to duplicate them.
>
> We can also change the DefaultSDContextGenerator, if that is more
> convenient.
>
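For illustration, here is a minimal sketch of the "turn a Dictionary into a Set<String>" idea discussed above. It does not use the real OpenNLP classes: `TokenListDictionary` and `asStringSet` are hypothetical stand-ins, with each entry stored as a `List<String>` of tokens instead of a `StringList`. The sketch also shows why a separate conversion is needed: a class that implements `Iterable<List<String>>` cannot also implement `Set<String>`, because Java forbids implementing `Iterable` twice with different type arguments. Skipping multi-token entries is an assumption; throwing an exception would be an equally valid choice.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

public class AbbreviationSetSketch {

    // Hypothetical stand-in for a token-list dictionary: each entry is a list of tokens.
    // Because it implements Iterable<List<String>>, it cannot also implement Set<String>.
    static final class TokenListDictionary implements Iterable<List<String>> {
        private final List<List<String>> entries = new ArrayList<>();

        void put(String... tokens) {
            entries.add(List.of(tokens));
        }

        @Override
        public Iterator<List<String>> iterator() {
            return Collections.unmodifiableList(entries).iterator();
        }
    }

    // Hypothetical util method: copies every single-token entry into a Set<String>.
    // Multi-token entries are skipped (an assumption made for this sketch).
    static Set<String> asStringSet(TokenListDictionary dict) {
        Set<String> result = new HashSet<>();
        for (List<String> entry : dict) {
            if (entry.size() == 1) {
                result.add(entry.get(0));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        TokenListDictionary abbreviations = new TokenListDictionary();
        abbreviations.put("Dr.");
        abbreviations.put("e.g.");
        abbreviations.put("U.", "S."); // multi-token entry, skipped by asStringSet

        Set<String> abbrevSet = asStringSet(abbreviations);
        System.out.println(abbrevSet.contains("Dr.")); // true
        System.out.println(abbrevSet.contains("U."));  // false
        System.out.println(abbrevSet.size());          // 2
    }
}
```

The resulting `Set<String>` is what a context generator expecting a set of abbreviations could consume, while the dictionary itself keeps the ability to hold multi-token entries.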


OK, I'll revert my changes and do it using Dictionary.
I have never done this in SVN; can you point me to how to do it? I
searched the web, but I'm not feeling confident. I'm using Eclipse.
I need to revert the changes related to the issues 225 and 234.

Thanks
William
