opennlp-dev mailing list archives

From Samik Raychaudhuri <sam...@gmail.com>
Subject Re: Pluggable Machine Learning support
Date Fri, 31 May 2013 21:38:32 GMT
Yep, supporting the move to a new package/namespace.

On 5/31/2013 12:40 AM, Tommaso Teofili wrote:
> big +1!
>
> Tommaso
>
>
> 2013/5/31 William Colen <william.colen@gmail.com>
>
>> I don't see any issue. People who use Maxent directly would need to
>> change how they use it, but that is OK for a major release.
>>
>>
>>
>>
>> On Thu, May 30, 2013 at 5:56 PM, Jörn Kottmann <kottmann@gmail.com> wrote:
>>
>>> Are there any objections to moving the maxent/perceptron classes to an
>>> opennlp.tools.ml package as part of this issue? Moving them would avoid a
>>> second interface layer and probably make using OpenNLP Tools a bit easier,
>>> because then we are down to a single jar.
>>>
>>> Jörn
>>>
>>>
>>> On 05/30/2013 08:57 PM, William Colen wrote:
>>>
>>>> +1 to add pluggable machine learning algorithms
>>>> +1 to improve the API and remove deprecated methods in 1.6.0
>>>>
>>>> You can assign related Jira issues to me and I will be glad to help.
>>>>
>>>>
>>>> On Thu, May 30, 2013 at 11:53 AM, Jörn Kottmann <kottmann@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> We have spoken about this here and there already: to ensure that OpenNLP
>>>>> can stay competitive with other NLP libraries, I am proposing to make the
>>>>> machine learning pluggable.
>>>>>
>>>>> These extensions should not make it harder to use OpenNLP: if a user
>>>>> loads a model, OpenNLP should be capable of setting everything up by
>>>>> itself, without forcing the user to write custom integration code based
>>>>> on the ml implementation.
>>>>> We already solved this problem with the extension mechanism we built to
>>>>> support the customization of our components, so I suggest that we reuse
>>>>> this extension mechanism to load an ml implementation. To use a custom ml
>>>>> implementation the user has to specify the class name of the factory in
>>>>> the Algorithm field of the params file. The params file is available
>>>>> during training and tagging time.
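>>>>>
>>>>> As a rough sketch only (it assumes we keep using the existing
>>>>> TrainingParameters class, and the factory class name below is a made-up
>>>>> placeholder), selecting a custom implementation could look like this:
>>>>>
>>>>>     import opennlp.tools.util.TrainingParameters;
>>>>>
>>>>>     public class PluggableMlParamsSketch {
>>>>>
>>>>>         public static void main(String[] args) {
>>>>>             TrainingParameters params = new TrainingParameters();
>>>>>
>>>>>             // The built-in trainers would still be selected by name,
>>>>>             // e.g. params.put("Algorithm", "MAXENT").
>>>>>
>>>>>             // A custom ml implementation would be selected by putting
>>>>>             // the factory class name into the Algorithm field.
>>>>>             params.put("Algorithm", "com.example.ml.MyTrainerFactory");
>>>>>             params.put("Iterations", "100");
>>>>>             params.put("Cutoff", "5");
>>>>>
>>>>>             // The same line ("Algorithm=com.example.ml.MyTrainerFactory")
>>>>>             // would go into the params file used by the command line tools.
>>>>>         }
>>>>>     }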
>>>>>
>>>>> Most components in the tools package use the maxent library to do
>>>>> classification. The Java interfaces for this are currently located in the
>>>>> maxent package; to be able to swap the implementation, the interfaces
>>>>> should be defined inside the tools package. To make things easier I
>>>>> propose to move the maxent and perceptron implementations as well.
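>>>>>
>>>>> To illustrate where the interfaces live today: a component written
>>>>> against the model interface would mostly just need its import changed
>>>>> once the classes move (the future package name below is only a guess,
>>>>> and the class itself is a made-up example):
>>>>>
>>>>>     // Today the interface comes from the separate maxent jar; after the
>>>>>     // move the import would change to something like
>>>>>     // opennlp.tools.ml.model.MaxentModel.
>>>>>     import opennlp.model.MaxentModel;
>>>>>
>>>>>     public class OutcomeResolverSketch {
>>>>>
>>>>>         private final MaxentModel model;
>>>>>
>>>>>         public OutcomeResolverSketch(MaxentModel model) {
>>>>>             this.model = model;
>>>>>         }
>>>>>
>>>>>         public String bestOutcome(String[] context) {
>>>>>             // Evaluate the context and pick the most likely outcome.
>>>>>             double[] outcomes = model.eval(context);
>>>>>             return model.getBestOutcome(outcomes);
>>>>>         }
>>>>>     }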
>>>>>
>>>>> Throughout the code base we use AbstractModel; that's a bit unlucky,
>>>>> because the only reason for this is the lack of model serialization
>>>>> support in the MaxentModel interface. A serialization method should be
>>>>> added to it, and the interface maybe renamed to ClassificationModel. This
>>>>> will break backward compatibility in non-standard use cases.
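>>>>>
>>>>> A very rough sketch of what such an interface could look like (the name
>>>>> ClassificationModel and the serialize method are just the proposal above,
>>>>> nothing here is a finished design):
>>>>>
>>>>>     import java.io.IOException;
>>>>>     import java.io.OutputStream;
>>>>>
>>>>>     public interface ClassificationModel {
>>>>>
>>>>>         // A few of the evaluation methods as they exist on MaxentModel today.
>>>>>         double[] eval(String[] context);
>>>>>
>>>>>         String getBestOutcome(double[] outcomes);
>>>>>
>>>>>         String getOutcome(int index);
>>>>>
>>>>>         int getNumOutcomes();
>>>>>
>>>>>         // New: every model can serialize itself, so components no longer
>>>>>         // need to depend on AbstractModel to write a model out.
>>>>>         void serialize(OutputStream out) throws IOException;
>>>>>     }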
>>>>>
>>>>> To be able to test the extension mechanism I suggest that we implement
>>>>> an addon which integrates liblinear and the Apache Mahout classifiers.
>>>>>
>>>>> There are still a few deprecated 1.4 constructors and methods in OpenNLP
>>>>> which directly reference interfaces and classes in the maxent library;
>>>>> these need to be removed to be able to move the interfaces to the tools
>>>>> package.
>>>>>
>>>>> Any opinions?
>>>>>
>>>>> Jörn
>>>>>
>>>>>

