opennlp-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Cohan Sujay Carlos <co...@aiaioo.com>
Subject Re: Are there plans to offer a Naive Bayesian Classifier in OpenNLP ?
Date Mon, 20 Jul 2015 08:24:32 GMT
Thanks Tommaso.

The latest patch is attached to the Jira issue as well (and I see I needn't
have attached it in these email messages).

(I have also emailed a copy of the completed ICLA to Joern).

Cohan

On Mon, Jul 20, 2015 at 1:44 PM, Tommaso Teofili <tommaso.teofili@gmail.com>
wrote:

> Hi Cohan,
>
> for future reference it'd be better if you could attach the patch to the
> Jira issue, I can have a look at your patch and comment later this week.
> Usually we mark an issue as fixed as soon as the relevant code is committed.
>
> Regards,
> Tommaso
>
>
> 2015-07-18 15:36 GMT+02:00 Cohan Sujay Carlos <cohan@aiaioo.com>:
>
>> Yet another patch (ouch, yeah, like they say, "in product development,
>> you're never done, till you're done!").
>>
>> Reason: Error in one of the test-cases.  The test-case
>> DocumentCategorizerNBTest is now correct and runs fine.
>>
>> Cohan Sujay Carlos
>> CEO, Aiaioo Labs
>> +91-77605-80015
>>
>> On Sat, Jul 18, 2015 at 6:51 PM, Cohan Sujay Carlos <cohan@aiaioo.com>
>> wrote:
>>
>>> A small update to the patch (I removed a superfluous piece of code).
>>>
>>> In the earlier path, I had used a subclass of
>>> opennlp.tools.doccat.DoccatModel called opennlp.tools.doccat.DoccatModelNB
>>> that was functionally identical.  I removed that subclass since it
>>> wasn't essential (DoccatModel does the trick just fine).
>>>
>>> Is there anything else I need to do?
>>>
>>> Is someone on the dev team going to be responsible for incorporating the
>>> patch into the codebase?
>>>
>>> Can I mark this Jira issue fixed (for target 1.6.1?).
>>>
>>> Cohan Sujay Carlos
>>> CEO, Aiaioo Labs
>>> +91-77605-80015
>>>
>>>
>>> On Sat, Jul 18, 2015 at 6:02 PM, Cohan Sujay Carlos <cohan@aiaioo.com>
>>> wrote:
>>>
>>>> I have gone ahead and written the test-cases and verified that the
>>>> Naive Bayes Classifier works correctly.
>>>>
>>>> Here is the latest patch (attached) with the test-cases and everything.
>>>>
>>>> In implementing the Naive Bayes classifier, we tried to *ensure
>>>> minimal disruption* to existing code.
>>>>
>>>> The *only* changes to existing code are as follows:
>>>>
>>>> 1. The opennlp.tools.ml.model.AbstractModel class has been changed to
>>>> include a new model type:
>>>>
>>>> line 35: *public enum ModelType *
>>>> *{Maxent,Perceptron,MaxentQn,NaiveBayes};*
>>>>
>>>> 2. The opennlp.tools.ml.model.GenericModelReader class has been
>>>> changed in one place:
>>>>
>>>> line 53:
>>>> *else if (modelType.equals("NaiveBayes")) **{ delegateModelReader =
>>>> new NaiveBayesModelReader(this.dataReader); }*
>>>>
>>>> 3. The opennlp.tools.ml.model.GenericModelWriter class has been
>>>> changed in two places:
>>>>
>>>> line 79:
>>>> *if (model.getModelType() == ModelType.NaiveBayes) **{ delegateWriter
>>>> = new BinaryNaiveBayesModelWriter(model,dos); }*
>>>>
>>>> line 91:
>>>> *if (model.getModelType() == ModelType.NaiveBayes) **{ delegateWriter
>>>> = new PlainTextNaiveBayesModelWriter(model,bw); }*
>>>>
>>>> 4. The initializer of the opennlp.tools.ml.TrainerFactory class has
>>>> been changed in one place to add the Naive Bayes trainer:
>>>>
>>>> line 51:
>>>> *_trainers.put(NaiveBayesTrainer.NAIVE_BAYES_VALUE,
>>>> NaiveBayesTrainer.class);*
>>>>
>>>> That was it!
>>>>
>>>> We didn't change anything else in the existing OpenNLP code.
>>>>
>>>> All the new code for the Naive Bayesian classifier sits in the package
>>>> opennlp.tools.ml.naivebayes - just above the perceptron
>>>>
>>>> The code for the document categorizer using the Naive Bayesian
>>>> classifier can be found in opennlp.tools.doccat (we didn't have to
>>>> change any existing code). The new doccat is called
>>>> opennlp.tools.doccat.DocumentCategorizerNB (reflecting the name of the
>>>> maxent document categorizer, which is DocumentCategorizerME).
>>>>
>>>> Proof of correctness!
>>>>
>>>> I have included two testcases:
>>>>
>>>> 1. A test to validate the document categorizer - under the tests
>>>> folder, you will find opennlp.tools.doccat.DocumentCategorizerNBTest -
>>>> which runs the same tests that were run on the ME document categorizer, but
>>>> on the Naive Bayes categorizer instead (all tests passed).
>>>>
>>>> 2. A test to check the mathematical correctness of the Naive Bayes
>>>> implementation can be found in
>>>> opennlp.tools.ml.naivebayes.NaiveBayesCorrectnessTest.
>>>>
>>>> So, the inclusion of this code will minimally impact any existing code.
>>>>
>>>> And the code in this patch contains a multinomial Naive Bayesian
>>>> classifier that is verifiably correct.
>>>>
>>>> Is there anything else I have to do to have this patch pulled into the
>>>> OpenNLP code base (for say 1.7.0)?
>>>>
>>>> Cohan Sujay Carlos
>>>> CEO, Aiaioo Labs
>>>> +91-77605-80015
>>>>
>>>> On Tue, May 19, 2015 at 7:21 PM, Cohan Sujay Carlos <cohan@aiaioo.com>
>>>>> wrote:
>>>>>
>>>>>> Tommaso,
>>>>>>
>>>>>> I have created the Jira issue:
>>>>>> https://issues.apache.org/jira/browse/OPENNLP-777
>>>>>>
>>>>>>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message