opennlp-dev mailing list archives

From Nicolas Hernandez <nicolas.hernan...@gmail.com>
Subject Re: Evaluator and CrossValidator throw a java.lang.NullPointerException
Date Thu, 06 Oct 2011 15:47:23 GMT
Done, here:

https://issues.apache.org/jira/browse/OPENNLP-316

On Thu, Oct 6, 2011 at 5:32 PM, william.colen@gmail.com
<william.colen@gmail.com> wrote:
> Hi Nicolas,
>
> Can you please open a Jira?
> I will investigate the issue.
>
> Thanks,
> William
>
>
> On Thu, Oct 6, 2011 at 9:46 AM, Nicolas Hernandez <
> nicolas.hernandez@gmail.com> wrote:
>
>> On Thu, Oct 6, 2011 at 2:34 PM, Jörn Kottmann <kottmann@gmail.com> wrote:
>> > Looks like the Cross Validator is failing because you do
>> > not have enough data? On how many sample sentences do you
>> > run it?
>> I tested with 1 000 and 1 000 000... same results except I had to
>> extend the java heap size for one of them before getting the error...
>>
>>
>>
>> Just to let you know, below is what I got for the
>> Tokenizer (here with a 1000-sentence training corpus)
>>
>> $ opennlp TokenizerMEEvaluator -encoding UTF-8 -model
>> data/model/fr-token.bin -data data/test/fr-token.test
>> Loading Tokenizer model ... done (0,428s)
>> Evaluating ... Exception in thread "main" java.lang.NullPointerException
>>        at opennlp.tools.util.eval.Evaluator.evaluateSample(Evaluator.java:76)
>>        at opennlp.tools.util.eval.Evaluator.evaluate(Evaluator.java:98)
>>        at opennlp.tools.cmdline.tokenizer.TokenizerMEEvaluatorTool.run(TokenizerMEEvaluatorTool.java:81)
>>        at opennlp.tools.cmdline.CLI.main(CLI.java:191)
>>
>> $ opennlp TokenizerCrossValidator -encoding UTF-8 -lang fr -data
>> data/train/fr-token.train
>> Indexing events using cutoff of 5
>>         Computing event counts...  done. 100333 events
>>        Indexing...  done.
>> Sorting and merging events... done. Reduced 100333 events to 30168.
>> Done indexing.
>> Incorporating indexed data for training...
>> done.
>>        Number of Event Tokens: 30168
>>            Number of Outcomes: 2
>>          Number of Predicates: 8287
>> ...done.
>> Computing model parameters ...
>> Performing 100 iterations.
>>  1:  ... loglikelihood=-69545.53606709359      0.9337805108987073
>>  2:  ... loglikelihood=-18987.123809719425     0.9497872085953774
>> ...
>>  98:  ... loglikelihood=-607.4216932752298      0.9989534848952987
>>  99:  ... loglikelihood=-603.2346954947699      0.9989734185163406
>> 100:  ... loglikelihood=-599.1235213848983      0.9989833853268616
>> Exception in thread "main" java.lang.NullPointerException
>>        at opennlp.tools.util.eval.Evaluator.evaluateSample(Evaluator.java:76)
>>        at opennlp.tools.util.eval.Evaluator.evaluate(Evaluator.java:98)
>>        at opennlp.tools.tokenize.TokenizerCrossValidator.evaluate(TokenizerCrossValidator.java:98)
>>        at opennlp.tools.cmdline.tokenizer.TokenizerCrossValidatorTool.run(TokenizerCrossValidatorTool.java:94)
>>        at opennlp.tools.cmdline.CLI.main(CLI.java:191)
>>
>>
>> >
>> > We will investigate this further.
>> >
>> > Jörn
>> >
>> > On 10/6/11 2:26 PM, Nicolas Hernandez wrote:
>> >>
>> >> Please find below the output of two runs which lead to an error:
>> >> SentenceDetectorEvaluator without "-misclassified true" parameter and
>> >> SentenceDetectorCrossValidator (which gives the same error with or
>> >> without "-misclassified true").
>> >>
>> >> I tested on the examples from the documentation and also with my data.
>> >> Tell me if you want more details or anything else.
>> >>
>> >> $opennlp SentenceDetectorEvaluator -encoding UTF-8 -model
>> >> data/model/fr-sent.bin -data data/test/fr-sent.test
>> >> Loading Sentence Detector model ... done (0,013s)
>> >> Evaluating ... Exception in thread "main" java.lang.NullPointerException
>> >>        at opennlp.tools.util.eval.Evaluator.evaluateSample(Evaluator.java:80)
>> >>        at opennlp.tools.util.eval.Evaluator.evaluate(Evaluator.java:98)
>> >>        at opennlp.tools.cmdline.sentdetect.SentenceDetectorEvaluatorTool.run(SentenceDetectorEvaluatorTool.java:80)
>> >>        at opennlp.tools.cmdline.CLI.main(CLI.java:191)
>> >>
>> >> $opennlp SentenceDetectorCrossValidator -encoding UTF-8 -lang fr -data
>> >> data/train/fr-sent.train -misclassified true
>> >> Indexing events using cutoff of 5
>> >>
>> >>        Computing event counts...  done. 0 events
>> >>        Indexing...  done.
>> >> Sorting and merging events... Done indexing.
>> >> Incorporating indexed data for training...
>> >> Exception in thread "main" java.lang.NullPointerException
>> >>        at opennlp.maxent.GISTrainer.trainModel(GISTrainer.java:263)
>> >>        at opennlp.maxent.GIS.trainModel(GIS.java:256)
>> >>        at opennlp.model.TrainUtil.train(TrainUtil.java:182)
>> >>        at opennlp.tools.sentdetect.SentenceDetectorME.train(SentenceDetectorME.java:283)
>> >>        at opennlp.tools.sentdetect.SDCrossValidator.evaluate(SDCrossValidator.java:104)
>> >>        at opennlp.tools.cmdline.sentdetect.SentenceDetectorCrossValidatorTool.run(SentenceDetectorCrossValidatorTool.java:98)
>> >>        at opennlp.tools.cmdline.CLI.main(CLI.java:191)
>> >>
>> >>
>> >>
>> >> On Thu, Oct 6, 2011 at 1:02 PM, Jörn Kottmann <kottmann@gmail.com> wrote:
>> >>>
>> >>> On 10/6/11 12:42 PM, Nicolas Hernandez wrote:
>> >>>>
>> >>>> I am trying to run the Evaluator and CrossValidator programs of 1.5.3
>> >>>> from the command line.
>> >>>>
>> >>>> It seems that the SentenceDetector, Tokenizer, PosTagger and the
>> >>>> chunker (at least) throw a java.lang.NullPointerException if the
>> >>>> misclassified parameter is set to false or not present for the
>> >>>> Evaluator programs. The CrossValidator programs do not work at all.
>> >>>>
>> >>>> Before I look into it, is something (e.g. a global refactoring) planned
>> >>>> for it?
>> >>>
>> >>> 1.5.3 is mostly the same version as the 1.5.2 RC 2.
>> >>>
>> >>> The bugs you describe here should of course not be present, and must be
>> >>> fixed for the 1.5.2 release. We just did a major refactoring of a lot of
>> >>> cmd line code. Looks like a regression.
>> >>>
>> >>> Can you please give us more details? The stack trace would be helpful, and
>> >>> the command line arguments you passed in. To find the bug I believe it
>> >>> should be enough to get this for one of the mentioned evaluators.
>> >>>
>> >>> Jörn
>> >>>
>> >
>> >
>>
>
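
A note on the symptom described in the quoted thread: Evaluators that fail only when the misclassified option is false or absent would be consistent with an optional reporting callback being invoked without a null check. This is an assumption; the thread does not confirm the cause, and the sketch below is not OpenNLP code. The names EvaluatorSketch, MisclassifiedListener and evaluate are hypothetical and only illustrate how an optional listener can be guarded.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, NOT the OpenNLP implementation: it only illustrates
// guarding an optional "misclassified" callback against null.
public class EvaluatorSketch {

    /** Hypothetical callback, invoked for each sample the model gets wrong. */
    public interface MisclassifiedListener {
        void misclassified(String reference, String prediction);
    }

    /**
     * Compares predictions to references and returns the number of mismatches.
     * The listener is optional: null means "do not report misclassified samples".
     */
    public static int evaluate(List<String> references, List<String> predictions,
                               MisclassifiedListener listener) {
        if (references.size() != predictions.size()) {
            throw new IllegalArgumentException(
                "references and predictions differ in length");
        }
        int errors = 0;
        for (int i = 0; i < references.size(); i++) {
            String reference = references.get(i);
            String prediction = predictions.get(i);
            if (!reference.equals(prediction)) {
                errors++;
                // Without this check, calling the listener would throw a
                // NullPointerException whenever reporting is switched off.
                if (listener != null) {
                    listener.misclassified(reference, prediction);
                }
            }
        }
        return errors;
    }

    public static void main(String[] args) {
        List<String> references = Arrays.asList("a b .", "c d .");
        List<String> predictions = Arrays.asList("a b .", "c d.");

        // Reporting switched off: no listener is passed.
        System.out.println("errors: " + evaluate(references, predictions, null));

        // Reporting switched on: each mismatch is printed to stderr.
        evaluate(references, predictions, (reference, prediction) ->
            System.err.println("expected [" + reference + "] but got [" + prediction + "]"));
    }
}
```

Passing null as the listener returns the mismatch count without reporting anything, which is the behaviour one would expect when the misclassified option is not set.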



-- 
nicolas.hernandez@univ-nantes.fr
#
http://enicolashernandez.blogspot.com
http://www.univ-nantes.fr/hernandez-n
#
Laboratoire Informatique de Nantes Atlantique CNRS UMR 6241
tel. +33 (0)2 51 12 58 55
#
Université de Nantes - Institut Universitaire de Technologie -
Département Informatique
tel. +33 (0)2 40 30 60 67
