opennlp-dev mailing list archives

From Nicolas Hernandez <nicolas.hernan...@gmail.com>
Subject Re: Evaluator and CrossValidator throw a java.lang.NullPointerException
Date Thu, 06 Oct 2011 12:46:58 GMT
On Thu, Oct 6, 2011 at 2:34 PM, Jörn Kottmann <kottmann@gmail.com> wrote:
> Looks like the Cross Validator is failing because you do
> not have enough data? On how many sample sentences do you
> run it?
I tested with 1,000 and with 1,000,000 sentences. The results are the same,
except that I had to increase the Java heap size for one of them before
getting the error.
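
To rule out an empty or misread data file before cross-validating, one option is to count the non-blank lines the file actually contains. This is only a minimal sketch using the Java standard library: the class name DataFileCheck and the method countNonBlankLines are hypothetical, nothing here is OpenNLP API, and it assumes the data file holds one sample per line.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, not part of OpenNLP.
class DataFileCheck {

    // Counts the lines that contain at least one non-whitespace character.
    static int countNonBlankLines(Reader source) throws IOException {
        BufferedReader reader = new BufferedReader(source);
        int count = 0;
        String line;
        while ((line = reader.readLine()) != null) {
            if (!line.trim().isEmpty()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // Assumed usage: java DataFileCheck data/train/fr-sent.train
        try (Reader in = Files.newBufferedReader(Paths.get(args[0]), StandardCharsets.UTF_8)) {
            System.out.println(countNonBlankLines(in) + " non-blank lines");
        }
    }
}
```

A count of zero, or one far below the expected number of sentences, would point at the input file or its encoding rather than at the trainer.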



Just to let you know, below is what I got for the Tokenizer (here with a
1000-sentence training corpus):

$ opennlp TokenizerMEEvaluator -encoding UTF-8 -model
data/model/fr-token.bin -data data/test/fr-token.test
Loading Tokenizer model ... done (0,428s)
Evaluating ... Exception in thread "main" java.lang.NullPointerException
	at opennlp.tools.util.eval.Evaluator.evaluateSample(Evaluator.java:76)
	at opennlp.tools.util.eval.Evaluator.evaluate(Evaluator.java:98)
	at opennlp.tools.cmdline.tokenizer.TokenizerMEEvaluatorTool.run(TokenizerMEEvaluatorTool.java:81)
	at opennlp.tools.cmdline.CLI.main(CLI.java:191)

$ opennlp TokenizerCrossValidator -encoding UTF-8 -lang fr -data
data/train/fr-token.train
Indexing events using cutoff of 5
	Computing event counts...  done. 100333 events
	Indexing...  done.
Sorting and merging events... done. Reduced 100333 events to 30168.
Done indexing.
Incorporating indexed data for training...
done.
	Number of Event Tokens: 30168
	    Number of Outcomes: 2
	  Number of Predicates: 8287
...done.
Computing model parameters ...
Performing 100 iterations.
  1:  ... loglikelihood=-69545.53606709359	0.9337805108987073
  2:  ... loglikelihood=-18987.123809719425	0.9497872085953774
...
 98:  ... loglikelihood=-607.4216932752298	0.9989534848952987
 99:  ... loglikelihood=-603.2346954947699	0.9989734185163406
100:  ... loglikelihood=-599.1235213848983	0.9989833853268616
Exception in thread "main" java.lang.NullPointerException
	at opennlp.tools.util.eval.Evaluator.evaluateSample(Evaluator.java:76)
	at opennlp.tools.util.eval.Evaluator.evaluate(Evaluator.java:98)
	at opennlp.tools.tokenize.TokenizerCrossValidator.evaluate(TokenizerCrossValidator.java:98)
	at opennlp.tools.cmdline.tokenizer.TokenizerCrossValidatorTool.run(TokenizerCrossValidatorTool.java:94)
	at opennlp.tools.cmdline.CLI.main(CLI.java:191)
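
The traces do not show which reference is null in evaluateSample. One plausible cause, consistent with the Evaluators failing only when "-misclassified" is false or absent, is an optional listener that is invoked without a null check. The sketch below illustrates that guard under this assumption; SketchEvaluator and MisclassifiedListener are hypothetical stand-ins, not the actual OpenNLP classes.

```java
// Hypothetical stand-ins for illustration; these are not the OpenNLP classes.
interface MisclassifiedListener<T> {
    void misclassified(T reference, T prediction);
}

abstract class SketchEvaluator<T> {

    // May be null when the caller did not ask for misclassified output.
    private final MisclassifiedListener<T> listener;

    SketchEvaluator(MisclassifiedListener<T> listener) {
        this.listener = listener;
    }

    // Subclasses produce a prediction for the given reference sample.
    protected abstract T predict(T reference);

    void evaluateSample(T reference) {
        T prediction = predict(reference);
        // Guard: only notify when a listener was actually supplied.
        if (listener != null && !reference.equals(prediction)) {
            listener.misclassified(reference, prediction);
        }
    }
}
```

With this guard, constructing the evaluator with a null listener simply skips the reporting step instead of failing.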


>
> We will investigate this further.
>
> Jörn
>
> On 10/6/11 2:26 PM, Nicolas Hernandez wrote:
>>
>> Please find below the output of two runs which lead to an error:
>> SentenceDetectorEvaluator without "-misclassified true" parameter and
>> SentenceDetectorCrossValidator (which gives the same error with or
>> without "-misclassified true").
>>
>> I tested on the examples from the documentation and also with my data.
>> Tell me if you want more details or anything else.
>>
>> $opennlp SentenceDetectorEvaluator -encoding UTF-8 -model
>> data/model/fr-sent.bin -data data/test/fr-sent.test
>> Loading Sentence Detector model ... done (0,013s)
>> Evaluating ... Exception in thread "main" java.lang.NullPointerException
>>        at
>> opennlp.tools.util.eval.Evaluator.evaluateSample(Evaluator.java:80)
>>        at opennlp.tools.util.eval.Evaluator.evaluate(Evaluator.java:98)
>>        at
>> opennlp.tools.cmdline.sentdetect.SentenceDetectorEvaluatorTool.run(SentenceDetectorEvaluatorTool.java:80)
>>        at opennlp.tools.cmdline.CLI.main(CLI.java:191)
>>
>> $opennlp SentenceDetectorCrossValidator -encoding UTF-8 -lang fr -data
>> data/train/fr-sent.train -misclassified true
>> Indexing events using cutoff of 5
>>
>>        Computing event counts...  done. 0 events
>>        Indexing...  done.
>> Sorting and merging events... Done indexing.
>> Incorporating indexed data for training...
>> Exception in thread "main" java.lang.NullPointerException
>>        at opennlp.maxent.GISTrainer.trainModel(GISTrainer.java:263)
>>        at opennlp.maxent.GIS.trainModel(GIS.java:256)
>>        at opennlp.model.TrainUtil.train(TrainUtil.java:182)
>>        at
>> opennlp.tools.sentdetect.SentenceDetectorME.train(SentenceDetectorME.java:283)
>>        at
>> opennlp.tools.sentdetect.SDCrossValidator.evaluate(SDCrossValidator.java:104)
>>        at
>> opennlp.tools.cmdline.sentdetect.SentenceDetectorCrossValidatorTool.run(SentenceDetectorCrossValidatorTool.java:98)
>>        at opennlp.tools.cmdline.CLI.main(CLI.java:191)
>>
>>
>>
>> On Thu, Oct 6, 2011 at 1:02 PM, Jörn Kottmann<kottmann@gmail.com>  wrote:
>>>
>>> On 10/6/11 12:42 PM, Nicolas Hernandez wrote:
>>>>
>>>> I am trying to run the Evaluator and CrossValidator programs of 1.5.3
>>>> from the command line.
>>>>
>>>> It seems that the SentenceDetector, Tokenizer, PosTagger and the
>>>> chunker (at least) throw a java.lang.NullPointerException if the
>>>> misclassified parameter is set to false or not present for the
>>>> Evaluator programs. The CrossValidator programs do not work at all.
>>>>
>>>> Before I look into it, is something (e.g. a global refactoring) planned
>>>> for it?
>>>
>>> 1.5.3 is mostly the same version as 1.5.2 RC 2.
>>>
>>> The bugs you describe here should of course not be present, and must be
>>> fixed for the 1.5.2 release. We just did a major refactoring of a lot of
>>> cmd line code. Looks like a regression.
>>>
>>> Can you please give us more details? The stack trace would be helpful and
>>> the command line arguments you passed in. To find a bug I believe it should
>>> be enough to get this for one of the mentioned evaluators.
>>>
>>> Jörn
>>>
>
>
