opennlp-users mailing list archives

From Nicolas Hernandez
Subject UIMA TokenizerTrainer component : the model file is not created
Date Wed, 15 Jun 2011 14:46:51 GMT

Has anyone already used the UIMA TokenizerTrainer component? I am a
bit confused, since it does not create any model file.

On stdout I got this:
Indexing events using cutoff of 5
	Computing event counts...

done. 69669 events
	Indexing...  done.
Sorting and merging events... done. Reduced 69669 events to 16467.
Done indexing.
Incorporating indexed data for training...
	Number of Event Tokens: 16467
	    Number of Outcomes: 1
	  Number of Predicates: 5624
Computing model parameters...
Performing 100 iterations.
  1:  .. loglikelihood=0.0	1.0
  2:  .. loglikelihood=0.0	1.0

This looks like a problem I had when I trained the model on the
command line without using the '<SPLIT>' tag. The difference is that
on the command line I also got the following exception:
Exception in thread "main" java.lang.IllegalArgumentException: The
maxent model is not compatible!

I solved this problem by adding the tag, as mentioned in the thread
"maxent model is not compatible with Tokenizer training" (Fri, 13 May).
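
To rule out the training data itself, here is a minimal sketch of a check
that counts '<SPLIT>' markers in the training lines. It assumes one sentence
per line with the literal marker placed where two tokens are not separated
by whitespace. The function name and the sample sentences are hypothetical,
and the script does not call OpenNLP at all. If it reports zero markers,
that would be consistent with the "Number of Outcomes: 1" line in the log
above.

```python
SPLIT_TAG = "<SPLIT>"


def count_split_markers(lines):
    """Return (lines_with_marker, total_markers) for an iterable of training lines."""
    lines_with_marker = 0
    total_markers = 0
    for line in lines:
        n = line.count(SPLIT_TAG)
        if n:
            lines_with_marker += 1
            total_markers += n
    return lines_with_marker, total_markers


if __name__ == "__main__":
    # Hypothetical sample data: the first line is annotated, the second is not.
    sample = [
        "Pierre Vinken<SPLIT>, 61 years old<SPLIT>, will join the board<SPLIT>.",
        "Mr. Vinken is chairman .",
    ]
    with_marker, total = count_split_markers(sample)
    print(f"{with_marker} line(s) contain {SPLIT_TAG}, {total} marker(s) in total")
```

To check a real file, pass the open file object (or a list of its lines) to
count_split_markers instead of the sample list.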

Does anyone know whether it is the same problem? If so, how do I
specify the '<SPLIT>' tag in the UIMA version? As far as I understand
its role, it is important to give the user the possibility of setting

More generally, I am interested in any feedback from people who have
successfully built models with the UIMA OpenNLP * Trainer
components. So far I have also had some trouble with the
SentenceTrainer, and I have not tested the others.


Laboratoire LINA-TALN CNRS UMR 6241
tel. +33 (0)2 51 12 58 55
Université de Nantes - Institut Universitaire de Technologie -
Département Informatique
tel. +33 (0)2 40 30 60 67
