opennlp-dev mailing list archives

From Joern Kottmann <kottm...@gmail.com>
Subject Re: Missing serializer for postagger.bin
Date Wed, 14 Jun 2017 14:31:02 GMT
We have to fix this. William wrote a unit test to reproduce it.

Jörn

On Fri, Jun 9, 2017 at 4:31 PM, Damiano Porta <damianoporta@gmail.com>
wrote:

> Jorn,
> the latest snapshot (1.8.1-SNAPSHOT) fixes the problem with dictionaries
> (PR #220), but the problem with POS tagger serialization is still there. I
> can confirm that the latest snapshot cannot serialize the POS tagger using the
> command-line tool:
>
> opennlp TokenNameFinderTrainer -data /home/damiano/corpus.train -lang it
> -model /home/damiano/it-tuoagente-perceptron-custom.bin -featuregen
> /home/damiano/test.xml -sequenceCodec BIO -resources
> /home/damiano/lavoro/java/Parser/src/main/resources/
>
>
> Writing name finder model ... Compressed 885605 parameters to 94030
> 3451 outcome patterns
> Exception in thread "main" java.lang.IllegalStateException: Missing serializer for it-pos-maxent.bin
>     at opennlp.tools.util.model.BaseModel.serialize(BaseModel.java:592)
>     at opennlp.tools.cmdline.CmdLineUtil.writeModel(CmdLineUtil.java:182)
>     at opennlp.tools.cmdline.namefind.TokenNameFinderTrainerTool.run(TokenNameFinderTrainerTool.java:188)
>     at opennlp.tools.cmdline.CLI.main(CLI.java:244)
>
> I have used this generators.xml file:
>
> <?xml version="1.0" encoding="UTF-8"?>
> <generators>
>     <cache>
>         <generators>
>             <window prevLength="4" nextLength="2">
>                 <tokenclass />
>             </window>
>             <window prevLength="4" nextLength="2">
>                 <token />
>             </window>
>             <!-- Pos Tagger -->
>             <window prevLength="4" nextLength="2">
>                 <tokenpos model="it-pos-maxent.bin" />
>             </window>
>             <definition />
>             <prevmap />
>             <bigram />
>             <sentence begin="true" end="false" />
>         </generators>
>     </cache>
> </generators>
>
>
>
>
> 2017-06-09 15:17 GMT+02:00 Damiano Porta <damianoporta@gmail.com>:
>
> > Jorn,
> > at the moment I am using the command-line tool to train my NER model, but I am
> > getting this error:
> >
> > opennlp TokenNameFinderTrainer -data /home/damiano/corpus.train -lang it
> > -model /home/damiano/it-person-perceptron.bin -featuregen
> > /home/damiano/test.xml -sequenceCodec BIO -resources
> > /home/damiano/lavoro/java/Parser/src/main/resources/
> >
> > Exception in thread "main" opennlp.tools.namefind.TokenNameFinderModel$FeatureGeneratorCreationError:
> > opennlp.tools.util.InvalidFormatException: No dictionary resource for key: nations.dictionary
> >     at opennlp.tools.namefind.TokenNameFinderFactory.createFeatureGenerators(TokenNameFinderFactory.java:209)
> >     at opennlp.tools.namefind.TokenNameFinderFactory.createContextGenerator(TokenNameFinderFactory.java:150)
> >     at opennlp.tools.namefind.NameFinderME.train(NameFinderME.java:241)
> >     at opennlp.tools.cmdline.namefind.TokenNameFinderTrainerTool.run(TokenNameFinderTrainerTool.java:169)
> >     at opennlp.tools.cmdline.CLI.main(CLI.java:244)
> > Caused by: opennlp.tools.util.InvalidFormatException: No dictionary resource for key: nations.dict
> >     at opennlp.tools.util.featuregen.GeneratorFactory$DictionaryFeatureGeneratorFactory.create(GeneratorFactory.java:251)
> >     at opennlp.tools.util.featuregen.GeneratorFactory.createGenerator(GeneratorFactory.java:732)
> >     at opennlp.tools.util.featuregen.GeneratorFactory$AggregatedFeatureGeneratorFactory.create(GeneratorFactory.java:130)
> >     at opennlp.tools.util.featuregen.GeneratorFactory.createGenerator(GeneratorFactory.java:732)
> >     at opennlp.tools.util.featuregen.GeneratorFactory$CachedFeatureGeneratorFactory.create(GeneratorFactory.java:172)
> >     at opennlp.tools.util.featuregen.GeneratorFactory.createGenerator(GeneratorFactory.java:732)
> >     at opennlp.tools.util.featuregen.GeneratorFactory$AggregatedFeatureGeneratorFactory.create(GeneratorFactory.java:130)
> >     at opennlp.tools.util.featuregen.GeneratorFactory.createGenerator(GeneratorFactory.java:732)
> >     at opennlp.tools.util.featuregen.GeneratorFactory.create(GeneratorFactory.java:782)
> >     at opennlp.tools.namefind.TokenNameFinderFactory.createFeatureGenerators(TokenNameFinderFactory.java:189)
> >     ... 4 more
> >
> > As you can see, the problem is "No dictionary resource for key:
> > nations.dictionary", because I also need to add a dictionary to my model.
> >
> > I ran these tests:
> >
> > 1. Used the name nations.dictionary as the resource name in my generators.xml:
> > <dictionary dict="nations.dictionary" prefix="nation" />
> >
> > 2. Used the name nations.xml as the resource name in my generators.xml:
> > <dictionary dict="nations.xml" prefix="nation" />
> >
> > 3. Used the name nations.dict as the resource name in my generators.xml:
> > <dictionary dict="nations.dict" prefix="nation" />
> >
> > For each test I also renamed the dictionary file inside my -resources
> > directory to match.
> >
> > I had no luck.
> >
> > What should I name a dictionary resource?
> >
> > Thanks.
> >
> >
> >
> > 2017-06-07 16:20 GMT+02:00 Damiano Porta <damianoporta@gmail.com>:
> >
> >> Hello Jorn,
> >> I confirm the error. Please take a look at the code below. It is a
> >> working example; you only need to edit the constants GENERATORS, POSTAGGER
> >> and SERIALIZED.
> >>
> >>
> >> *TEST FILE:*
> >>
> >> package com.damiano.trainer;
> >>
> >> import java.io.BufferedOutputStream;
> >> import java.io.FileInputStream;
> >> import java.io.FileOutputStream;
> >> import java.io.IOException;
> >> import java.io.InputStream;
> >> import java.util.ArrayList;
> >> import java.util.HashMap;
> >> import java.util.List;
> >> import java.util.Map;
> >> import opennlp.tools.ml.perceptron.PerceptronTrainer;
> >> import opennlp.tools.namefind.BioCodec;
> >> import opennlp.tools.namefind.NameFinderME;
> >> import opennlp.tools.namefind.NameSample;
> >> import opennlp.tools.namefind.TokenNameFinderFactory;
> >> import opennlp.tools.namefind.TokenNameFinderModel;
> >> import opennlp.tools.postag.POSModel;
> >> import opennlp.tools.util.ObjectStream;
> >> import opennlp.tools.util.ObjectStreamUtils;
> >> import opennlp.tools.util.TrainingParameters;
> >> import org.apache.commons.io.IOUtils;
> >>
> >> public class Test {
> >>
> >>     private final String GENERATORS = "/home/damiano/test.xml";
> >>     private final String POSTAGGER = "/home/damiano/postagger.bin";
> >>     private final String SERIALIZED = "/home/damiano/serialized.bin";
> >>
> >>     public static void main(String[] args) throws IOException {
> >>         Test test = new Test();
> >>     }
> >>
> >>     public Test() throws IOException {
> >>
> >>         List<NameSample> labelled = new ArrayList<>();
> >>
> >>         labelled.add(NameSample.parse("This is a sentence <START:person> JACOB <END>", false));
> >>         labelled.add(NameSample.parse("This is a sentence <START:person> JACK <END>", false));
> >>         labelled.add(NameSample.parse("This is a sentence <START:person> THOMAS <END>", false));
> >>         labelled.add(NameSample.parse("This is a sentence <START:person> GEORGE <END>", false));
> >>         labelled.add(NameSample.parse("This is a sentence <START:person> WILLIAM <END>", false));
> >>         labelled.add(NameSample.parse("This is a sentence <START:person> JAMES <END>", false));
> >>
> >>         TokenNameFinderFactory factory;
> >>
> >>         try (ObjectStream<NameSample> samples = ObjectStreamUtils.createObjectStream(labelled)) {
> >>             //HashMap<String, Object> map = new HashMap<>();
> >>
> >>             try (InputStream in = new FileInputStream(GENERATORS)) {
> >>
> >>                 // Resources
> >>                 Map<String, Object> map = new HashMap<>();
> >>
> >>                 // Pos Tagger
> >>                 map.put("postagger.bin", Test.loadPosTagger(POSTAGGER));
> >>
> >>
> >>                 // Factory
> >>                 factory = new TokenNameFinderFactory(
> >>                     IOUtils.toByteArray(in),
> >>                     map,
> >>                     new BioCodec()
> >>                 );
> >>
> >>                 try {
> >>
> >>                     TrainingParameters mlParams = new TrainingParameters();
> >>                     mlParams.put(TrainingParameters.ALGORITHM_PARAM, PerceptronTrainer.PERCEPTRON_VALUE);
> >>                     mlParams.put(TrainingParameters.ITERATIONS_PARAM, Integer.toString(300));
> >>                     mlParams.put(TrainingParameters.CUTOFF_PARAM, Integer.toString(0));
> >>
> >>                     TokenNameFinderModel model = NameFinderME.train("it", "person", samples, mlParams, factory);
> >>
> >>                     try (BufferedOutputStream modelOut = new BufferedOutputStream(new FileOutputStream(SERIALIZED))) {
> >>                         model.serialize(modelOut);
> >>                     }
> >>
> >>                 } catch (Exception ex) {
> >>                     ex.printStackTrace();
> >>                 }
> >>
> >>             }
> >>         }
> >>     }
> >>
> >>     public static POSModel loadPosTagger (String modelName) {
> >>
> >>         try (InputStream modelIn = new FileInputStream(modelName)) {
> >>             POSModel model = new POSModel(modelIn);
> >>             return model;
> >>         }
> >>         catch (Exception ex) { ex.printStackTrace();  }
> >>
> >>         return null;
> >>     }
> >> }
> >>
> >> *GENERATORS:*
> >>
> >> <?xml version="1.0" encoding="UTF-8"?>
> >> <generators>
> >>     <cache>
> >>         <generators>
> >>             <window prevLength="4" nextLength="2">
> >>                 <tokenclass />
> >>             </window>
> >>             <window prevLength="4" nextLength="2">
> >>                 <token />
> >>             </window>
> >>             <!-- Pos Tagger -->
> >>             <window prevLength="4" nextLength="2">
> >>                 <tokenpos model="postagger.bin" />
> >>             </window>
> >>             <definition />
> >>             <prevmap />
> >>             <bigram />
> >>             <sentence begin="true" end="false" />
> >>         </generators>
> >>     </cache>
> >> </generators>
> >>
> >>
> >> *OUTPUT (with error):*
> >>
> >>
> >> Indexing events using cutoff of 0
> >> Computing event counts...  done. 30 events
> >> Indexing...  done.
> >> Collecting events... Done indexing.
> >> Incorporating indexed data for training...  done.
> >> Number of Event Tokens: 30
> >> Number of Outcomes: 2
> >> Number of Predicates: 144
> >> Computing model parameters...
> >> Performing 300 iterations.
> >>   1:  . (27/30) 0.9
> >>   2:  . (30/30) 1.0
> >>   3:  . (30/30) 1.0
> >>   4:  . (30/30) 1.0
> >>   5:  . (30/30) 1.0
> >> Stopping: change in training set accuracy less than 1.0E-5
> >> Stats: (30/30) 1.0
> >> ...done.
> >> Compressed 144 parameters to 621 outcome patterns
> >> java.lang.IllegalStateException: Missing serializer for postagger.bin
> >>     at opennlp.tools.util.model.BaseModel.serialize(BaseModel.java:589)
> >>     at com.damiano.trainer.Test.<init>(Test.java:75)
> >>     at com.damiano.trainer.Test.main(Test.java:31)
> >>
> >> 2017-06-07 15:48 GMT+02:00 Damiano Porta <damianoporta@gmail.com>:
> >>
> >>> Hmm, let me try again. Yes, I copied it badly. I think the names are
> >>> correct; I will give you a working example.
> >>>
> >>> 2017-06-07 15:46 GMT+02:00 Joern Kottmann <kottmann@gmail.com>:
> >>>
> >>>> Ok, but are you sure you used matching names? The exception states
> >>>> it-pos-maxent.bin,
> >>>> which object did you map to it?
> >>>>
> >>>> Jörn
> >>>>
> >>>> On Wed, Jun 7, 2017 at 3:22 PM, Damiano Porta <damianoporta@gmail.com>
> >>>> wrote:
> >>>>
> >>>> > Hi Jorn! Yes
> >>>> >
> >>>> >         <dependency>
> >>>> >             <groupId>org.apache.opennlp</groupId>
> >>>> >             <artifactId>opennlp-tools</artifactId>
> >>>> >             <version>1.8.0</version>
> >>>> >         </dependency>
> >>>> >
> >>>> > Do I need other dependencies too?
> >>>> >
> >>>> >
> >>>> >
> >>>> > 2017-06-07 14:53 GMT+02:00 Joern Kottmann <kottmann@gmail.com>:
> >>>> >
> >>>> > > This should be working. Did you test with 1.8.0?
> >>>> > >
> >>>> > > Jörn
> >>>> > >
> >>>> > > On Mon, Jun 5, 2017 at 3:43 PM, Damiano Porta <damianoporta@gmail.com>
> >>>> > > wrote:
> >>>> > >
> >>>> > > > Hello,
> >>>> > > > I am using the POSTaggerFeatureGenerator via generators.xml:
> >>>> > > >
> >>>> > > > <tokenpos model="postagger.bin" />
> >>>> > > >
> >>>> > > > During training I add this model to the resources like this:
> >>>> > > >
> >>>> > > >         HashMap<String, Object> map = new HashMap<>();
> >>>> > > >         map.put("postagger.bin", myPostaggerModel);
> >>>> > > >
> >>>> > > >
> >>>> > > >          factory = new TokenNameFinderFactory(
> >>>> > > >                IOUtils.toByteArray(in),
> >>>> > > >                map,
> >>>> > > >                new BioCodec()
> >>>> > > >          );
> >>>> > > >
> >>>> > > > I get this error:
> >>>> > > >
> >>>> > > > java.lang.IllegalStateException: Missing serializer for it-pos-maxent.bin
> >>>> > > >     at opennlp.tools.util.model.BaseModel.serialize(BaseModel.java:589)
> >>>> > > >     at com.damiano.nlp.ner.trainer.Trainer.<init>(Trainer.java:187)
> >>>> > > >     at com.damiano.nlp.ner.trainer.Trainer.main(Trainer.java:44)
> >>>> > > > 2017-06-05 15:37:35 INFO  Trainer:192 - java.lang.IllegalStateException:
> >>>> > > > Missing serializer for postagger.bin
> >>>> > > >
> >>>> > > > Do I have to change the extension of the file?
> >>>> > > >
> >>>> > > > Thanks
> >>>> > > >
> >>>> > >
> >>>> >
> >>>>
> >>>
> >>>
> >>
> >
>
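For readers following the thread: every stack trace above ends in BaseModel.serialize with "Missing serializer for <resource name>". One plausible reading is that the serializer for each bundled resource is looked up from the resource name, and that nothing is registered for the name used for the POS model. The sketch below is not OpenNLP code. It is a self-contained illustration of that failure mode, and the class SerializerLookupSketch, its Serializer interface, and the extension-based lookup are all assumptions made for the example.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

/** Illustration only: a hypothetical registry that picks a serializer by file extension. */
final class SerializerLookupSketch {

    /** Hypothetical stand-in for a per-resource serializer (not an OpenNLP type). */
    interface Serializer {
        byte[] toBytes(Object artifact);
    }

    private final Map<String, Serializer> byExtension = new HashMap<>();

    void register(String extension, Serializer serializer) {
        byExtension.put(extension, serializer);
    }

    /** Returns the text after the last dot, or null when the name has no extension. */
    static String extensionOf(String entryName) {
        int dot = entryName.lastIndexOf('.');
        if (dot < 0 || dot == entryName.length() - 1) {
            return null;
        }
        return entryName.substring(dot + 1);
    }

    /** Fails in the same style as the traces in this thread when no serializer matches. */
    byte[] serialize(String entryName, Object artifact) {
        Serializer serializer = byExtension.get(extensionOf(entryName));
        if (serializer == null) {
            throw new IllegalStateException("Missing serializer for " + entryName);
        }
        return serializer.toBytes(artifact);
    }

    public static void main(String[] args) {
        SerializerLookupSketch registry = new SerializerLookupSketch();
        // Only "xml" is registered, so a ".bin" resource has no serializer.
        registry.register("xml", a -> a.toString().getBytes(StandardCharsets.UTF_8));
        registry.serialize("generator.xml", "<generators/>");
        try {
            registry.serialize("postagger.bin", new Object());
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this sketch the training step would succeed and only the final write would fail, which matches the order of events in the logs above: training completes, then serialization throws.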

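The question "are you sure you used matching names?" can be checked mechanically before training. The sketch below uses only the JDK XML parser to collect the values of the model and dict attributes from a feature generator descriptor and reports those that are absent from the resource map keys. It assumes that these two attributes, the only ones seen in this thread, are the ones that name external resources. The class ResourceKeyCheck and its method missingKeys are hypothetical helpers, not part of OpenNLP.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.Set;
import java.util.TreeSet;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Illustration only: finds resource names used in the descriptor but missing from the map. */
final class ResourceKeyCheck {

    // Assumption: these are the attributes that reference external resources.
    private static final String[] RESOURCE_ATTRIBUTES = {"model", "dict"};

    static Set<String> missingKeys(String generatorXml, Set<String> resourceKeys) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(generatorXml.getBytes(StandardCharsets.UTF_8)));
        Set<String> missing = new TreeSet<>();
        NodeList elements = doc.getElementsByTagName("*"); // every element in the document
        for (int i = 0; i < elements.getLength(); i++) {
            Element element = (Element) elements.item(i);
            for (String attribute : RESOURCE_ATTRIBUTES) {
                if (element.hasAttribute(attribute)
                        && !resourceKeys.contains(element.getAttribute(attribute))) {
                    missing.add(element.getAttribute(attribute));
                }
            }
        }
        return missing;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<generators><cache><generators>"
                + "<window prevLength=\"4\" nextLength=\"2\">"
                + "<tokenpos model=\"it-pos-maxent.bin\" /></window>"
                + "<dictionary dict=\"nations.dictionary\" prefix=\"nation\" />"
                + "</generators></cache></generators>";
        // The map only contains "postagger.bin", so both referenced names are reported.
        System.out.println(missingKeys(xml, Collections.singleton("postagger.bin")));
        // prints [it-pos-maxent.bin, nations.dictionary]
    }
}
```

A check like this only catches name mismatches between the descriptor and the resource map. It would not detect the serialization failure discussed in this thread, where the names match and the error still occurs.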