uima-user mailing list archives

From Bonnie MacKellar <bkmackel...@gmail.com>
Subject Re: missing Ruta annotations from uimaFit
Date Thu, 07 Jul 2016 16:58:17 GMT
Hi,

I fixed the problem by moving the same code into a class that extends
CasConsumer_ImplBase.  The following method is called from that class's
process method:

 private void processFeatureStructures(CAS aCAS) {

   Iterator<AnnotationFS> annotationIterator = aCAS.getAnnotationIndex().iterator();

   while (annotationIterator.hasNext()) {
     AnnotationFS annotation = annotationIterator.next();

     try {
       out.println("[" + annotation.getCoveredText() + "]");
       Type type = annotation.getType();
       String typeName = type.getName();
       out.println("found annotation type " + typeName);

       Integer count = typeCounts.get(typeName);
       out.println("current count is " + count);

etc.
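
The elided part of the method presumably increments a per-type counter. A minimal sketch of that counting step, in plain Java with no UIMA dependency so it runs on its own: TypeCounter and countTypes are hypothetical names, "Criterion" is a made-up type name, and the strings passed in stand for the values returned by annotation.getType().getName().

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of UIMA or uimaFIT.
final class TypeCounter {

    // Counts how often each type name occurs.
    // A TreeMap keeps the result sorted by type name for stable output.
    static Map<String, Integer> countTypes(List<String> typeNames) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String name : typeNames) {
            // merge() stores 1 for a new key, otherwise adds 1 to the existing count
            counts.merge(name, 1, Integer::sum);
        }
        return counts;
    }
}
```

Called with the names Line, Criterion, Line, this yields a count of 2 for Line and 1 for Criterion. Using merge() also avoids the null that typeCounts.get(typeName) returns the first time a type is seen.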

I then run it with this code:

AnalysisEngineDescription myWriterDesc = AnalysisEngineFactory.createEngineDescription(
    CountWriter.class, CountWriter.PARAM_OUTPUT_FILE, "myOutput.txt");
SimplePipeline.runPipeline(readerDesc, rutaEngineDesc, myWriterDesc);

This works perfectly: all my annotations show up, which is great, because I
now have the totals I needed. But it is also unsatisfying, because I cannot
explain why it works. That worries me, because it is hard to develop complex
systems when you don't understand the underlying model. Also, the following
approach does not work either, and I believe it should:

AnalysisEngine rae = AnalysisEngineFactory.createEngine(rutaEngineDesc);
JCas jCas = rae.newJCas();
jCas.setDocumentText(someText);
rae.process(jCas);
displayRutaResults(jCas);

In this case, the jCas contents were incorrect in the same way as when I
used iteratePipeline:

for (JCas jcas : SimplePipeline.iteratePipeline(readerDesc, rutaEngineDesc)) {
  displayRutaResults(jcas);
}

Only if I put the code in a class that is used in runPipeline do I get the
correct results.  If you have any ideas about this, I would love to know.

In any case, I almost have all the results I need. I have one more task - I
need to figure out how to obtain text that was annotated with one
particular annotation and NO OTHER. I am thinking that selectCovered might
do what I need, but I can't find many examples of how to use it.
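
For reference, uimaFIT's JCasUtil.selectCovered returns the annotations of a given type that lie within the span of a covering annotation, so on its own it answers a slightly different question. The "one annotation and no other" check can be sketched in plain Java under one assumption: "no other" is taken to mean that no annotation of a different type has exactly the same begin and end offsets. Span, ExclusiveSpans, and exclusiveText are hypothetical names, and Span is only a stand-in for the offsets and type name one would read from an AnnotationFS.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java stand-in for an annotation: a type name plus character offsets.
// Hypothetical; a real pipeline would read these values from AnnotationFS objects.
final class Span {
    final String type;
    final int begin;
    final int end;

    Span(String type, int begin, int end) {
        this.type = type;
        this.begin = begin;
        this.end = end;
    }
}

final class ExclusiveSpans {

    // Returns the text of every span of targetType whose exact offsets
    // are not shared by a span of any other type.
    static List<String> exclusiveText(String text, List<Span> spans, String targetType) {
        List<String> result = new ArrayList<>();
        for (Span candidate : spans) {
            if (!candidate.type.equals(targetType)) {
                continue;
            }
            boolean shared = false;
            for (Span other : spans) {
                if (!other.type.equals(targetType)
                        && other.begin == candidate.begin
                        && other.end == candidate.end) {
                    shared = true;
                    break;
                }
            }
            if (!shared) {
                result.add(text.substring(candidate.begin, candidate.end));
            }
        }
        return result;
    }
}
```

If "no other" should instead mean "no overlapping annotation at all", the equality test on the offsets can be replaced by an overlap test (other.begin < candidate.end && other.end > candidate.begin). In that case types that span the whole document would need to be excluded from the comparison.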

thanks for your help,

Bonnie MacKellar

On Thu, Jul 7, 2016 at 4:48 AM, Peter Klügl <peter.kluegl@averbis.com>
wrote:

> Nope, that should not be a problem since the types are initialized in
> process()
>
>
> On 07.07.2016 at 10:26, Richard Eckart de Castilho wrote:
> > Hi,
> >
> > iteratePipeline and runPipeline should be mostly equivalent.
> > A difference occurs if you e.g. have a CAS multiplier within
> > an aggregate engine.
> >
> > runPipeline delegates the execution to the UIMA core and is able
> > to handle CAS multipliers.
> >
> > iteratePipeline (re)uses a single CAS instance which is passed
> > to the reader and all analysis engines in turn. It does not
> > support CAS multipliers.
> >
> > A user recently pointed out that uimaFIT 2.2.0 reintroduces a bug
> > in iteratePipeline - typeSystemInit() is not called [1].
> >
> > @Peter: could the missing call to typeSystemInit() be a problem for Ruta?
> >
> > Cheers,
> >
> > -- Richard
> >
> > [1] https://issues.apache.org/jira/browse/UIMA-4998
> >
> >> On 07.07.2016, at 09:17, Peter Klügl <peter.kluegl@averbis.com> wrote:
> >>
> >> Hi,
> >>
> >>
> >> I have no idea yet why the code with iteratePipeline does not work.
> >>
> >>
> >> Richard, do you have an idea?
> >>
> >>
> >> Are there any exceptions? Do you use the rae objects somewhere? Is your
> >> code hosted somewhere, e.g., on github? What do you mean by your own
> >> annotations? Annotations of an external type system or annotations added
> >> by another engine or reader?
> >>
> >>
> >> Best,
> >>
> >>
> >> Peter
> >>
> >>
> >> On 06.07.2016 at 02:41, Bonnie MacKellar wrote:
> >>> I have a very lengthy Ruta script which annotates my files
> >>> successfully. I can see all the annotations in AnnotationBrowser and
> >>> they are correct. I want to get all the annotations in a Java program,
> >>> so I can count occurrences.  I am using uimaFit. I am getting very odd
> >>> results.
> >>>
> >>> When I use CasDumpWriter, I see all my annotations, correctly written
> >>> to the dump file. Here is the code that does this:
> >>>
> >>> -----------------------------------------------------------------
> >>> AnalysisEngineDescription rutaEngineDesc =
> >>>     AnalysisEngineFactory.createEngineDescription(RutaEngine.class,
> >>>         RutaEngine.PARAM_MAIN_SCRIPT, "ecClassifier",
> >>>         RutaEngine.PARAM_SCRIPT_PATHS, new String[]
> >>>             {"/home/bonnie/Research/eclipse-uima-projects/counttypes/src/main/ruta"},
> >>>         RutaEngine.PARAM_DESCRIPTOR_PATHS, new String[]
> >>>             {"/home/bonnie/Research/eclipse-uima-projects/counttypes/target/generated-sources/ruta/descriptor"},
> >>>         RutaEngine.PARAM_ADDITIONAL_UIMAFIT_ENGINES,
> >>>             "org.apache.uima.ruta.engine.PlainTextAnnotator");
> >>> AnalysisEngineDescription writerDesc =
> >>>     AnalysisEngineFactory.createEngineDescription(CasDumpWriter.class,
> >>>         CasDumpWriter.PARAM_OUTPUT_FILE, "dump2.txt");
> >>> AnalysisEngine rae = AnalysisEngineFactory.createEngine(rutaEngineDesc);
> >>> SimplePipeline.runPipeline(readerDesc, rutaEngineDesc, writerDesc);
> >>> -----------------------------------------------------------------
> >>>
> >>> However, when I try to do this myself, using iteratePipeline to iterate
> >>> through the JCas structures for each input file, many of the
> >>> annotations are missing. I have a suspicion that the missing
> >>> annotations are ones that annotate text for which there is another
> >>> annotation.   For example, text will be annotated with Line, and with
> >>> my own annotation. My code to print the annotations is based on the
> >>> code in CasDumpWriter.
> >>>
> >>> -----------------------------------------------------------------
> >>>
> >>> for (JCas jcas : SimplePipeline.iteratePipeline(readerDesc, rutaEngineDesc)) {
> >>>   displayRutaResults(jcas);
> >>> }
> >>>
> >>> public void displayRutaResults(JCas jcas) {
> >>>   System.out.println("in display ruta results");
> >>>
> >>>   FSIterator<Annotation> annotationIter = jcas.getAnnotationIndex().iterator();
> >>>   while (annotationIter.hasNext()) {
> >>>     AnnotationFS annotation = annotationIter.next();
> >>>     System.out.println(annotation.getType().getName());
> >>>     System.out.println(annotation.getCoveredText());
> >>>     System.out.println("------------------------------------------");
> >>>     // System.out.println(annotation.toString());
> >>>   }
> >>> }
> >>>
> >>> -----------------------------------------------------------------
> >>>
> >>> Why would this code produce different results than CasDumpWriter, which
> >>> uses almost exactly the same code?   Is it something to do with using
> >>> runPipeline vs iteratePipeline? Should I write my code so it can be
> >>> placed inside runPipeline?
> >>>
> >>> thanks so much!
> >>> Bonnie MacKellar
> >>>
>
>
