uima-user mailing list archives

From Bonnie MacKellar <bkmackel...@gmail.com>
Subject missing Ruta annotations from uimaFit
Date Wed, 06 Jul 2016 00:41:14 GMT
I have a very lengthy Ruta script which annotates my files successfully. I
can see all the annotations in AnnotationBrowser and they are correct.
I want to retrieve all the annotations in a Java program so I can count
occurrences. I am using uimaFit, but I am getting very odd results.
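
For the counting step itself, here is a minimal sketch, assuming each
annotation's type name has already been collected as a plain string (in the
real program it would come from annotation.getType().getName()). The class
name TypeTally and the sample type names are hypothetical and are not part
of UIMA or uimaFit; only the Java standard library is used.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class TypeTally {
    // Counts how often each type name occurs.
    // A TreeMap is used so the counts print in sorted order by type name.
    public static Map<String, Integer> tally(Iterable<String> typeNames) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String name : typeNames) {
            counts.merge(name, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical sample data; real names would be gathered while
        // iterating over the annotation index of each JCas.
        List<String> seen = Arrays.asList("Line", "my.types.Criterion", "Line");
        tally(seen).forEach((name, n) -> System.out.println(name + "\t" + n));
    }
}
```

Comparing such per-type counts against the dump file should show which
types go missing, not just that some do.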

When I use CasDumpWriter, I see all my annotations, correctly written to
the dump file. Here is the code that does this:
-------------------------------------------------------------------------------------------------------
AnalysisEngineDescription rutaEngineDesc =
    AnalysisEngineFactory.createEngineDescription(RutaEngine.class,
        RutaEngine.PARAM_MAIN_SCRIPT, "ecClassifier",
        RutaEngine.PARAM_SCRIPT_PATHS, new String[] {
            "/home/bonnie/Research/eclipse-uima-projects/counttypes/src/main/ruta"},
        RutaEngine.PARAM_DESCRIPTOR_PATHS, new String[] {
            "/home/bonnie/Research/eclipse-uima-projects/counttypes/target/generated-sources/ruta/descriptor"},
        RutaEngine.PARAM_ADDITIONAL_UIMAFIT_ENGINES,
            "org.apache.uima.ruta.engine.PlainTextAnnotator");
AnalysisEngineDescription writerDesc =
    AnalysisEngineFactory.createEngineDescription(CasDumpWriter.class,
        CasDumpWriter.PARAM_OUTPUT_FILE, "dump2.txt");
AnalysisEngine rae = AnalysisEngineFactory.createEngine(rutaEngineDesc);
SimplePipeline.runPipeline(readerDesc, rutaEngineDesc, writerDesc);
-----------------------------------------------------------------------------------------------------

However, when I try to do this myself, using iteratePipeline to iterate
through the JCas structures for each input file, many of the annotations
are missing. I suspect that the missing annotations are ones that cover
text which also carries another annotation. For example, the same text
will be annotated with Line and with my own annotation. My code to print
the annotations is based on the code in CasDumpWriter.

-----------------------------------------------------------------------------------------------------

for (JCas jcas : SimplePipeline.iteratePipeline(readerDesc, rutaEngineDesc)) {
    displayRutaResults(jcas);
}

// Prints the type name and covered text of every annotation in the index.
public void displayRutaResults(JCas jcas)
{
    System.out.println("in display ruta results");

    FSIterator<Annotation> annotationIter = jcas.getAnnotationIndex().iterator();
    while (annotationIter.hasNext())
    {
        AnnotationFS annotation = annotationIter.next();
        System.out.println(annotation.getType().getName());
        System.out.println(annotation.getCoveredText());

        System.out.println("------------------------------------------");
        // System.out.println(annotation.toString());
    }
}

------------------------------------------------------------------------------------------------

Why would this code produce different results from CasDumpWriter, which
uses almost exactly the same code? Does it have something to do with using
runPipeline vs iteratePipeline? Should I write my code so it can be placed
inside runPipeline?

thanks so much!
Bonnie MacKellar
