uima-user mailing list archives

From Peter Klügl <peter.klu...@averbis.com>
Subject Re: missing Ruta annotations from uimaFit
Date Thu, 07 Jul 2016 07:17:44 GMT
Hi,


I have no idea yet why the code with iteratePipeline does not work.


Richard, do you have an idea?


Are there any exceptions? Do you use the rae object somewhere? Is your
code hosted somewhere, e.g., on GitHub? What do you mean by "your own
annotations": annotations of an external type system, or annotations
added by another engine or reader?


Best,


Peter


On 06.07.2016 at 02:41, Bonnie MacKellar wrote:
> I have a very lengthy Ruta script which annotates my files successfully. I
> can see all the annotations in AnnotationBrowser and they are correct.
> I want to get all the annotations in a Java program, so I can count
> occurrences.  I am using uimaFit. I am getting very odd results.
>
> When I use CasDumpWriter, I see all my annotations, correctly written to
> the dump file. Here is the code that does this:
> -------------------------------------------------------------------------------------------------------
> AnalysisEngineDescription rutaEngineDesc = AnalysisEngineFactory.createEngineDescription(
>     RutaEngine.class,
>     RutaEngine.PARAM_MAIN_SCRIPT, "ecClassifier",
>     RutaEngine.PARAM_SCRIPT_PATHS,
>     new String[] {"/home/bonnie/Research/eclipse-uima-projects/counttypes/src/main/ruta"},
>     RutaEngine.PARAM_DESCRIPTOR_PATHS,
>     new String[] {"/home/bonnie/Research/eclipse-uima-projects/counttypes/target/generated-sources/ruta/descriptor"},
>     RutaEngine.PARAM_ADDITIONAL_UIMAFIT_ENGINES,
>     "org.apache.uima.ruta.engine.PlainTextAnnotator");
> AnalysisEngineDescription writerDesc = AnalysisEngineFactory.createEngineDescription(
>     CasDumpWriter.class,
>     CasDumpWriter.PARAM_OUTPUT_FILE, "dump2.txt");
> AnalysisEngine rae = AnalysisEngineFactory.createEngine(rutaEngineDesc);
> SimplePipeline.runPipeline(readerDesc, rutaEngineDesc, writerDesc);
> -----------------------------------------------------------------------------------------------------
>
> However, when I try to do this myself, using iteratePipeline to iterate
> through the JCas structures for each input file, many of the annotations
> are missing. I suspect that the missing annotations are the ones covering
> text that also carries another annotation. For example, the same text
> will be annotated with Line and with my own annotation. My code to print
> the annotations is based on the code in CasDumpWriter.
>
> -----------------------------------------------------------------------------------------------------
>
> for (JCas jcas : SimplePipeline.iteratePipeline(readerDesc, rutaEngineDesc)) {
>     displayRutaResults(jcas);
> }
>
> public void displayRutaResults(JCas jcas) {
>     System.out.println("in display ruta results");
>
>     FSIterator<Annotation> annotationIter = jcas.getAnnotationIndex().iterator();
>     while (annotationIter.hasNext()) {
>         AnnotationFS annotation = annotationIter.next();
>         System.out.println(annotation.getType().getName());
>         System.out.println(annotation.getCoveredText());
>         System.out.println("------------------------------------------");
>         // System.out.println(annotation.toString());
>     }
> }
>
> ------------------------------------------------------------------------------------------------
>
> Why would this code produce different results than CasDumpWriter, which
> uses almost exactly the same code?   Is it something to do with using
> runPipeline vs iteratePipeline? Should I write my code so it can be placed
> inside runPipeline?
>
> thanks so much!
> Bonnie MacKellar
>
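
The quoted message says the goal is to count occurrences of each annotation type. As a rough sketch of just that counting step, here is plain Java with no UIMA on the classpath. TypeCounter and countByType are hypothetical names, not part of UIMA, uimaFIT, or Ruta. The input list stands in for the strings that annotation.getType().getName() returns in the quoted loop, and the sample names are made up. This does not address why annotations go missing with iteratePipeline.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper for illustration; not part of UIMA, uimaFIT, or Ruta.
class TypeCounter {

    // Counts how often each type name occurs. A TreeMap keeps the keys sorted,
    // which makes the printed report stable between runs.
    static Map<String, Integer> countByType(List<String> typeNames) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String name : typeNames) {
            counts.merge(name, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up sample names standing in for annotation.getType().getName() values.
        List<String> names = Arrays.asList("example.Line", "example.Custom", "example.Line");
        countByType(names).forEach((type, n) -> System.out.println(type + "\t" + n));
    }
}
```

In a pipeline, one possible use would be to collect the type names inside the annotation loop and call countByType once per document, or once over all documents if a global count is wanted.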

