uima-user mailing list archives

From Richard Eckart de Castilho <...@apache.org>
Subject Re: missing Ruta annotations from uimaFit
Date Thu, 07 Jul 2016 08:26:11 GMT
Hi,

iteratePipeline and runPipeline should be mostly equivalent.
They differ if, for example, you have a CAS multiplier within
an aggregate engine.

runPipeline delegates the execution to the UIMA core and is able
to handle CAS multipliers. 

iteratePipeline (re)uses a single CAS instance which is passed
to the reader and all analysis engines in turn. It does not
support CAS multipliers.
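
To illustrate the difference, here is a toy model in plain Java. Nothing
in it is UIMA or uimaFIT API: ToyCas, Stage, iterateStyle and runStyle are
made-up names, and the real implementations are more involved. It only
sketches why handing one CAS from engine to engine leaves no place for the
extra CASes that a multiplier emits.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

public class PipelineSketch {

    /** Toy stand-in for a CAS: holds only the document text. */
    static final class ToyCas {
        String text;
        ToyCas(String text) { this.text = text; }
    }

    /** Toy stage: receives one CAS, returns the CASes that continue downstream. */
    interface Stage extends Function<ToyCas, List<ToyCas>> {}

    /** Ordinary engine: modifies the CAS in place and passes it on. */
    static final Stage UPPERCASE = cas -> {
        cas.text = cas.text.toUpperCase();
        return Arrays.asList(cas);
    };

    /** Multiplier-like stage: emits one new CAS per line of the input. */
    static final Stage SPLIT_LINES = cas -> {
        List<ToyCas> out = new ArrayList<>();
        for (String line : cas.text.split("\n")) {
            out.add(new ToyCas(line));
        }
        return out;
    };

    /** "iterate" style: one CAS is handed to every stage in turn. */
    static String iterateStyle(String doc, List<Stage> stages) {
        ToyCas cas = new ToyCas(doc);
        for (Stage stage : stages) {
            stage.apply(cas); // result ignored: only the original CAS moves on
        }
        return cas.text;
    }

    /** "run" style: every CAS a stage emits is routed to the next stage. */
    static List<String> runStyle(String doc, List<Stage> stages) {
        List<ToyCas> current = Arrays.asList(new ToyCas(doc));
        for (Stage stage : stages) {
            List<ToyCas> next = new ArrayList<>();
            for (ToyCas cas : current) {
                next.addAll(stage.apply(cas));
            }
            current = next;
        }
        List<String> texts = new ArrayList<>();
        for (ToyCas cas : current) {
            texts.add(cas.text);
        }
        return texts;
    }
}
```

With only UPPERCASE in the list, both styles end up with the same text.
With SPLIT_LINES placed first, runStyle yields one result per line, while
iterateStyle still yields the single original document.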

A user recently pointed out that uimaFIT 2.2.0 reintroduces a bug
in iteratePipeline - typeSystemInit() is not called [1].

@Peter: could the missing call to typeSystemInit() be a problem for Ruta?

Cheers,

-- Richard

[1] https://issues.apache.org/jira/browse/UIMA-4998

> On 07.07.2016, at 09:17, Peter Klügl <peter.kluegl@averbis.com> wrote:
> 
> Hi,
> 
> 
> I have no idea yet why the code with iteratePipeline does not work.
> 
> 
> Richard, do you have an idea?
> 
> 
> Are there any exceptions? Do you use the rae objects somewhere? Is your
> code hosted somewhere, e.g., on github? What do you mean by your own
> annotations? Annotations of an external type system or annotations added
> by another engine or reader?
> 
> 
> Best,
> 
> 
> Peter
> 
> 
> On 06.07.2016, at 02:41, Bonnie MacKellar wrote:
>> I have a very lengthy Ruta script which annotates my files successfully. I
>> can see all the annotations in AnnotationBrowser and they are correct.
>> I want to get all the annotations in a Java program, so I can count
>> occurrences.  I am using uimaFit. I am getting very odd results.
>> 
>> When I use CasDumpWriter, I see all my annotations, correctly written to
>> the dump file. Here is the code that does this
>> -------------------------------------------------------------------------------------------------------
>> AnalysisEngineDescription rutaEngineDesc =
>>     AnalysisEngineFactory.createEngineDescription(RutaEngine.class,
>>         RutaEngine.PARAM_MAIN_SCRIPT, "ecClassifier",
>>         RutaEngine.PARAM_SCRIPT_PATHS, new String[] {
>>             "/home/bonnie/Research/eclipse-uima-projects/counttypes/src/main/ruta" },
>>         RutaEngine.PARAM_DESCRIPTOR_PATHS, new String[] {
>>             "/home/bonnie/Research/eclipse-uima-projects/counttypes/target/generated-sources/ruta/descriptor" },
>>         RutaEngine.PARAM_ADDITIONAL_UIMAFIT_ENGINES,
>>             "org.apache.uima.ruta.engine.PlainTextAnnotator");
>> AnalysisEngineDescription writerDesc =
>>     AnalysisEngineFactory.createEngineDescription(CasDumpWriter.class,
>>         CasDumpWriter.PARAM_OUTPUT_FILE, "dump2.txt");
>> AnalysisEngine rae = AnalysisEngineFactory.createEngine(rutaEngineDesc);
>> SimplePipeline.runPipeline(readerDesc, rutaEngineDesc, writerDesc);
>> -----------------------------------------------------------------------------------------------------
>> 
>> However, when I try to do this myself, using iteratePipeline to iterate
>> through the JCas structures for each input file, many of the annotations
>> are missing. I have a suspicion that the missing annotations are ones that
>> annotate text for which there is another annotation.   For example, text
>> will be annotated with Line, and with my own annotation. My code to print
>> the annotations is based on the code in CasDumpWriter.
>> 
>> -----------------------------------------------------------------------------------------------------
>> 
>> for (JCas jcas : SimplePipeline.iteratePipeline(readerDesc, rutaEngineDesc)) {
>>     displayRutaResults(jcas);
>> }
>>
>> public void displayRutaResults(JCas jcas)
>> {
>>     System.out.println("in display ruta results");
>>
>>     FSIterator<Annotation> annotationIter = jcas.getAnnotationIndex().iterator();
>>     while (annotationIter.hasNext())
>>     {
>>         AnnotationFS annotation = annotationIter.next();
>>         System.out.println(annotation.getType().getName());
>>         System.out.println(annotation.getCoveredText());
>>
>>         System.out.println("------------------------------------------");
>>         // System.out.println(annotation.toString());
>>     }
>> }
>> 
>> ------------------------------------------------------------------------------------------------
>> 
>> Why would this code produce different results than CasDumpWriter, which
>> uses almost exactly the same code?   Is it something to do with using
>> runPipeline vs iteratePipeline? Should I write my code so it can be placed
>> inside runPipeline?
>> 
>> thanks so much!
>> Bonnie MacKellar
>> 
> 

