Hi,
I fixed the problem by taking the same code and putting it in a class that
extends CasConsumer_ImplBase. This method is called from the class's
process method:
    private void processFeatureStructures(CAS aCAS) {
        Iterator<AnnotationFS> annotationIterator =
                aCAS.getAnnotationIndex().iterator();
        while (annotationIterator.hasNext()) {
            AnnotationFS annotation = annotationIterator.next();
            try {
                out.println("[" + annotation.getCoveredText() + "]");
                Type type = annotation.getType();
                String typeName = type.getName();
                out.println("found annotation type " + typeName);
                Integer count = typeCounts.get(typeName);
                out.println("current count is " + count);
    etc.
I then run it with this code:

    AnalysisEngineDescription myWriterDesc =
            AnalysisEngineFactory.createEngineDescription(CountWriter.class,
                    CountWriter.PARAM_OUTPUT_FILE, "myOutput.txt");
    SimplePipeline.runPipeline(readerDesc, rutaEngineDesc, myWriterDesc);

It works perfectly: all my annotations show up. That is great, because I
now have the totals I needed. But it is also not great, because I cannot
explain why it works. That worries me, because it is hard to develop
complex systems when you don't understand the underlying model.
The following approach does not work either, although I believe it should:

    AnalysisEngine rae = AnalysisEngineFactory.createEngine(rutaEngineDesc);
    JCas jCas = rae.newJCas();
    jCas.setDocumentText(someText);
    rae.process(jCas);
    displayRutaResults(jCas);
In this case, the jCas contents were incorrect in the same way that they
were when I used

    for (JCas jcas : SimplePipeline.iteratePipeline(readerDesc,
            rutaEngineDesc)) {
        displayRutaResults(jcas);
    }

Only if I put the code in a class that is used in runPipeline do I get the
correct results. If you have any ideas about this, I would love to know.
In any case, I almost have all the results I need. I have one more task - I
need to figure out how to obtain text that was annotated with one
particular annotation and NO OTHER. I am thinking that selectCovered might
do what I need, but I can't find many examples of how to use it.
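To make that last question concrete, here is a plain-Java sketch of the
selection I have in mind. It deliberately uses no UIMA classes: Span,
exclusiveText, and the ignoredTypes set are hypothetical names made up for
illustration, and so are any type names other than Line. It also assumes
that "no other" means that no annotation of a different, non-ignored type
overlaps the span. Types such as Line that cover everything would have to
be ignored, otherwise nothing would ever qualify.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

final class ExclusiveSpans {

    /** Hypothetical stand-in for an annotation: a type name plus begin/end offsets. */
    static final class Span {
        final String type;
        final int begin;
        final int end;

        Span(String type, int begin, int end) {
            this.type = type;
            this.begin = begin;
            this.end = end;
        }
    }

    /**
     * Returns the covered text of every span of targetType that is not
     * overlapped by a span of any other type, skipping the ignored types.
     */
    static List<String> exclusiveText(String text, List<Span> spans,
                                      String targetType, Set<String> ignoredTypes) {
        List<String> result = new ArrayList<>();
        for (Span candidate : spans) {
            if (!candidate.type.equals(targetType)) {
                continue;
            }
            boolean sharedWithOther = false;
            for (Span other : spans) {
                // Spans of the target type or of an ignored type never disqualify.
                if (other.type.equals(targetType) || ignoredTypes.contains(other.type)) {
                    continue;
                }
                // Two half-open ranges overlap if each begins before the other ends.
                if (other.begin < candidate.end && candidate.begin < other.end) {
                    sharedWithOther = true;
                    break;
                }
            }
            if (!sharedWithOther) {
                result.add(text.substring(candidate.begin, candidate.end));
            }
        }
        return result;
    }
}
```

My guess is that in the real code the inner loop is where a call like
selectCovered would go, but that is only an assumption on my part. As far
as I can tell, selectCovered only returns annotations that lie inside the
covering annotation, so partial overlaps might need separate handling.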
thanks for your help,
Bonnie MacKellar
On Thu, Jul 7, 2016 at 4:48 AM, Peter Klügl <peter.kluegl@averbis.com>
wrote:
> Nope, that should not be a problem since the types are initialized in
> process()
>
>
> On 07.07.2016 at 10:26, Richard Eckart de Castilho wrote:
> > Hi,
> >
> > iteratePipeline and runPipeline should be mostly equivalent.
> > A difference occurs if you e.g. have a CAS multiplier within
> > an aggregate engine.
> >
> > runPipeline delegates the execution to the UIMA core and is able
> > to handle CAS multipliers.
> >
> > iteratePipeline (re)uses a single CAS instance which is passed
> > to the reader and all analysis engines in turn. It does not
> > support CAS multipliers.
> >
> > A user recently pointed out that uimaFIT 2.2.0 reintroduces a bug
> > in iteratePipeline - typeSystemInit() is not called [1].
> >
> > @Peter: could the missing call to typeSystemInit() be a problem for Ruta?
> >
> > Cheers,
> >
> > -- Richard
> >
> > [1] https://issues.apache.org/jira/browse/UIMA-4998
> >
> >> On 07.07.2016, at 09:17, Peter Klügl <peter.kluegl@averbis.com> wrote:
> >>
> >> Hi,
> >>
> >>
> >> I have no idea yet why the code with iteratePipeline does not work.
> >>
> >>
> >> Richard, do you have an idea?
> >>
> >>
> >> Are there any exceptions? Do you use the rae objects somewhere? Is your
> >> code hosted somewhere, e.g., on github? What do you mean by your own
> >> annotations? Annotations of an external type system or annotations added
> >> by another engine or reader?
> >>
> >>
> >> Best,
> >>
> >>
> >> Peter
> >>
> >>
> >> On 06.07.2016 at 02:41, Bonnie MacKellar wrote:
> >>> I have a very lengthy Ruta script which annotates my files
> >>> successfully. I can see all the annotations in AnnotationBrowser and
> >>> they are correct. I want to get all the annotations in a Java program,
> >>> so I can count occurrences. I am using uimaFIT. I am getting very odd
> >>> results.
> >>>
> >>> When I use CasDumpWriter, I see all my annotations, correctly written
> >>> to the dump file. Here is the code that does this:
> >>>
> -------------------------------------------------------------------------------------------------------
> >>> AnalysisEngineDescription rutaEngineDesc =
> >>>     AnalysisEngineFactory.createEngineDescription(RutaEngine.class,
> >>>         RutaEngine.PARAM_MAIN_SCRIPT, "ecClassifier",
> >>>         RutaEngine.PARAM_SCRIPT_PATHS, new String[]
> >>>             {"/home/bonnie/Research/eclipse-uima-projects/counttypes/src/main/ruta"},
> >>>         RutaEngine.PARAM_DESCRIPTOR_PATHS, new String[]
> >>>             {"/home/bonnie/Research/eclipse-uima-projects/counttypes/target/generated-sources/ruta/descriptor"},
> >>>         RutaEngine.PARAM_ADDITIONAL_UIMAFIT_ENGINES,
> >>>         "org.apache.uima.ruta.engine.PlainTextAnnotator");
> >>> AnalysisEngineDescription writerDesc =
> >>>     AnalysisEngineFactory.createEngineDescription(CasDumpWriter.class,
> >>>         CasDumpWriter.PARAM_OUTPUT_FILE, "dump2.txt");
> >>> AnalysisEngine rae = AnalysisEngineFactory.createEngine(rutaEngineDesc);
> >>> SimplePipeline.runPipeline(readerDesc, rutaEngineDesc, writerDesc);
> >>>
> -----------------------------------------------------------------------------------------------------
> >>>
> >>> However, when I try to do this myself, using iteratePipeline to iterate
> >>> through the JCas structures for each input file, many of the
> >>> annotations are missing. I have a suspicion that the missing
> >>> annotations are ones that annotate text for which there is another
> >>> annotation. For example, text will be annotated with Line, and with my
> >>> own annotation. My code to print the annotations is based on the code
> >>> in CasDumpWriter.
> >>>
> >>>
> -----------------------------------------------------------------------------------------------------
> >>>
> >>> for (JCas jcas : SimplePipeline.iteratePipeline(readerDesc,
> >>>         rutaEngineDesc)) {
> >>>     displayRutaResults(jcas);
> >>> }
> >>>
> >>> public void displayRutaResults(JCas jcas)
> >>> {
> >>>     System.out.println("in display ruta results");
> >>>
> >>>     FSIterator<Annotation> annotationIter =
> >>>             jcas.getAnnotationIndex().iterator();
> >>>     while (annotationIter.hasNext())
> >>>     {
> >>>         AnnotationFS annotation = annotationIter.next();
> >>>         System.out.println(annotation.getType().getName());
> >>>         System.out.println(annotation.getCoveredText());
> >>>         System.out.println("------------------------------------------");
> >>>         // System.out.println(annotation.toString());
> >>>     }
> >>> }
> >>>
> >>>
> ------------------------------------------------------------------------------------------------
> >>>
> >>> Why would this code produce different results than CasDumpWriter, which
> >>> uses almost exactly the same code? Is it something to do with using
> >>> runPipeline vs iteratePipeline? Should I write my code so it can be
> >>> placed inside runPipeline?
> >>>
> >>> thanks so much!
> >>> Bonnie MacKellar
> >>>
>
>