uima-user mailing list archives

From Eddie Epstein <eaepst...@gmail.com>
Subject Re: CAS Multipliers and Pipeline Troubles
Date Mon, 22 Jun 2009 12:56:23 GMT
Hi,

To confirm your configuration: an aggregate AE contains two CAS
multipliers, one to split large documents and a second to merge them.
This AAE is itself contained in a larger aggregate. The desired behavior
is that the inner AAE returns the merged CASes but not the split CASes.
Of course, the original input CASes must also be returned.
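As a minimal sketch of the bookkeeping such a split/merge pair has to do, here is a UIMA-free Java model. It assumes each split CAS holds exactly one Segment's text, that the merger knows where that Segment began in the original document, and that the merger copies over only a fixed set of annotation types. The names `Span`, `Piece`, `split` and `merge` are hypothetical and are not UIMA API; in a real pipeline the pieces would be CASes and the spans would be annotations.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/**
 * UIMA-free model of splitting a document by Segment and merging the
 * per-segment annotations back. All names here are hypothetical.
 */
public class SegmentOffsets {

    /** A typed annotation span; end is exclusive, like UIMA begin/end offsets. */
    record Span(String type, int begin, int end) {}

    /** One split "CAS": a Segment's text plus where it began in the original document. */
    record Piece(String text, int docOffset) {}

    /** Split: one piece per Segment span. */
    static List<Piece> split(String doc, List<Span> segments) {
        List<Piece> pieces = new ArrayList<>();
        for (Span seg : segments) {
            pieces.add(new Piece(doc.substring(seg.begin(), seg.end()), seg.begin()));
        }
        return pieces;
    }

    /**
     * Merge: shift each piece's annotations back to document offsets,
     * keeping only the types listed in 'keep' (the "hardcoded tags").
     * annotations.get(i) holds the piece-relative spans found in pieces.get(i).
     */
    static List<Span> merge(List<Piece> pieces, List<List<Span>> annotations, Set<String> keep) {
        List<Span> merged = new ArrayList<>();
        for (int i = 0; i < pieces.size(); i++) {
            int shift = pieces.get(i).docOffset();
            for (Span a : annotations.get(i)) {
                if (keep.contains(a.type())) {
                    merged.add(new Span(a.type(), a.begin() + shift, a.end() + shift));
                }
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        String doc = "Call me Ishmael. Some years ago";
        List<Piece> pieces = split(doc, List.of(
                new Span("Segment", 0, 16), new Span("Segment", 17, 31)));
        // Piece-relative results an annotator might produce; "Scratch" is not kept.
        List<List<Span>> found = List.of(
                List.of(new Span("Name", 8, 15), new Span("Scratch", 0, 4)),
                List.of(new Span("Token", 5, 10)));
        for (Span s : merge(pieces, found, Set.of("Name", "Token"))) {
            System.out.println(s + " -> " + doc.substring(s.begin(), s.end()));
        }
    }
}
```

Running `main` maps the Token found at [5,10) in the second piece back to [22,27) in the original text, which covers "years". The Scratch span is discarded because it is not in the keep set.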

For this scenario, the inner AAE itself must be declared to be a CAS
multiplier, because the intent is for it to return new CASes. The
inner AAE's flow controller will have to specify that split CASes be
dropped using "new FinalStep(true)", but that merged CASes be returned
via "new FinalStep(false)" or just "new FinalStep()".
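The routing rule above can be made concrete without pulling in the UIMA jars. The following dependency-free Java model covers the decision the inner flow controller makes when a CAS reaches the end of its flow. It assumes the flow can tell CASes apart by the key of the delegate that produced them (a UIMA `Flow` receives that key through `newCasProduced` when a delegate creates a new CAS). The delegate keys `Splitter` and `Merger`, the `Final` enum and `finalStepFor` are hypothetical names, not UIMA API.

```java
/**
 * UIMA-free model of the end-of-flow decision for the inner aggregate.
 * The delegate keys, the Final enum and finalStepFor are hypothetical.
 */
public class SplitMergeRouting {

    // Hypothetical delegate keys of the two CAS multipliers in the inner aggregate.
    static final String SPLITTER_KEY = "Splitter";
    static final String MERGER_KEY = "Merger";

    /** Possible outcomes when a CAS reaches the end of the inner aggregate's flow. */
    enum Final {
        RETURN, // in UIMA terms: new FinalStep() or new FinalStep(false)
        DROP    // in UIMA terms: new FinalStep(true)
    }

    /**
     * producedBy is the key of the delegate that created the CAS,
     * or null for a CAS that was passed in to the aggregate.
     */
    static Final finalStepFor(String producedBy) {
        if (SPLITTER_KEY.equals(producedBy)) {
            return Final.DROP; // split CASes must not leave the inner aggregate
        }
        // Merged CASes are returned to the outer aggregate, and so is the
        // original input CAS, which must also be returned.
        return Final.RETURN;
    }

    public static void main(String[] args) {
        String[] producers = {null, SPLITTER_KEY, SPLITTER_KEY, MERGER_KEY};
        for (String p : producers) {
            String label = (p == null) ? "input CAS" : "CAS from " + p;
            System.out.println(label + " -> " + finalStepFor(p));
        }
    }
}
```

In an actual flow controller, the DROP branch is where "new FinalStep(true)" would be returned and the RETURN branch is where "new FinalStep()" would be returned.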

Is this the intended configuration?

Regards,
Eddie


On Mon, Jun 22, 2009 at 8:06 AM, Valkyrie Savage
<savage@tk.informatik.tu-darmstadt.de> wrote:
>
> Hello, all,
>
> I'm working on a project involving UIMA, and I've run into some difficulties
> that I can't figure out. This is my first month working with UIMA, so I am
> admittedly not well-versed in all its components and interactions, but I'll
> try to describe my problem as best I can. I'm running UIMA 2.2.2-incubating
> with Java 1.6 inside of Eclipse Ganymede.
>
> The project involves processing rather large documents, and the in-house
> components that I'm using have difficulty reading in a book-length chunk of
> text at a time. For this reason, I've developed a very simple CAS multiplier;
> it takes in a CAS that contains Segment annotations and generates a new CAS
> for each Segment. This multiplier is contained in an aggregate AE, and the
> other components of the AE are used for adding a few new annotations. At the
> end of the aggregate is a simple CAS demultiplier; it is based heavily on the
> example in org.apache.uima.examples.casMultiplier, except that I hardcoded
> the tags that I want to copy over when demultiplying.
>
> The problem I am coming across is that the split CASes are being tagged
> correctly and merged correctly, but for whatever reason the merged CAS is not
> the one being sent on through the rest of the pipeline after this aggregate
> AE. I have a simple CAS printer running at the end of the next() method of my
> demultiplier, and it shows that only the tags I wanted are retained after the
> merge, but the other tags appear again if I add an AnnotationWriter in the
> next step of the pipeline. I read about Flow Controllers, and it seems that
> the original CAS should be dropped from the pipeline by default, since new
> CASes are being created from it (I am not using any kind of user-defined Flow
> Controller), but that doesn't seem to be happening. None of the new tags
> added in the aggregate AE are being preserved, but all the tags that are
> supposed to be stripped out are being preserved.
>
> If there's more information needed, I'll be happy to provide it. As I
> mentioned, I'm new to UIMA, and I'm not sure how to go about trying to debug
> this.
>
> Thank you!
>
> Valkyrie Savage
>
