uima-user mailing list archives

From Burn Lewis <burnle...@gmail.com>
Subject Re: Processing a List of Strings with UIMA Addons components
Date Fri, 30 Aug 2013 18:53:46 GMT
With a custom flow controller you can avoid the need for a CasMultiplier as
the final component. It could be just an annotator that accumulates the
results from each of the child CASes and puts them in the input CAS when it
arrives, and the flow controller could be designed to send the input CAS
straight to the final component. So in a 3-component aggregate of CM+AE+CC,
the input CAS would skip the AE, and the child CASes would be dropped after
the AE+CC, so only the filled-out input CAS would exit.
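
The routing described above can be sketched in plain Java. This is only a model of the idea and does not use the UIMA API: the `Cas` and `Accumulator` classes, the `run` method, and the upper-casing "annotation" step are hypothetical stand-ins, and the sketch assumes the input CAS reaches the final component only after all of its children have been processed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Plain-Java model of the CM+AE+CC routing; none of these types are UIMA classes.
class CustomFlowSketch {

    // Hypothetical stand-in for a CAS: some text, a list of results,
    // and a flag marking the original input CAS.
    static final class Cas {
        final String text;
        final boolean isInput;
        final List<String> results = new ArrayList<>();

        Cas(String text, boolean isInput) {
            this.text = text;
            this.isInput = isInput;
        }
    }

    // Final component (the "CC"): collects results from child CASes and
    // copies them into the input CAS when that CAS arrives.
    static final class Accumulator {
        private final List<String> pending = new ArrayList<>();

        void process(Cas cas) {
            if (cas.isInput) {
                cas.results.addAll(pending);
                pending.clear();
            } else {
                pending.addAll(cas.results);
            }
        }
    }

    // Returns the CASes that leave the aggregate.
    static List<Cas> run(List<String> segments) {
        Cas input = new Cas(String.join("\n", segments), true);
        Accumulator finalComponent = new Accumulator();

        // CM: one child CAS per segment.
        for (String segment : segments) {
            Cas child = new Cas(segment, false);
            // AE: stand-in "annotation" of the child CAS.
            child.results.add(child.text.toUpperCase(Locale.ROOT));
            // CC: accumulate, then drop the child (it is never returned).
            finalComponent.process(child);
        }

        // Flow-controller decision: the input CAS skips the AE and goes
        // straight to the final component.
        finalComponent.process(input);

        List<Cas> exiting = new ArrayList<>();
        exiting.add(input);
        return exiting;
    }

    public static void main(String[] args) {
        List<Cas> out = run(List.of("first segment", "second segment"));
        System.out.println(out.size() + " CAS exits with results " + out.get(0).results);
    }
}
```

In a real aggregate the two routing decisions (input CAS skips the AE, child CASes are dropped after the last step) would live in the custom flow controller rather than in a loop like this.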

~Burn


On Sat, Aug 24, 2013 at 10:54 AM, Marshall Schor <msa@schor.com> wrote:

>
> On 8/23/2013 11:11 AM, harshal patni wrote:
> > Hello Marshall,
> > Thank you for the suggestion! This works for us! As per your suggestion,
> > we have now created an Aggregate Analysis Engine that contains a CAS
> > Multiplier (Splitter), our original aggregate engine and a CAS Merger (to
> > merge the results into one CAS at the end).
> >
> > But the final merged CAS contains the child CASes (created in the
> > splitter) and the parent CAS as well. Is this expected? Any idea why?
> This is under the control of the "flow controller" being used in the
> aggregate. If you haven't written your own (where you can explicitly control
> what happens), then you're probably using one of the pre-built ones, whose
> behavior is documented here:
>
>
> http://uima.apache.org/d/uimaj-2.4.2/tutorials_and_users_guides.html#ugr.tug.cm.cm_and_fc
>
> I've copied a bit of this below:
>
>
>       7.3.2. CAS Multipliers and Flow Control
>
> CAS Multipliers are only supported in the context of Fixed Flow or custom
> Flow Control. If you use the built-in "Fixed Flow" for your Aggregate
> Analysis Engine, you can position the CAS Multiplier anywhere in that flow.
> Processing then works as follows: When a CAS is input to the Aggregate AE,
> that CAS is routed to the components in the order specified by the Fixed
> Flow, until that CAS reaches a CAS Multiplier.
>
> Upon reaching a CAS Multiplier, if that CAS Multiplier produces new output
> CASes, then each output CAS from that CAS Multiplier will continue through
> the flow, starting at the node immediately after the CAS Multiplier in the
> Fixed Flow. No further processing will be done on the original input CAS
> after it has reached a CAS Multiplier -- it will /not/ continue in the flow.
>
> If the CAS Multiplier does /not/ produce any output CASes for a given input
> CAS, then that input CAS /will/ continue in the flow. This behavior is
> appropriate, for example, for a CAS Multiplier that may segment an input CAS
> into pieces but only does so if the input CAS is larger than a certain size.
>
>
> ---------
>
> Does this help?
>
> -Marshall
>
> >
> > We used CAS splitter and merger for a synchronous UIMA pipeline as well.
> > That does not give us the parent CAS in the final result (Merged CAS). Why
> > the difference?
> >
> > Harshal
> >
> >
> >
> >
> >
> > On Wed, Aug 7, 2013 at 6:20 AM, Marshall Schor <msa@schor.com> wrote:
> >
> >> On 8/6/2013 6:10 PM, Mathaeus Dejori wrote:
> >>> Hi,
> >>>
> >>> I'd like to use UIMA AS to annotate a large list of text segments. Instead
> >>> of passing each text segment individually to the AnalysisEngine I'd like to
> >>> pass the entire list at once.
> >>>
> >>> As far as I understand I can use the cas.setSofaDataArray() to pass a list
> >>> of Strings and get back Annotations that refer to particular segments.
> >>> However, in doing so I won't be able to use any of the existing Annotators
> >>> (e.g. Concept Mapper) as their process(cas, spec) function expects the
> >>> cas.getDocumentText().
> >>>
> >>> Is there a design pattern for uima to consume a list of strings, pass
> >>> individual elements to specific Annotators and combine all the results at
> >>> the end?
> >> If what you are trying to do is to take an input CAS which has a bunch of
> >> "strings" and send each one thru a pipeline, the normal UIMA design pattern
> >> for that is to use a CAS Multiplier at the start which gets as input the CAS
> >> with all the strings, and then puts each one into another CAS and sends it
> >> through the pipeline. If the combining you want to do is to combine all the
> >> results into another CAS, then you can use another CAS Multiplier at the end
> >> which receives the individual string CASes, and accumulates results until all
> >> the parts are done, and then outputs a "result" CAS with the combined result.
> >>
> >> See
> >>
> >> http://uima.apache.org/d/uimaj-2.4.1/tutorials_and_users_guides.html#ugr.tug.cm
> >>
> >> -Marshall
> >>
>
>
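The Fixed Flow rules quoted from section 7.3.2 earlier in this thread can be modeled with a small plain-Java sketch. It does not use the UIMA API; the `route` method, its parameters, and the component names are hypothetical, and it assumes a single CAS Multiplier in the flow. It shows why the parent CAS stops at the multiplier when children are produced, and continues when none are.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java model of the documented Fixed Flow routing; not UIMA code.
class FixedFlowSketch {

    // Returns, for each CAS, the component names it visits.
    // Entry 0 is the input (parent) CAS; later entries are child CASes.
    static List<List<String>> route(List<String> flow, int multiplierIndex, int childCount) {
        List<List<String>> visits = new ArrayList<>();
        List<String> afterMultiplier = flow.subList(multiplierIndex + 1, flow.size());

        // The input CAS always travels up to and including the multiplier.
        List<String> parent = new ArrayList<>(flow.subList(0, multiplierIndex + 1));
        if (childCount == 0) {
            // No children produced: the input CAS continues in the flow.
            parent.addAll(afterMultiplier);
        }
        visits.add(parent);

        // Each child starts at the node immediately after the multiplier.
        for (int i = 0; i < childCount; i++) {
            visits.add(new ArrayList<>(afterMultiplier));
        }
        return visits;
    }

    public static void main(String[] args) {
        List<String> flow = List.of("Splitter", "AE", "Merger");
        System.out.println(route(flow, 0, 2));
        System.out.println(route(flow, 0, 0));
    }
}
```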
