uima-user mailing list archives

From Marshall Schor <...@schor.com>
Subject Re: Processing a List of Strings with UIMA Addons components
Date Sat, 24 Aug 2013 14:54:14 GMT

On 8/23/2013 11:11 AM, harshal patni wrote:
> Hello Marshall,
> Thank you for the suggestion! This works for us! As per your suggestion,
> we have now created an Aggregate Analysis Engine that contains a CAS
> Multiplier (Splitter), our original aggregate engine, and a CAS Merger
> (to merge the results into one CAS at the end).
>
> But the final merged CAS contains the child CASes (created in the
> splitter) and the parent CAS as well. Is this expected? Any idea why?
This is under the control of the "flow controller" being used in the aggregate. 
If you haven't written your own (where you can explicitly control what happens),
then you're probably using one of the pre-built ones, whose behavior is
documented here:

http://uima.apache.org/d/uimaj-2.4.2/tutorials_and_users_guides.html#ugr.tug.cm.cm_and_fc

I've copied a bit of this below:


7.3.2. CAS Multipliers and Flow Control

CAS Multipliers are only supported in the context of Fixed Flow or custom Flow
Control. If you use the built-in "Fixed Flow" for your Aggregate Analysis
Engine, you can position the CAS Multiplier anywhere in that flow. Processing
then works as follows: When a CAS is input to the Aggregate AE, that CAS is
routed to the components in the order specified by the Fixed Flow, until that
CAS reaches a CAS Multiplier.

Upon reaching a CAS Multiplier, if that CAS Multiplier produces new output
CASes, then each output CAS from that CAS Multiplier will continue through the
flow, starting at the node immediately after the CAS Multiplier in the Fixed
Flow. No further processing will be done on the original input CAS after it has
reached a CAS Multiplier -- it will /not/ continue in the flow.

If the CAS Multiplier does /not/ produce any output CASes for a given input CAS,
then that input CAS /will/ continue in the flow. This behavior is appropriate,
for example, for a CAS Multiplier that may segment an input CAS into pieces but
only does so if the input CAS is larger than a certain size.


---------
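
The routing rules in the excerpt above can be sketched as a small simulation in plain Java. This is only a toy model and does not use the UIMA API: ToyCas, annotator, splitter, and route are hypothetical names made up for illustration, and the '|' separator is an assumption standing in for whatever segmentation a real CAS Multiplier would do. It shows only the documented Fixed Flow behavior: children continue at the node after the multiplier, the parent stops there, and a parent with no children continues.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Function;

final class FixedFlowSketch {

    /** Toy stand-in for a CAS: some text plus the names of the annotators that saw it. */
    static final class ToyCas {
        final String text;
        final List<String> visited = new ArrayList<>();
        ToyCas(String text) { this.text = text; }
    }

    /** Plain annotator: records its visit and produces no new CASes. */
    static Function<ToyCas, List<ToyCas>> annotator(String name) {
        return cas -> {
            cas.visited.add(name);
            return Collections.emptyList();
        };
    }

    /** Toy CAS Multiplier: one child per '|'-separated segment, none if there is a single segment. */
    static Function<ToyCas, List<ToyCas>> splitter() {
        return cas -> {
            String[] parts = cas.text.split("\\|");
            List<ToyCas> children = new ArrayList<>();
            if (parts.length > 1) {
                for (String part : parts) {
                    children.add(new ToyCas(part));
                }
            }
            return children;
        };
    }

    /** Routes one CAS through the flow from index start; returns the CASes that reach the end. */
    static List<ToyCas> route(List<Function<ToyCas, List<ToyCas>>> flow, ToyCas cas, int start) {
        List<ToyCas> finished = new ArrayList<>();
        for (int i = start; i < flow.size(); i++) {
            List<ToyCas> children = flow.get(i).apply(cas);
            if (!children.isEmpty()) {
                // Children continue at the node right after the multiplier;
                // the parent does not continue in the flow.
                for (ToyCas child : children) {
                    finished.addAll(route(flow, child, i + 1));
                }
                return finished;
            }
        }
        // No component produced children, so this CAS ran to the end of the flow.
        finished.add(cas);
        return finished;
    }

    public static void main(String[] args) {
        List<Function<ToyCas, List<ToyCas>>> flow = new ArrayList<>();
        flow.add(annotator("before"));
        flow.add(splitter());
        flow.add(annotator("after"));
        for (ToyCas cas : route(flow, new ToyCas("a|b"), 0)) {
            System.out.println(cas.text + " " + cas.visited);
        }
    }
}
```

In this model the parent "a|b" never reaches the "after" annotator, while an input without a separator passes through every node unchanged. A custom flow controller is where different behavior would be chosen.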

Does this help?

-Marshall

>
> We used a CAS splitter and merger for a synchronous UIMA pipeline as well.
> That does not give us the parent CAS in the final result (the merged CAS).
> Why the difference?
>
> Harshal
>
>
>
>
>
> On Wed, Aug 7, 2013 at 6:20 AM, Marshall Schor <msa@schor.com> wrote:
>
>> On 8/6/2013 6:10 PM, Mathaeus Dejori wrote:
>>> Hi,
>>>
>>> I'd like to use UIMA AS to annotate a large list of text segments. Instead
>>> of passing each text segment individually to the AnalysisEngine I'd like to
>>> pass the entire list at once.
>>>
>>> As far as I understand I can use the cas.setSofaDataArray() to pass a list
>>> of Strings and get back Annotations that refer to particular segments.
>>> However, in doing so I won't be able to use any of the existing Annotators
>>> (e.g. Concept Mapper) as their process(cas, spec) function expects the
>>> cas.getDocumentText().
>>>
>>> Is there a design pattern for uima to consume a list of strings, pass
>>> individual elements to specific Annotators and combine all the results at
>>> the end?
>> If what you are trying to do is to take an input CAS which has a bunch of
>> "strings" and send each one through a pipeline, the normal UIMA design
>> pattern for that is to use a CAS Multiplier at the start which gets as input
>> the CAS with all the strings, and then puts each one into another CAS and
>> sends it through the pipeline. If the combining you want to do is to combine
>> all the results into another CAS, then you can use another CAS Multiplier at
>> the end which receives the individual string CASes, accumulates results
>> until all the parts are done, and then outputs a "result" CAS with the
>> combined result.
>>
>> See
>> http://uima.apache.org/d/uimaj-2.4.1/tutorials_and_users_guides.html#ugr.tug.cm
>>
>> -Marshall
>>
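
The split / process / merge pattern described in the quoted reply above can be sketched in plain Java. This is a rough illustration of the data flow only and does not use the UIMA API: findTerm stands in for an existing annotator that only sees a single document text, processAll plays the role of the splitter and the merger, and all names and the "segment:begin-end" result format are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class SplitMergeSketch {

    /** Stand-in for an annotator that sees one document text: begin offsets of each match of term. */
    static List<Integer> findTerm(String documentText, String term) {
        if (term.isEmpty()) {
            throw new IllegalArgumentException("term must not be empty");
        }
        List<Integer> begins = new ArrayList<>();
        for (int i = documentText.indexOf(term); i >= 0; i = documentText.indexOf(term, i + 1)) {
            begins.add(i);
        }
        return begins;
    }

    /** Splits the list into one "document" per segment, annotates each, and merges the results. */
    static List<String> processAll(List<String> segments, String term) {
        List<String> merged = new ArrayList<>();
        for (int s = 0; s < segments.size(); s++) {
            // "Splitter": each segment becomes its own document for the annotator.
            for (int begin : findTerm(segments.get(s), term)) {
                // "Merger": keep the segment index so offsets stay meaningful in the combined result.
                merged.add(s + ":" + begin + "-" + (begin + term.length()));
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        List<String> segments = Arrays.asList("uima rocks", "no match", "uima and uima");
        System.out.println(processAll(segments, "uima"));
    }
}
```

The point of keeping the segment index in the merged result is that offsets from each per-segment document are relative to that segment, so the combined result needs both to refer back to the original list.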

