uima-user mailing list archives

From Marshall Schor <...@schor.com>
Subject Re: AW: Processing a List of Strings with UIMA Addons components
Date Wed, 07 Aug 2013 17:22:58 GMT

On 8/7/2013 2:33 AM, Armin.Wegner@bka.bund.de wrote:
> Dear Marshall,
> Consider an input text of which only some parts should be processed. After
> processing, the text should be in one piece again. Let X denote parts of no
> interest and let A denote parts to analyse further. XAX is split up into X, A,
> and X. There is nothing to do for the X segments. A has to be put into the
> pipeline. I only know how to use the CAS Multiplier if every segment has to be
> processed. But in this case some segments have to be left out. Is there a way to
> bypass the pipeline for the X segments? How to do the splitting and combining?
> Cheers,
> Armin
There are lots of ways to do this.  If the splitter annotator is written as a
CAS Multiplier and splits things up into X, A, and X, and sends these along, a
custom flow controller could look at an "extra" bit of control info the splitter
puts into the CAS which would act as a flag to the flow controller to either
route the CAS through the processing pipeline, or bypass it.

After all split-out parts are done, the splitter annotator could send a "final"
CAS which would have a flag that the flow controller would use to bypass
processing, but that same flag would serve to signal the "recombiner" annotator
that the parts processing was finished, and it should recombine things.
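
A rough sketch of that control flow, in plain Java with no UIMA classes at all.
Everything here is made up for illustration: Segment stands in for a child CAS,
its "process" and "last" fields stand in for the control info the splitter
would put into the CAS, and the [[...]] markers are just a hypothetical way to
mark the parts of interest. In a real pipeline the three roles would be played
by a CAS Multiplier, a custom flow controller, and the recombiner annotator.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

public class FlagRoutingSketch {

    /** Stand-in for a child CAS: text plus the control flags set by the splitter. */
    public static final class Segment {
        String text;
        final boolean process; // true: route through the pipeline; false: bypass
        final boolean last;    // true: the "final" marker, carries no text

        Segment(String text, boolean process, boolean last) {
            this.text = text;
            this.process = process;
            this.last = last;
        }
    }

    /** Splitter role: text inside [[...]] is flagged for processing, the rest bypasses. */
    public static List<Segment> split(String input) {
        List<Segment> out = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            int open = input.indexOf("[[", pos);
            int close = open < 0 ? -1 : input.indexOf("]]", open + 2);
            if (close < 0) { // no further complete marked region
                out.add(new Segment(input.substring(pos), false, false));
                break;
            }
            if (open > pos) {
                out.add(new Segment(input.substring(pos, open), false, false));
            }
            out.add(new Segment(input.substring(open + 2, close), true, false));
            pos = close + 2;
        }
        out.add(new Segment("", false, true)); // "final" marker, bypasses processing
        return out;
    }

    /** Flow-controller role: look only at the flag to choose the route. */
    public static void route(Segment s, UnaryOperator<String> pipeline) {
        if (s.process) {
            s.text = pipeline.apply(s.text);
        }
    }

    /** Recombiner role: accumulate the parts until the final marker arrives. */
    public static String run(String input, UnaryOperator<String> pipeline) {
        StringBuilder combined = new StringBuilder();
        for (Segment s : split(input)) {
            route(s, pipeline);
            if (s.last) {
                break;
            }
            combined.append(s.text);
        }
        return combined.toString();
    }

    public static void main(String[] args) {
        // Upper-casing stands in for the real processing pipeline.
        System.out.println(run("xx[[abc]]yy", String::toUpperCase)); // prints xxABCyy
    }
}
```

The point of the sketch is that the routing decision depends only on the flag,
so the pipeline itself never needs to know which segments were skipped.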


Another way: Inside the splitter annotator, you can instantiate a brand-new,
completely independent UIMA pipeline.  Then within that annotator, it can do all
the work of splitting, sending something through that sub-pipeline, and
retrieving the results back into the original CAS in whatever way makes sense.

Because the sub-pipeline is independent, it can even have a different type
system.  You would write whatever transformation / copying code is needed
(there's a CasCopier class that can help copy things between CASes).
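
One piece of that transformation code is easy to overlook: if the sub-pipeline
only sees the text of one segment, the offsets of whatever it finds are
relative to that segment and have to be shifted when the results are copied
back. A minimal sketch of just that step, again in plain Java without UIMA
classes. Span, subPipeline and analyzeRegion are hypothetical names, and the
digit matcher is only a placeholder for a real analysis engine.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OffsetShiftSketch {

    /** Stand-in for an annotation: a labelled span of character offsets. */
    public static final class Span {
        public final int begin;
        public final int end;
        public final String label;

        Span(int begin, int end, String label) {
            this.begin = begin;
            this.end = end;
            this.label = label;
        }
    }

    /** Placeholder for the independent sub-pipeline: marks each run of digits. */
    public static List<Span> subPipeline(String segmentText) {
        List<Span> spans = new ArrayList<>();
        Matcher m = Pattern.compile("\\d+").matcher(segmentText);
        while (m.find()) {
            spans.add(new Span(m.start(), m.end(), "NUMBER"));
        }
        return spans;
    }

    /** Analyzes document[begin, end) and maps the results back to document offsets. */
    public static List<Span> analyzeRegion(String document, int begin, int end) {
        List<Span> result = new ArrayList<>();
        for (Span s : subPipeline(document.substring(begin, end))) {
            // Segment-relative offsets become document-relative ones.
            result.add(new Span(s.begin + begin, s.end + begin, s.label));
        }
        return result;
    }

    public static void main(String[] args) {
        String doc = "skip 12 but read 345 here";
        // Only the region starting at "read" is sent to the sub-pipeline.
        for (Span s : analyzeRegion(doc, 12, doc.length())) {
            System.out.println(s.label + " " + doc.substring(s.begin, s.end)); // prints NUMBER 345
        }
    }
}
```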

HTH. -Marshall
> -----Original Message-----
> From: Marshall Schor [mailto:msa@schor.com]
> Sent: Wednesday, 7 August 2013 02:51
> To: user@uima.apache.org
> Subject: Re: Processing a List of Strings with UIMA Addons components
> On 8/6/2013 6:10 PM, Mathaeus Dejori wrote:
>> Hi,
>> I'd like to use UIMA AS to annotate a large list of text segments.
>> Instead of passing each text segment individually to the
>> AnalysisEngine I'd like to pass the entire list at once.
>> As far as I understand I can use the cas.setSofaDataArray() to pass a
>> list of Strings and get back Annotations that refer to particular segments.
>> However, in doing so I won't be able to use any of the existing
>> Annotators (e.g. Concept Mapper) as their process(cas, spec) function
>> expects the cas.getDocumentText().
>> Is there a design pattern for uima to consume a list of strings, pass
>> individual elements to specific Annotators and combine all the results
>> at the end?
> If what you are trying to do is to take an input CAS which has a bunch of
> "strings" and send each one through a pipeline, the normal UIMA design pattern
> for that is to use a CAS Multiplier at the start which gets as input the CAS
> with all the strings, and then puts each one into another CAS and sends it
> through the pipeline. If the combining you want to do is to combine all the
> results into another CAS, then you can use another CAS Multiplier at the end
> which receives the individual string CASes, accumulates results until all the
> parts are done, and then outputs a "result" CAS with the combined result.
> See
> -Marshall
