uima-user mailing list archives

From <Armin.Weg...@bka.bund.de>
Subject AW: Processing a List of Strings with UIMA Addons components
Date Wed, 07 Aug 2013 06:33:49 GMT
Dear Marshall,

Consider an input text of which only some parts should be processed. After processing, the
text should be in one piece again. Let X denote parts of no interest and let A denote
parts to analyse further. XAX is split up into X, A, and X. There is nothing to do for the
X segments; only A has to be put into the pipeline. I only know how to use the CAS Multiplier
if every segment has to be processed, but in this case some segments have to be left out. Is
there a way to bypass the pipeline for the X segments? And how should the splitting and
combining be done?
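
To make the XAX case concrete, here is a minimal sketch in plain Java. It does not use the UIMA API at all: the class SegmentBypassSketch, the Segment type, the bracket pattern that marks the A parts, and the upper-casing "pipeline" are all hypothetical stand-ins chosen for illustration. It only shows the intended behaviour: X segments are passed through untouched, A segments are processed, and the pieces are joined in their original order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.UnaryOperator;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SegmentBypassSketch {

    // One slice of the input. analyse == true marks an A part, false an X part.
    static final class Segment {
        final String text;
        final boolean analyse;

        Segment(String text, boolean analyse) {
            this.text = text;
            this.analyse = analyse;
        }
    }

    // Every match of 'interesting' becomes an A segment;
    // the text between matches becomes X segments.
    static List<Segment> split(String input, Pattern interesting) {
        List<Segment> segments = new ArrayList<>();
        Matcher m = interesting.matcher(input);
        int last = 0;
        while (m.find()) {
            if (m.start() > last) {
                segments.add(new Segment(input.substring(last, m.start()), false));
            }
            segments.add(new Segment(m.group(), true));
            last = m.end();
        }
        if (last < input.length()) {
            segments.add(new Segment(input.substring(last), false));
        }
        return segments;
    }

    // Runs only the A segments through the pipeline and joins everything
    // back together in the original order; X segments bypass the pipeline.
    static String processAndJoin(List<Segment> segments, UnaryOperator<String> pipeline) {
        StringBuilder out = new StringBuilder();
        for (Segment s : segments) {
            out.append(s.analyse ? pipeline.apply(s.text) : s.text);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Assumption for this example only: A parts are enclosed in square brackets.
        Pattern interesting = Pattern.compile("\\[[^\\]]*\\]");
        List<Segment> segments = split("header [body text] footer", interesting);
        // Upper-casing stands in for the real analysis pipeline.
        System.out.println(processAndJoin(segments, s -> s.toUpperCase(Locale.ROOT)));
    }
}
```

In a real UIMA setup the same idea would presumably mean that only the A segments get a CAS of their own, while the X segments stay behind in the original CAS.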

Cheers,
Armin


-----Original Message-----
From: Marshall Schor [mailto:msa@schor.com]
Sent: Wednesday, 7 August 2013 02:51
To: user@uima.apache.org
Subject: Re: Processing a List of Strings with UIMA Addons components


On 8/6/2013 6:10 PM, Mathaeus Dejori wrote:
> Hi,
>
> I'd like to use UIMA AS to annotate a large list of text segments. 
> Instead of passing each text segment individually to the 
> AnalysisEngine I'd like to pass the entire list at once.
>
> As far as I understand I can use the cas.setSofaDataArray() to pass a 
> list of Strings and get back Annotations that refer to particular segments.
> However, in doing so I won't be able to use any of the existing 
> Annotators (e.g. Concept Mapper) as their process(cas, spec) function 
> expects the cas.getDocumentText().
>
> Is there a design pattern for uima to consume a list of strings, pass 
> individual elements to specific Annotators and combine all the results 
> at the end?

If what you are trying to do is to take an input CAS which has a bunch of "strings" and send
each one through a pipeline, the normal UIMA design pattern is to put a CAS Multiplier at the
start. It receives the CAS with all the strings, puts each one into another CAS, and sends it
through the pipeline. If the combining you want is to gather all the results into another CAS,
you can use another CAS Multiplier at the end. It receives the individual string CASes,
accumulates results until all the parts are done, and then outputs a "result" CAS with the
combined result.
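
The shape of that pattern can be sketched in plain Java without the UIMA API. Everything named here (SplitMergeSketch, Child, Merger, and the trim/upper-case step standing in for the pipeline) is a hypothetical illustration, not UIMA code. Each child carries its index and the total count, so the merging step can tell when all the parts have arrived.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

class SplitMergeSketch {

    // Stand-in for a child CAS: one string plus the bookkeeping the merger needs.
    static final class Child {
        final int index;
        final int total;
        final String text;

        Child(int index, int total, String text) {
            this.index = index;
            this.total = total;
            this.text = text;
        }
    }

    // Plays the role of the CAS Multiplier at the start: one child per input string.
    static List<Child> split(List<String> strings) {
        List<Child> children = new ArrayList<>();
        for (int i = 0; i < strings.size(); i++) {
            children.add(new Child(i, strings.size(), strings.get(i)));
        }
        return children;
    }

    // Plays the role of the CAS Multiplier at the end: buffers per-child
    // results and releases the combined result once every part has arrived.
    static final class Merger {
        private final Map<Integer, String> results = new TreeMap<>();

        // Returns the results in input order when complete, otherwise null.
        List<String> accept(Child child, String result) {
            results.put(child.index, result);
            if (results.size() < child.total) {
                return null;
            }
            List<String> combined = new ArrayList<>(results.values());
            results.clear();
            return combined;
        }
    }

    // Splits, processes each child, and merges. Returns null for an empty input,
    // because no child ever reaches the merger.
    static List<String> run(List<String> strings) {
        Merger merger = new Merger();
        List<String> combined = null;
        for (Child child : split(strings)) {
            // Stand-in for the real per-string pipeline.
            String result = child.text.trim().toUpperCase(Locale.ROOT);
            combined = merger.accept(child, result);
        }
        return combined;
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList("alpha", " beta ", "gamma")));
    }
}
```

Keying the buffer by child index means the combined result keeps the input order even if the children come back out of order, which can happen when the pipeline is scaled out.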

See http://uima.apache.org/d/uimaj-2.4.1/tutorials_and_users_guides.html#ugr.tug.cm

-Marshall
