uima-user mailing list archives

From Mog <...@crydee.eu>
Subject Re: Filter Cas from UIMA fit pipeline
Date Fri, 21 Nov 2014 11:11:00 GMT
Hi Carsten, please see


for an example pipeline and


for an example filter.

This uses uimaFIT, so you'll have to translate it into plain UIMA terms, but it
might be a starting point.


On 11/21/2014 11:15 AM, Carsten Schnober wrote:
> Hi Sumit,
> Thanks for your suggestion, it seems like the proper way to go for my
> use case. However, I'm not too familiar with the UIMA internals, so
> could you point me to where or how I can set the dropCasOnException option?
> Thanks!
> Carsten
> On 07.11.2014 at 10:19, Sumit Madan wrote:
>> Hi Carsten,
>> I have also found that a flow controller is not easy to build.
>> But maybe you can use a workaround: put a new AE in between
>> (between BinaryCasReader and Segmenter). This AE would throw an exception when a
>> (J)Cas doesn't fit your rules. With the UIMA options dropCasOnException
>> and ActionOnMaxError, UIMA can drop those (J)Cases and continue with
>> the wanted ones.
>> Regards
>>   Sumit
>> On 07/11/14 09:04, Armin.Wegner@bka.bund.de wrote:
>>> Hi Carsten,
>>> I've never used it, but according to the documentation you can do this
>>> with a flow controller. The bad thing is, Richard told me a while ago
>>> that it is not so easy to build your own flow controller.
>>> Cheers,
>>> Armin
>>> -----Original Message-----
>>> From: Carsten Schnober [mailto:schnober@ukp.informatik.tu-darmstadt.de]
>>> Sent: Thursday, 6 November 2014 14:55
>>> To: user@uima.apache.org
>>> Subject: Filter Cas from UIMA fit pipeline
>>> Hi,
>>> I wonder whether there is a recommended way to remove certain (J)Cases
>>> (i.e. documents) from a pipeline after reading.
>>> The scenario in my case is that I use a standard reader
>>> (BinaryCasReader) which returns many documents. I only want a subset of
>>> these documents to be processed by the following pipeline (comprising a
>>> segmenter, a writer and some other engines), subject to a certain value
>>> in a custom annotation.
>>> The initial intuition would be to use/implement a reader that only
>>> selects those documents that fulfil the given condition. In my case that
>>> would mean, however, that I'd need to implement a new Reader extending
>>> the BinaryCasReader by the described functionality. From a high-level
>>> view at least, this seems much more complicated than just removing
>>> documents from the pipeline.
>>> Can I avoid that effort somehow without breaking conventions?
>>> Thanks!
>>> Carsten
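
The exception-based workaround Sumit describes in the thread can be illustrated
with a small, library-independent sketch. This is plain Java, not UIMA or
uimaFIT code: Doc, RejectedException, filter and run are hypothetical stand-ins
for a (J)Cas, the exception thrown by the in-between AE, that AE itself, and
the pipeline driver. How dropCasOnException is actually configured is not
shown here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

class FilterSketch {
    /** Hypothetical stand-in for a (J)Cas: text plus one custom annotation value. */
    static class Doc {
        final String text;
        final String label;
        Doc(String text, String label) {
            this.text = text;
            this.label = label;
        }
    }

    /** Hypothetical stand-in for the exception a filtering AE would throw. */
    static class RejectedException extends Exception {
        RejectedException(String message) {
            super(message);
        }
    }

    /** The in-between step: throws when a document does not fit the rule. */
    static void filter(Doc doc, Predicate<Doc> rule) throws RejectedException {
        if (!rule.test(doc)) {
            throw new RejectedException("dropped: " + doc.text);
        }
    }

    /** Driver: skips documents whose filter step threw, processes the rest. */
    static List<String> run(List<Doc> docs, Predicate<Doc> rule) {
        List<String> processed = new ArrayList<>();
        for (Doc doc : docs) {
            try {
                filter(doc, rule);
                // Stand-in for the downstream engines (segmenter, writer, ...).
                processed.add(doc.text.toUpperCase());
            } catch (RejectedException e) {
                // Analogous to dropping the CAS on exception: ignore and continue.
            }
        }
        return processed;
    }
}
```

In this sketch the rule is passed in as a predicate on the label field, which
plays the role of the "certain value in a custom annotation" from the original
question.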