uima-user mailing list archives

From Richard Eckart de Castilho <...@apache.org>
Subject Re: Some recent use-cases for managing large sets of mostly similar complex descriptors, that may be worthy of some UIMA core support
Date Tue, 10 Sep 2013 22:00:22 GMT
Sounds pretty much like one of our motivations for building uimaFIT,
except that we didn't want to juggle descriptors; instead, we generate
them dynamically and assemble them into pipelines dynamically, just as
required.

DKPro Lab [1] builds on these features and adds support for
parameter-sweeping experiments, so that UIMA pipelines can be assembled
dynamically based on the current experiment parameters. Here, we didn't
use UIMA flow controllers, because we wanted the framework to be able to
integrate UIMA steps and non-UIMA steps in the same experimental setup.
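
To make the parameter-sweeping idea concrete, here is a minimal sketch in
plain Java. It does not use the uimaFIT or DKPro Lab APIs; every name in it
(SweepSketch, sweep, assemble, run, and the "tagger"/"parser" parameters) is
hypothetical and chosen only for illustration. Steps are modeled as plain
functions on a string, standing in for UIMA and non-UIMA steps alike.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical sketch: none of these names come from uimaFIT or DKPro Lab.
final class SweepSketch {

    // Expand named parameter dimensions into every combination (Cartesian product).
    static List<Map<String, String>> sweep(Map<String, List<String>> dimensions) {
        List<Map<String, String>> combos = new ArrayList<>();
        combos.add(new LinkedHashMap<>());
        for (Map.Entry<String, List<String>> dim : dimensions.entrySet()) {
            List<Map<String, String>> next = new ArrayList<>();
            for (Map<String, String> partial : combos) {
                for (String value : dim.getValue()) {
                    Map<String, String> extended = new LinkedHashMap<>(partial);
                    extended.put(dim.getKey(), value);
                    next.add(extended);
                }
            }
            combos = next;
        }
        return combos;
    }

    // Assemble the pipeline for one parameter combination. Each step is a plain
    // function, so a step does not have to be a UIMA component.
    static List<UnaryOperator<String>> assemble(Map<String, String> params) {
        List<UnaryOperator<String>> steps = new ArrayList<>();
        steps.add(text -> text + " |tokenizer");
        // Select among alternatives.
        if ("fast".equals(params.get("tagger"))) {
            steps.add(text -> text + " |fastTagger");
        } else {
            steps.add(text -> text + " |accurateTagger");
        }
        // Optionally skip a step.
        if ("on".equals(params.get("parser"))) {
            steps.add(text -> text + " |parser");
        }
        return steps;
    }

    // Run the assembled steps in order over one input.
    static String run(List<UnaryOperator<String>> steps, String input) {
        String result = input;
        for (UnaryOperator<String> step : steps) {
            result = step.apply(result);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<String>> dims = new LinkedHashMap<>();
        dims.put("tagger", List.of("fast", "accurate"));
        dims.put("parser", List.of("on", "off"));
        for (Map<String, String> params : sweep(dims)) {
            System.out.println(params + " -> " + run(assemble(params), "doc"));
        }
    }
}
```

With two values for each of the two parameters, the sweep yields four
pipelines, each built fresh from its parameter set rather than from a
separately maintained descriptor.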

-- Richard

[1] http://code.google.com/p/dkpro-lab/

On 10.09.2013, at 23:49, Marshall Schor <msa@schor.com> wrote:

> As UIMA gets more use in complex pipelines and investigations, I've heard of two
> particular use cases which might warrant some new supporting UIMA features.
> These use cases are both around the idea of running a variety of "experiments"
> using a large pipeline made up of many components, where you want the bulk of
> the pipeline to be stable, set up and configured correctly, and yet want to vary
> parts of the pipeline. The granularity in these use-cases is around analysis
> engines (either primitive ones or aggregates), and potentially skipping some, or
> selecting among alternatives.
> One kind of use-case for the alternative implementation may involve substituting
> a "remote" descriptor for a "local" one, or may involve running a "faster" but
> less accurate version of some aspect of the pipeline, versus running a
> slower/more accurate one.
> Likewise, skipping some parts of the pipeline might be done to see the value of
> some kind of analysis.
> Although the descriptors for these experiments could be "edited" to comment-out
> / comment-in the "skipped" analysis engines, or to select alternatives, the
> other part of the use case that I've heard is that people who do this end up
> with lots of variations of descriptors, and this becomes somewhat
> unmaintainable.  They want to be able to "externalize" this kind of
> configuration, to enhance the stability / maintenance of the main pipeline.
> The idea would be to have a stable, maintainable representation of the "full"
> pipeline, with all the alternatives, in one spot, and then to be able to use a
> 2nd resource (e.g., a configuration file of some kind) to "personalize" the full
> pipeline, skipping some parts, and picking alternatives for some parts.  Since
> UIMA already supports an External Override Configuration file mechanism,
> making use of that would allow keeping all the "settings" for an experiment
> together.
> The ideal descriptor representation for this would make it very obvious to a
> reader what was going on, with as little "indirection" or hiding as feasible. 
> We would also want to think about the Eclipse Component Descriptor Editor (CDE)
> support for this kind of thing.
> Do others think these use cases (or ones like them) arise frequently in their
> use of UIMA?
> -Marshall
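
The "full pipeline plus a second personalizing resource" idea from the quoted
message could look roughly like the sketch below. It is plain Java and makes
several assumptions: the slot names, the alternative names, and the "skip"
keyword are all hypothetical, and the key=value format is just
java.util.Properties, not the actual syntax of UIMA's external override
mechanism.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

// Hypothetical sketch: this is not a UIMA API or the UIMA override file format.
final class OverrideSketch {

    // The stable "full" pipeline: every slot lists all of its alternatives in
    // one place; the first alternative is the default.
    static Map<String, List<String>> fullPipeline() {
        Map<String, List<String>> slots = new LinkedHashMap<>();
        slots.put("tokenizer", List.of("localTokenizer", "remoteTokenizer"));
        slots.put("tagger", List.of("accurateTagger", "fastTagger"));
        slots.put("parser", List.of("parser"));
        return slots;
    }

    // Parse experiment settings given as key=value lines.
    static Properties parse(String text) {
        Properties overrides = new Properties();
        try {
            overrides.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return overrides;
    }

    // Personalize the full pipeline: "<slot>=skip" drops the slot and
    // "<slot>=<alternative>" selects one alternative. Slots without an
    // override keep their default.
    static List<String> personalize(Map<String, List<String>> slots, Properties overrides) {
        List<String> selected = new ArrayList<>();
        for (Map.Entry<String, List<String>> slot : slots.entrySet()) {
            String choice = overrides.getProperty(slot.getKey(), slot.getValue().get(0));
            if ("skip".equals(choice)) {
                continue;
            }
            if (!slot.getValue().contains(choice)) {
                throw new IllegalArgumentException(
                        "Unknown alternative '" + choice + "' for slot '" + slot.getKey() + "'");
            }
            selected.add(choice);
        }
        return selected;
    }

    public static void main(String[] args) {
        Properties experiment = parse("tagger=fastTagger\nparser=skip\n");
        System.out.println(personalize(fullPipeline(), experiment));
    }
}
```

The design point is that the full pipeline definition never changes between
experiments; only the small settings text does, and an unknown alternative
fails loudly instead of being silently ignored.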
