uima-user mailing list archives

From: Marshall Schor <...@schor.com>
Subject: Some recent use-cases for managing large sets of mostly similar complex descriptors, that may be worthy of some UIMA core support
Date: Tue, 10 Sep 2013 21:49:49 GMT

As UIMA gets more use in complex pipelines and investigations, I've heard of two
particular use cases which might warrant some new supporting UIMA features.

Both use cases involve running a variety of "experiments" on a large pipeline
made up of many components, where you want the bulk of the pipeline to stay
stable, set up and configured correctly, while varying parts of it. The
granularity in these use cases is the analysis engine (either primitive or
aggregate): potentially skipping some, or selecting among alternatives.

Selecting an alternative implementation might mean substituting a "remote"
descriptor for a "local" one, or running a "faster" but less accurate version
of some aspect of the pipeline instead of a slower, more accurate one.

Likewise, skipping some parts of the pipeline might be done to see the value of
some kind of analysis.

The descriptors for these experiments could be "edited" to comment out or
comment in the "skipped" analysis engines, or to select alternatives. However,
the other part of the use case I've heard is that people who do this end up
with lots of variations of descriptors, which becomes hard to maintain. They
want to be able to "externalize" this kind of configuration, to improve the
stability and maintainability of the main pipeline.

The idea would be to have a stable, maintainable representation of the "full"
pipeline, with all the alternatives, in one spot, and then to use a second
resource (e.g., a configuration file of some kind) to "personalize" the full
pipeline, skipping some parts and picking alternatives for others. Since UIMA
already supports an External Override Configuration file mechanism, making use
of that would allow keeping all the "settings" for an experiment together.
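
To make the idea concrete, here is a rough, language-neutral sketch in Python
of what "one full pipeline plus a small per-experiment settings resource" could
look like. None of this is existing UIMA API: the Slot class, the personalize
function, the "skip" keyword, the slot names, and the descriptor paths are all
hypothetical and only illustrate the shape of the proposal.

```python
from dataclasses import dataclass


@dataclass
class Slot:
    """One position in the full pipeline, listing every alternative (hypothetical)."""
    name: str
    alternatives: dict  # alternative key -> descriptor path (made-up paths)
    default: str        # alternative used when the experiment says nothing


def personalize(full_pipeline, settings):
    """Return the descriptor paths to run for one experiment.

    settings maps a slot name to an alternative key, or to "skip".
    Slots not mentioned in settings keep their default alternative.
    """
    selected = []
    for slot in full_pipeline:
        choice = settings.get(slot.name, slot.default)
        if choice == "skip":
            continue  # leave this analysis engine out of the experiment
        if choice not in slot.alternatives:
            raise KeyError(f"slot {slot.name!r} has no alternative {choice!r}")
        selected.append(slot.alternatives[choice])
    return selected


# The stable "full" pipeline: all alternatives visible in one spot.
FULL_PIPELINE = [
    Slot("tokenizer", {"local": "desc/Tokenizer.xml"}, "local"),
    Slot("parser",
         {"accurate": "desc/SlowParser.xml", "fast": "desc/FastParser.xml"},
         "accurate"),
    Slot("ner",
         {"local": "desc/Ner.xml", "remote": "desc/NerRemote.xml"},
         "local"),
]

# The second resource: settings for one experiment, kept apart from the pipeline.
EXPERIMENT = {"parser": "fast", "ner": "skip"}

print(personalize(FULL_PIPELINE, EXPERIMENT))
```

With these made-up settings, the experiment runs the tokenizer and the fast
parser and skips the "ner" slot, while the full pipeline definition stays
untouched. An empty settings dictionary yields the default pipeline.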

The ideal descriptor representation for this would make it very obvious to a
reader what was going on, with as little "indirection" or hiding as feasible. 
We would also want to think about the Eclipse Component Descriptor Editor (CDE)
support for this kind of thing.

Do others think these use cases (or ones like them) arise frequently in their
use of UIMA?

-Marshall
