ctakes-dev mailing list archives

From Richard Eckart de Castilho <...@apache.org>
Subject Re: suggestion for default pipelines
Date Wed, 16 Apr 2014 08:35:51 GMT
It would be nice if uimaFIT provided a Maven plugin to automatically
generate descriptors for aggregates. If we come up with a
convention for factories, e.g. a "class with static methods that do
not take any parameters and that return descriptors", or "methods
that bear a specific Java annotation (e.g. @AutoGenerateDescriptor)",
it should be possible to implement such a Maven plugin.
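
To make the annotation variant of that convention concrete, here is a minimal
sketch of the discovery step only, using plain Java reflection. Everything in
it is an assumption for illustration: the @AutoGenerateDescriptor annotation
does not exist in uimaFIT, the class and method names are hypothetical, and
the factory methods return placeholder Strings where real ones would return
descriptors.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.Map;
import java.util.TreeMap;

public class DescriptorFactoryScan {

    // Hypothetical marker annotation; uimaFIT does not ship this.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    public @interface AutoGenerateDescriptor {}

    // Stand-in factory. Real factory methods would return descriptors,
    // not Strings; Strings keep the sketch free of dependencies.
    public static class ExampleFactory {
        @AutoGenerateDescriptor
        public static String getStandardPipeline() { return "<standard/>"; }

        @AutoGenerateDescriptor
        public static String getFullPipeline() { return "<full/>"; }

        // Not annotated, so the scan skips it.
        public static String helper() { return "ignored"; }

        // Annotated but takes a parameter, so it violates the
        // "no parameters" convention and is skipped as well.
        @AutoGenerateDescriptor
        public static String withArg(String s) { return s; }
    }

    // Invokes every public static no-arg method that carries the marker
    // annotation and returns the results keyed by method name.
    public static Map<String, Object> collect(Class<?> factory) throws Exception {
        Map<String, Object> out = new TreeMap<>();
        for (Method m : factory.getDeclaredMethods()) {
            boolean matches = m.isAnnotationPresent(AutoGenerateDescriptor.class)
                    && Modifier.isStatic(m.getModifiers())
                    && Modifier.isPublic(m.getModifiers())
                    && m.getParameterCount() == 0;
            if (matches) {
                out.put(m.getName(), m.invoke(null));
            }
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        for (Map.Entry<String, Object> e : collect(ExampleFactory.class).entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```

A plugin built on this idea would presumably run such a scan over the
compiled classes and write each returned descriptor to a file named after
the method; that part is not shown here.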


-- Richard

On 16.04.2014, at 05:21, Steven Bethard <steven.bethard@gmail.com> wrote:

> +1. And note that once you have a descriptor, you can generate the
> XML, so we should arrange to replace the current XML descriptors with
> ones generated automatically from the uimaFIT code. That should reduce
> synchronization problems where the Java code is changed but the
> XML descriptor is not.
> Steve
> On Tue, Apr 15, 2014 at 8:52 AM, Miller, Timothy
> <Timothy.Miller@childrens.harvard.edu> wrote:
>> The discussion in the other thread with Abraham Tom gave me an idea I
>> wanted to float to the list. We have been using some uimaFIT pipeline
>> builders in the temporal project that maybe could be moved into
>> clinical-pipeline. For example, look at this file:
>> http://svn.apache.org/viewvc/ctakes/trunk/ctakes-temporal/src/main/java/org/apache/ctakes/temporal/pipelines/TemporalExtractionPipeline_ImplBase.java?view=markup
>> with the static methods getPreprocessorAggregateBuilder() and
>> getLightweightPreprocessorAggregateBuilder() (no UMLS).
>> So my idea would be to create a class in clinical-pipeline
>> (CTakesPipelines) with static methods for some standard pipelines (to
>> return AnalysisEngineDescriptions instead of AggregateBuilders?):
>> getStandardUMLSPipeline()  -- builds pipeline currently in
>> AggregatePlaintextUMLSProcessor.xml
>> getFullPipeline() -- same as above but with SRL, constituency parsing,
>> etc., every component in ctakes
>> We could then potentially merge our entry points -- I think Abraham's
>> experience points out that this is currently confusing, as well as
>> probably not implemented optimally. For example, either
>> ClinicalPipelineWithUmls or BagOfCUIsGenerator would use that static
>> method to run a uimaFIT-style pipeline. Maybe we can slowly deprecate
>> our xml descriptors too unless people feel strongly about keeping those
>> around.
>> Another benefit is that the cTAKES API is then trivial -- if you import
>> ctakes into your pom file, getting a UIMA pipeline is one uimaFIT call:
>> builder.add(CTakesPipelines.getStandardUMLSPipeline());
>> I think this would actually be pretty easy to implement, but hoping to
>> get some feedback on whether this is a good direction.
>> Tim
