ctakes-dev mailing list archives

From "Miller, Timothy" <Timothy.Mil...@childrens.harvard.edu>
Subject Re: suggestion for default pipelines
Date Thu, 24 Apr 2014 21:41:28 GMT
Any preference between separate factory classes:

    class SentenceDetectorAnnotatorFactory:
        static AnalysisEngineDescription getSentenceDetectorAnnotator()

vs. static methods added to the primitive annotators themselves:

    class SentenceDetector (existing)
        static AnalysisEngineDescription getSentenceDetectorAnnotator()

?

The former can clutter up the class namespace, while the latter makes the
annotator classes longer, especially if there are multiple variants
(getUMLSDictionaryAnnotator(), getICD9DictionaryAnnotator(),
getMeshDictionaryAnnotator(), etc.).
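
To make the comparison concrete, here is a rough Java sketch of the two
layouts side by side. EngineDescription is a stand-in for UIMA's
AnalysisEngineDescription so the snippet compiles without UIMA on the
classpath, and DictionaryLookupAnnotator is a hypothetical name for whichever
annotator would carry the dictionary variants. In real code each method body
would presumably build the description through uimaFIT instead of
constructing it by hand.

```java
import java.util.Arrays;
import java.util.List;

public class FactoryLayoutSketch {

    /** Stand-in for UIMA's AnalysisEngineDescription (assumption; keeps the sketch UIMA-free). */
    public static final class EngineDescription {
        public final String annotatorClass;
        public final String resource;

        EngineDescription(String annotatorClass, String resource) {
            this.annotatorClass = annotatorClass;
            this.resource = resource;
        }

        @Override
        public String toString() {
            return annotatorClass + "[" + resource + "]";
        }
    }

    /** Option 1: a separate factory class per annotator. */
    public static final class SentenceDetectorAnnotatorFactory {
        private SentenceDetectorAnnotatorFactory() {}

        public static EngineDescription getSentenceDetectorAnnotator() {
            return new EngineDescription("SentenceDetector", "default");
        }
    }

    /** Option 2: static methods on the annotator itself (hypothetical class name). */
    public static class DictionaryLookupAnnotator {
        // ... the annotator's normal processing code would live here ...

        public static EngineDescription getUMLSDictionaryAnnotator() {
            return new EngineDescription("DictionaryLookupAnnotator", "UMLS");
        }

        public static EngineDescription getICD9DictionaryAnnotator() {
            return new EngineDescription("DictionaryLookupAnnotator", "ICD9");
        }

        public static EngineDescription getMeshDictionaryAnnotator() {
            return new EngineDescription("DictionaryLookupAnnotator", "MeSH");
        }
    }

    /** Collects one description from option 1 and three from option 2. */
    public static List<EngineDescription> describeAll() {
        return Arrays.asList(
            SentenceDetectorAnnotatorFactory.getSentenceDetectorAnnotator(),
            DictionaryLookupAnnotator.getUMLSDictionaryAnnotator(),
            DictionaryLookupAnnotator.getICD9DictionaryAnnotator(),
            DictionaryLookupAnnotator.getMeshDictionaryAnnotator());
    }

    public static void main(String[] args) {
        for (EngineDescription d : describeAll()) {
            System.out.println(d);
        }
    }
}
```

Callers see the same shape either way (a static call that returns a
description), so the choice mostly affects where the methods live.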

Tim

On 04/16/2014 04:48 AM, Richard Eckart de Castilho wrote:
> It would be nice if uimaFIT provided a Maven plugin to automatically
> generate descriptors for aggregates. Maybe if we come up with a 
> convention for factories, e.g. a "class with static methods that do
> not take any parameters and that return descriptors", or "methods
> that bear a specific Java annotation (e.g. @AutoGenerateDescriptor)"
> it should be possible to implement such a Maven plugin.
>
> Cheers,
>
> -- Richard
>
> On 16.04.2014, at 05:21, Steven Bethard <steven.bethard@gmail.com> wrote:
>
>> +1. And note that once you have a descriptor, you can generate the
>> XML, so we should arrange to replace the current XML descriptors with
>> ones generated automatically from the uimaFIT code. That should reduce
>> some synchronization problems when the Java code was changed but the
>> XML descriptor was not.
>>
>> Steve
>>
>> On Tue, Apr 15, 2014 at 8:52 AM, Miller, Timothy
>> <Timothy.Miller@childrens.harvard.edu> wrote:
>>> The discussion in the other thread with Abraham Tom gave me an idea I
>>> wanted to float to the list. We have been using some uimaFIT pipeline
>>> builders in the temporal project that maybe could be moved into
>>> clinical-pipeline. For example, look at this file:
>>>
>>> http://svn.apache.org/viewvc/ctakes/trunk/ctakes-temporal/src/main/java/org/apache/ctakes/temporal/pipelines/TemporalExtractionPipeline_ImplBase.java?view=markup
>>>
>>> with the static methods getPreprocessorAggregateBuilder() and
>>> getLightweightPreprocessorAggregateBuilder() [no UMLS].
>>>
>>> So my idea would be to create a class in clinical-pipeline
>>> (CTakesPipelines) with static methods for some standard pipelines (to
>>> return AnalysisEngineDescriptions instead of AggregateBuilders?):
>>>
>>> getStandardUMLSPipeline()  -- builds pipeline currently in
>>> AggregatePlaintextUMLSProcessor.xml
>>> getFullPipeline() -- same as above but with SRL, constituency parsing,
>>> etc., every component in ctakes
>>>
>>> We could then potentially merge our entry points -- I think Abraham's
>>> experience points out that this is currently confusing, as well as
>>> probably not implemented optimally. For example, either
>>> ClinicalPipelineWithUmls or BagOfCUIsGenerator would use that static
>>> method to run a uimafit-style pipeline. Maybe we can slowly deprecate
>>> our xml descriptors too unless people feel strongly about keeping those
>>> around.
>>>
>>> Another benefit is that the cTAKES API is then trivial -- if you add
>>> ctakes to your pom file, getting a UIMA pipeline is one uimaFIT call:
>>>
>>> builder.add(CTAKESPipelines.getStandardUMLSPipeline());
>>>
>>>
>>> I think this would actually be pretty easy to implement, but hoping to
>>> get some feedback on whether this is a good direction.
>>>
>>> Tim
>
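
The convention Richard describes in the quoted message (static, no-argument
methods marked with an annotation) could be discovered with plain Java
reflection, roughly as sketched here. Everything in the snippet is an
illustration: the annotation reuses the name from his example but does not
exist in uimaFIT as far as this thread shows, the factory class and its
method bodies are placeholders, and the methods return strings where real
ones would return descriptors that a plugin would then serialize to XML.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.Map;
import java.util.TreeMap;

public class DescriptorDiscoverySketch {

    /** Hypothetical marker annotation, named after the example in the thread. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    public @interface AutoGenerateDescriptor {}

    /** Placeholder factory; strings stand in for real descriptor objects. */
    public static class CTakesPipelines {
        @AutoGenerateDescriptor
        public static String getStandardUMLSPipeline() {
            return "standard UMLS pipeline (placeholder)";
        }

        @AutoGenerateDescriptor
        public static String getFullPipeline() {
            return "full pipeline (placeholder)";
        }

        // Annotated but takes a parameter, so the convention skips it.
        @AutoGenerateDescriptor
        public static String getCustomPipeline(String name) {
            return "custom pipeline " + name;
        }

        // Not annotated, so the convention skips it.
        public static String helper() {
            return "ignored";
        }
    }

    /** Invokes every static, no-arg, annotated String method and maps method name to result. */
    public static Map<String, String> collectDescriptors(Class<?> factory) {
        Map<String, String> found = new TreeMap<>();
        for (Method m : factory.getDeclaredMethods()) {
            boolean eligible = Modifier.isStatic(m.getModifiers())
                && m.getParameterCount() == 0
                && m.getReturnType() == String.class
                && m.isAnnotationPresent(AutoGenerateDescriptor.class);
            if (!eligible) {
                continue;
            }
            try {
                found.put(m.getName(), (String) m.invoke(null));
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("Could not invoke " + m.getName(), e);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        collectDescriptors(CTakesPipelines.class)
            .forEach((name, descriptor) -> System.out.println(name + " -> " + descriptor));
    }
}
```

A build plugin following this idea would run the same scan over the project's
classes and write one descriptor file per discovered method.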

-- 
Tim Miller
Instructor
Boston Children's Hospital and Harvard Medical School
timothy.miller@childrens.harvard.edu
617-919-1223

