ctakes-dev mailing list archives

From Richard Eckart de Castilho <...@apache.org>
Subject Re: suggestion for default pipelines
Date Sun, 27 Apr 2014 08:46:41 GMT
There is already code scanning for annotations in the "generate" goal of the
uimafit-maven-plugin. It may not be much effort to extend that to scan for
and invoke such methods. 

-- Richard

On 27.04.2014, at 10:39, Richard Eckart de Castilho <rec@apache.org> wrote:

> Maybe that choice should be left to the user. Factory methods could be marked using
> a special Java annotation that is scanned for at build time, e.g. something along the
> lines of
> 
> @DescriptionGenerator
> static AnalysisEngineDescription getSentenceDetectorAnnotator()
> 
> -- Richard
> 
> On 24.04.2014, at 23:41, Miller, Timothy <Timothy.Miller@childrens.harvard.edu> wrote:
> 
>> Any preference for separate factory classes:
>> 
>> class SentenceDetectorAnnotatorFactory:
>> 
>> static AnalysisEngineDescription getSentenceDetectorAnnotator()
>> 
>> VS
>> 
>> static methods added to primitive annotators:
>> 
>> class SentenceDetector (existing)
>> 
>> static AnalysisEngineDescription getSentenceDetectorAnnotator()
>> 
>> ?
>> 
>> The former can clutter up the class space while the latter extends the
>> length of classes, especially if there are multiple versions
>> (getUMLSDictionaryAnnotator(), getICD9DictionaryAnnotator(),
>> getMeshDictionaryAnnotator(), etc.)
>> 
>> Tim
>> 
>> On 04/16/2014 04:48 AM, Richard Eckart de Castilho wrote:
>>> It would be nice if uimaFIT provided a Maven plugin to automatically
>>> generate descriptors for aggregates. Maybe if we come up with a 
>>> convention for factories, e.g. a "class with static methods that do
>>> not take any parameters and that return descriptors", or "methods
>>> that bear a specific Java annotation (e.g. @AutoGenerateDescriptor)",
>>> it should be possible to implement such a Maven plugin.
>>> 
>>> Cheers,
>>> 
>>> -- Richard
>>> 
>>> On 16.04.2014, at 05:21, Steven Bethard <steven.bethard@gmail.com> wrote:
>>> 
>>>> +1. And note that once you have a descriptor, you can generate the
>>>> XML, so we should arrange to replace the current XML descriptors with
>>>> ones generated automatically from the uimaFIT code. That should reduce
>>>> some synchronization problems when the Java code was changed but the
>>>> XML descriptor was not.
>>>> 
>>>> Steve
>>>> 
>>>> On Tue, Apr 15, 2014 at 8:52 AM, Miller, Timothy
>>>> <Timothy.Miller@childrens.harvard.edu> wrote:
>>>>> The discussion in the other thread with Abraham Tom gave me an idea I
>>>>> wanted to float to the list. We have been using some uimaFIT pipeline
>>>>> builders in the temporal project that maybe could be moved into
>>>>> clinical-pipeline. For example, look at this file:
>>>>> 
>>>>> http://svn.apache.org/viewvc/ctakes/trunk/ctakes-temporal/src/main/java/org/apache/ctakes/temporal/pipelines/TemporalExtractionPipeline_ImplBase.java?view=markup
>>>>> 
>>>>> with the static methods getPreprocessorAggregateBuilder() and
>>>>> getLightweightPreprocessorAggregateBuilder()   [no umls].
>>>>> 
>>>>> So my idea would be to create a class in clinical-pipeline
>>>>> (CTakesPipelines) with static methods for some standard pipelines (to
>>>>> return AnalysisEngineDescriptions instead of AggregateBuilders?):
>>>>> 
>>>>> getStandardUMLSPipeline()  -- builds pipeline currently in
>>>>> AggregatePlaintextUMLSProcessor.xml
>>>>> getFullPipeline() -- same as above but with SRL, constituency parsing,
>>>>> etc., every component in ctakes
>>>>> 
>>>>> We could then potentially merge our entry points -- I think Abraham's
>>>>> experience points out that this is currently confusing, as well as
>>>>> probably not implemented optimally. For example, either
>>>>> ClinicalPipelineWithUmls or BagOfCUIsGenerator would use that static
>>>>> method to run a uimafit-style pipeline. Maybe we can slowly deprecate
>>>>> our xml descriptors too unless people feel strongly about keeping those
>>>>> around.
>>>>> 
>>>>> Another benefit is that the cTAKES API is then trivial -- if you import
>>>>> ctakes into your pom file, getting a UIMA pipeline is one uimaFIT call:
>>>>> 
>>>>> builder.add(CTakesPipelines.getStandardUMLSPipeline());
>>>>> 
>>>>> 
>>>>> I think this would actually be pretty easy to implement, but hoping to
>>>>> get some feedback on whether this is a good direction.
>>>>> 
>>>>> Tim
>>> 
>> 
>> -- 
>> Tim Miller
>> Instructor
>> Boston Children's Hospital and Harvard Medical School
>> timothy.miller@childrens.harvard.edu
>> 617-919-1223
>> 
> 
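
The annotation-based convention discussed in this thread (mark no-argument
static factory methods, then scan for them and invoke them) can be sketched
roughly as follows. This is only an illustration under stated assumptions,
not the uimafit-maven-plugin's code: it scans one class with runtime
reflection instead of scanning at build time, StandInDescription is a
hypothetical placeholder for AnalysisEngineDescription so the example needs
no UIMA dependency, and the @DescriptionGenerator annotation is declared
here only because the thread proposes it by name.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.Map;
import java.util.TreeMap;

public class DescriptionGeneratorScan {

    /** Marker proposed in the thread; declared here as an assumption. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    public @interface DescriptionGenerator {}

    /** Hypothetical stand-in for AnalysisEngineDescription. */
    public static final class StandInDescription {
        public final String name;
        public StandInDescription(String name) { this.name = name; }
    }

    /** Example factory class following the proposed convention. */
    public static class SentenceDetectorFactory {
        @DescriptionGenerator
        public static StandInDescription getSentenceDetectorAnnotator() {
            return new StandInDescription("SentenceDetector");
        }

        // Not annotated, so the scan below must skip it.
        public static StandInDescription helperNotExported() {
            return new StandInDescription("Helper");
        }
    }

    /**
     * Finds static, no-argument methods marked with @DescriptionGenerator
     * that return a StandInDescription, invokes each one, and returns the
     * results keyed by method name.
     */
    public static Map<String, StandInDescription> scan(Class<?> clazz) {
        Map<String, StandInDescription> found = new TreeMap<>();
        for (Method m : clazz.getDeclaredMethods()) {
            boolean eligible = m.isAnnotationPresent(DescriptionGenerator.class)
                    && Modifier.isStatic(m.getModifiers())
                    && m.getParameterCount() == 0
                    && StandInDescription.class.isAssignableFrom(m.getReturnType());
            if (!eligible) {
                continue;
            }
            try {
                m.setAccessible(true);
                found.put(m.getName(), (StandInDescription) m.invoke(null));
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("Could not invoke " + m, e);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // A real plugin would serialize each descriptor to XML at this point;
        // this sketch only lists what the scan found.
        scan(SentenceDetectorFactory.class).forEach(
                (method, desc) -> System.out.println(method + " -> " + desc.name));
    }
}
```

With a real descriptor type, the body of the loop in main is where the XML
would be written out, which is what would keep the XML descriptors in sync
with the Java code as suggested earlier in the thread.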

