ctakes-dev mailing list archives

From "Chen, Pei" <Pei.C...@childrens.harvard.edu>
Subject Re: suggestion for default pipelines
Date Mon, 28 Apr 2014 13:23:37 GMT
Yes. I was thinking of a use case where, for example, the ytex component needs SentenceDetectorA
but the dictionary lookup component expects SentenceDetectorB. It's probably not too common, but it is
something to consider with the cool dynamic, plug-and-play pipelines idea.
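
The conflict described above can be checked mechanically, in the same spirit as Maven/Ivy flagging incompatible versions. What follows is only a minimal sketch and not cTAKES or uimaFIT code: ComponentSpec, the "sentenceDetector" role name, and findConflicts are hypothetical names made up for illustration, and plain strings stand in for real annotator classes.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class PipelineConflictCheck {
    // Hypothetical stand-in: a component declares which implementation it needs for each role.
    static final class ComponentSpec {
        final String name;
        final Map<String, String> requires; // role -> required implementation

        ComponentSpec(String name, Map<String, String> requires) {
            this.name = name;
            this.requires = requires;
        }
    }

    /** Returns one message for each clash between a component and the first requester of a role. */
    static List<String> findConflicts(List<ComponentSpec> components) {
        Map<String, String> chosenImpl = new LinkedHashMap<>(); // role -> first implementation seen
        Map<String, String> chosenBy = new LinkedHashMap<>();   // role -> component that asked first
        List<String> conflicts = new ArrayList<>();
        for (ComponentSpec c : components) {
            for (Map.Entry<String, String> req : c.requires.entrySet()) {
                String role = req.getKey();
                String impl = req.getValue();
                String existing = chosenImpl.putIfAbsent(role, impl);
                if (existing == null) {
                    chosenBy.put(role, c.name);
                } else if (!existing.equals(impl)) {
                    conflicts.add(role + ": " + chosenBy.get(role) + " needs " + existing
                            + ", " + c.name + " needs " + impl);
                }
            }
        }
        return conflicts;
    }

    public static void main(String[] args) {
        List<ComponentSpec> pipeline = List.of(
                new ComponentSpec("ytex", Map.of("sentenceDetector", "SentenceDetectorA")),
                new ComponentSpec("dictionaryLookup", Map.of("sentenceDetector", "SentenceDetectorB")));
        findConflicts(pipeline).forEach(System.out::println);
    }
}
```

A dynamic pipeline builder could run a check like this before assembling the aggregate and either fail fast or ask the user which implementation should win.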

Sent from my iPhone

> On Apr 28, 2014, at 5:46 AM, "Richard Eckart de Castilho" <rec@apache.org> wrote:
> 
> At the time a factory method becomes callable, the Maven/Ivy-magic should already have
> taken place, no?
> 
> -- Richard
> 
>> On 27.04.2014, at 17:52, Chen, Pei <Pei.Chen@childrens.harvard.edu> wrote:
>> 
>> My vote would be for the latter. Have the "Factory" create pipelines instead. It
>> could just be a naming thing though...
>> 
>> +1 for building dynamic pipelines. I think this idea has been thrown around for some time,
>> but it hasn't really been worked on, so it would be cool to see it in action. I think the tricky
>> part is handling pipeline dependencies, i.e., a similar concept to Maven/Ivy.
>> 
>> Sent from my iPhone
>> 
>>> On Apr 24, 2014, at 5:48 PM, "Miller, Timothy" <Timothy.Miller@childrens.harvard.edu> wrote:
>>> 
>>> Any preference for separate factory classes:
>>> 
>>> class SentenceDetectorAnnotatorFactory:
>>> 
>>> static AnalysisEngineDescription getSentenceDetectorAnnotator()
>>> 
>>> VS
>>> 
>>> static methods added to primitive annotators:
>>> 
>>> class SentenceDetector (existing)
>>> 
>>> static AnalysisEngineDescription getSentenceDetectorAnnotator()
>>> 
>>> ?
>>> 
>>> The former can clutter up the class space while the latter extends the
>>> length of classes, especially if there are multiple versions
>>> (getUMLSDictionaryAnnotator(), getICD9DictionaryAnnotator(),
>>> getMeshDictionaryAnnotator(), etc.)
>>> 
>>> Tim
>>> 
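
To make the two layouts in the message above concrete, here is a small sketch, not cTAKES code. Description is a hypothetical stand-in for uimaFIT's AnalysisEngineDescription so that the example compiles without UIMA on the classpath, and SentenceDetector here is an empty placeholder rather than the real annotator. With uimaFIT available, the method bodies would presumably call AnalysisEngineFactory.createEngineDescription(SentenceDetector.class) instead.

```java
final class FactoryLayouts {
    // Hypothetical stand-in for AnalysisEngineDescription; it only records the annotator's name.
    static final class Description {
        final String annotatorClass;

        Description(String annotatorClass) {
            this.annotatorClass = annotatorClass;
        }
    }

    // Option 1: a separate factory class per annotator (more classes, annotators stay short).
    static final class SentenceDetectorAnnotatorFactory {
        static Description getSentenceDetectorAnnotator() {
            return new Description(SentenceDetector.class.getSimpleName());
        }
    }

    // Option 2: the static method lives on the annotator itself (fewer classes, longer annotators).
    static final class SentenceDetector {
        static Description getSentenceDetectorAnnotator() {
            return new Description(SentenceDetector.class.getSimpleName());
        }
    }

    public static void main(String[] args) {
        // Either layout yields an equivalent description; only the call site differs.
        Description viaFactory = SentenceDetectorAnnotatorFactory.getSentenceDetectorAnnotator();
        Description viaAnnotator = SentenceDetector.getSentenceDetectorAnnotator();
        System.out.println(viaFactory.annotatorClass + " / " + viaAnnotator.annotatorClass);
    }
}
```

The trade-off is the one named in the message: the first option adds a class per annotator, while the second grows each annotator by one method per variant it offers.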
>>>> On 04/16/2014 04:48 AM, Richard Eckart de Castilho wrote:
>>>> It would be nice if uimaFIT provided a Maven plugin to automatically
>>>> generate descriptors for aggregates. Maybe if we come up with a 
>>>> convention for factories, e.g. a "class with static methods that do
>>>> not take any parameters and that return descriptors", or "methods
>>>> that bear a specific Java annotation, e.g. @AutoGenerateDescriptor)"
>>>> it should be possible to implement such a Maven plugin.
>>>> 
>>>> Cheers,
>>>> 
>>>> -- Richard
>>>> 
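
The annotation-based convention suggested above could be discovered with plain Java reflection. The sketch below assumes a marker annotation called @AutoGenerateDescriptor (the name comes from the message; no such annotation is known to exist in uimaFIT) and uses strings in place of real descriptors. ExampleFactories and its methods are hypothetical. A Maven plugin would do the same scan at build time and serialize each result to XML.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.Map;
import java.util.TreeMap;

final class DescriptorScan {
    // Assumed marker annotation; retained at runtime so reflection can see it.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface AutoGenerateDescriptor {}

    // Hypothetical factory class; the returned strings stand in for descriptor objects.
    static final class ExampleFactories {
        @AutoGenerateDescriptor
        static String getSentenceDetectorAnnotator() { return "<sentence-detector/>"; }

        @AutoGenerateDescriptor
        static String getTokenizerAnnotator() { return "<tokenizer/>"; }

        // Skipped by the scan: it takes a parameter, so it cannot be called unattended.
        @AutoGenerateDescriptor
        static String getDictionaryAnnotator(String dictionary) { return "<dictionary/>"; }

        // Skipped by the scan: not annotated.
        static String helper() { return "ignored"; }
    }

    /** Invokes every annotated, static, zero-argument method and collects results by method name. */
    static Map<String, Object> collect(Class<?> factory) throws ReflectiveOperationException {
        Map<String, Object> descriptors = new TreeMap<>();
        for (Method m : factory.getDeclaredMethods()) {
            if (m.isAnnotationPresent(AutoGenerateDescriptor.class)
                    && Modifier.isStatic(m.getModifiers())
                    && m.getParameterCount() == 0) {
                m.setAccessible(true); // the factory methods need not be public
                descriptors.put(m.getName(), m.invoke(null));
            }
        }
        return descriptors;
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        collect(ExampleFactories.class)
                .forEach((name, descriptor) -> System.out.println(name + " -> " + descriptor));
    }
}
```

Requiring static, parameterless methods is what makes the convention automatable: the build step needs no knowledge of how to configure any individual annotator.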
>>>>> On 16.04.2014, at 05:21, Steven Bethard <steven.bethard@gmail.com> wrote:
>>>>> 
>>>>> +1. And note that once you have a descriptor, you can generate the
>>>>> XML, so we should arrange to replace the current XML descriptors with
>>>>> ones generated automatically from the uimaFIT code. That should reduce
>>>>> some synchronization problems when the Java code was changed but the
>>>>> XML descriptor was not.
>>>>> 
>>>>> Steve
>>>>> 
>>>>> On Tue, Apr 15, 2014 at 8:52 AM, Miller, Timothy
>>>>> <Timothy.Miller@childrens.harvard.edu> wrote:
>>>>>> The discussion in the other thread with Abraham Tom gave me an idea I
>>>>>> wanted to float to the list. We have been using some uimaFIT pipeline
>>>>>> builders in the temporal project that maybe could be moved into
>>>>>> clinical-pipeline. For example, look at this file:
>>>>>> 
>>>>>> http://svn.apache.org/viewvc/ctakes/trunk/ctakes-temporal/src/main/java/org/apache/ctakes/temporal/pipelines/TemporalExtractionPipeline_ImplBase.java?view=markup
>>>>>> 
>>>>>> with the static methods getPreprocessorAggregateBuilder() and
>>>>>> getLightweightPreprocessorAggregateBuilder()   [no umls].
>>>>>> 
>>>>>> So my idea would be to create a class in clinical-pipeline
>>>>>> (CTakesPipelines) with static methods for some standard pipelines (to
>>>>>> return AnalysisEngineDescriptions instead of AggregateBuilders?):
>>>>>> 
>>>>>> getStandardUMLSPipeline()  -- builds pipeline currently in
>>>>>> AggregatePlaintextUMLSProcessor.xml
>>>>>> getFullPipeline() -- same as above but with SRL, constituency parsing,
>>>>>> etc., every component in ctakes
>>>>>> 
>>>>>> We could then potentially merge our entry points -- I think Abraham's
>>>>>> experience points out that this is currently confusing, as well as
>>>>>> probably not implemented optimally. For example, either
>>>>>> ClinicalPipelineWithUmls or BagOfCUIsGenerator would use that static
>>>>>> method to run a uimafit-style pipeline. Maybe we can slowly deprecate
>>>>>> our xml descriptors too unless people feel strongly about keeping those
>>>>>> around.
>>>>>> 
>>>>>> Another benefit is that the cTAKES API is then trivial -- if you import
>>>>>> ctakes into your pom file, getting a UIMA pipeline is one uimaFIT call:
>>>>>> 
>>>>>> builder.add(CTakesPipelines.getStandardUMLSPipeline());
>>>>>> 
>>>>>> 
>>>>>> I think this would actually be pretty easy to implement, but hoping to
>>>>>> get some feedback on whether this is a good direction.
>>>>>> 
>>>>>> Tim
>>> 
>>> -- 
>>> Tim Miller
>>> Instructor
>>> Boston Children's Hospital and Harvard Medical School
>>> timothy.miller@childrens.harvard.edu
>>> 617-919-1223
> 
