ctakes-dev mailing list archives

From "Finan, Sean" <Sean.Fi...@childrens.harvard.edu>
Subject RE: suggestion for default pipelines
Date Tue, 15 Apr 2014 14:05:25 GMT
+1 I think that a factory is a great idea.

I (personally) dislike the descriptor schema, but I think that deprecation is the way to go
until a replacement comes along.  

-----Original Message-----
From: Miller, Timothy [mailto:Timothy.Miller@childrens.harvard.edu] 
Sent: Tuesday, April 15, 2014 9:54 AM
To: dev@ctakes.apache.org
Subject: suggestion for default pipelines

The discussion in the other thread with Abraham Tom gave me an idea I
wanted to float to the list. We have been using some uimaFIT pipeline
builders in the temporal project that could perhaps be moved into
clinical-pipeline. For example, look at this file:


with the static methods getPreprocessorAggregateBuilder() and
getLightweightPreprocessorAggregateBuilder()   [no umls].

So my idea would be to create a class in clinical-pipeline
(CTakesPipelines) with static methods for some standard pipelines (to
return AnalysisEngineDescriptions instead of AggregateBuilders?):

getStandardUMLSPipeline()  -- builds pipeline currently in
getFullPipeline() -- same as above but with SRL, constituency parsing,
etc., every component in ctakes
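
To make the shape of this proposal concrete, here is a minimal sketch in
plain Java. It is not the cTAKES or uimaFIT API: PipelineBuilder and
PipelineDescription are hypothetical stand-ins for uimaFIT's
AggregateBuilder and AnalysisEngineDescription, and the component names
are illustrative strings rather than real annotator classes. It only
shows the idea of one class with static factory methods, where the full
pipeline is built on top of the standard one.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Sketch of the proposed factory class. Method names follow the email;
// everything they return is a hypothetical stand-in, not a UIMA type.
final class CTakesPipelines {
    private CTakesPipelines() {}

    // Shared builder so the full pipeline extends the standard one
    // instead of duplicating its component list.
    private static PipelineBuilder standardBuilder() {
        return new PipelineBuilder()
                .add("sentence-detector")   // illustrative component names
                .add("tokenizer")
                .add("pos-tagger")
                .add("umls-lookup");
    }

    static PipelineDescription getStandardUMLSPipeline() {
        return standardBuilder().build();
    }

    // Same as the standard pipeline, plus SRL and constituency parsing.
    static PipelineDescription getFullPipeline() {
        return standardBuilder()
                .add("srl")
                .add("constituency-parser")
                .build();
    }

    // Any entry point would obtain its pipeline with a single call.
    public static void main(String[] args) {
        System.out.println(getStandardUMLSPipeline().components());
        System.out.println(getFullPipeline().components());
    }
}

// Hypothetical stand-in for an aggregate builder: collects components in order.
final class PipelineBuilder {
    private final List<String> components = new ArrayList<>();

    PipelineBuilder add(String component) {
        components.add(component);
        return this;
    }

    PipelineDescription build() {
        return new PipelineDescription(components);
    }
}

// Hypothetical stand-in for an engine description: an immutable snapshot.
final class PipelineDescription {
    private final List<String> components;

    PipelineDescription(List<String> components) {
        this.components = Collections.unmodifiableList(new ArrayList<>(components));
    }

    List<String> components() {
        return components;
    }
}
```

Returning an immutable description rather than the builder means callers
cannot accidentally alter a shared default; returning the builder instead
would let them append their own components. That is the trade-off behind
the AnalysisEngineDescription versus AggregateBuilder question above.
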

We could then potentially merge our entry points -- I think Abraham's
experience points out that this is currently confusing, as well as
probably not implemented optimally. For example, either
ClinicalPipelineWithUmls or BagOfCUIsGenerator would use that static
method to run a uimafit-style pipeline. Maybe we can slowly deprecate
our XML descriptors too, unless people feel strongly about keeping those.

Another benefit is that the cTAKES API is then trivial -- if you import
ctakes into your pom file, getting a UIMA pipeline is one uimaFIT call:


I think this would actually be pretty easy to implement, but hoping to
get some feedback on whether this is a good direction.

