ctakes-dev mailing list archives

From Abraham Tom <a...@practicefusion.com>
Subject RE: suggestion for default pipelines
Date Tue, 15 Apr 2014 17:08:41 GMT
+1



Best regards,

Abraham Tom
____________________________
Abraham Tom
Data Warehouse Engineer
415.757.4674 (p) | 415.356.0950 (f)
atom@practicefusion.com
http://www.practicefusion.com
www.facebook.com/practicefusion



-----Original Message-----
From: Finan, Sean [mailto:Sean.Finan@childrens.harvard.edu] 
Sent: Tuesday, April 15, 2014 7:05 AM
To: dev@ctakes.apache.org
Subject: RE: suggestion for default pipelines

+1 I think that a factory is a great idea.

I (personally) dislike the descriptor schema, but I think that deprecation is the way to go
until a replacement comes along.  



-----Original Message-----
From: Miller, Timothy [mailto:Timothy.Miller@childrens.harvard.edu]
Sent: Tuesday, April 15, 2014 9:54 AM
To: dev@ctakes.apache.org
Subject: suggestion for default pipelines

The discussion in the other thread with Abraham Tom gave me an idea I wanted to float to the
list. We have been using some uimaFIT pipeline builders in the temporal project that could
perhaps be moved into clinical-pipeline. For example, see this file:

http://svn.apache.org/viewvc/ctakes/trunk/ctakes-temporal/src/main/java/org/apache/ctakes/temporal/pipelines/TemporalExtractionPipeline_ImplBase.java?view=markup

which has the static methods getPreprocessorAggregateBuilder() and
getLightweightPreprocessorAggregateBuilder() (the latter without UMLS).

So my idea would be to create a class in clinical-pipeline (CTakesPipelines) with static
methods for some standard pipelines (perhaps returning AnalysisEngineDescriptions instead of
AggregateBuilders?):

getStandardUMLSPipeline() -- builds the pipeline currently in AggregatePlaintextUMLSProcessor.xml
getFullPipeline() -- same as above but with SRL, constituency parsing, etc., i.e. every
component in cTAKES

We could then potentially merge our entry points -- I think Abraham's experience shows that
the current setup is confusing, and it is probably not implemented optimally either. For
example, both ClinicalPipelineWithUmls and BagOfCUIsGenerator could use that static method to
run a uimaFIT-style pipeline. Maybe we can slowly deprecate our XML descriptors too, unless
people feel strongly about keeping them around.

Another benefit is that the cTAKES API then becomes trivial -- once you add cTAKES as a
dependency in your pom file, getting a UIMA pipeline is one uimaFIT call:

builder.add(CTakesPipelines.getStandardUMLSPipeline());
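
A minimal sketch of how such a factory class might be laid out, for illustration only. To keep it self-contained, PipelineDescription and PipelineBuilder are hypothetical stand-ins for uimaFIT's AnalysisEngineDescription and AggregateBuilder, and the component names are placeholders, not the actual contents of AggregatePlaintextUMLSProcessor.xml:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for uimaFIT's AnalysisEngineDescription:
// here it is just an ordered, read-only list of component names.
final class PipelineDescription {
    private final List<String> components;

    PipelineDescription(List<String> components) {
        this.components = Collections.unmodifiableList(new ArrayList<>(components));
    }

    List<String> components() {
        return components;
    }
}

// Hypothetical stand-in for uimaFIT's AggregateBuilder.
final class PipelineBuilder {
    private final List<String> components = new ArrayList<>();

    // Append a single component (identified by a placeholder name).
    PipelineBuilder add(String component) {
        components.add(component);
        return this;
    }

    // Append every component of an already-built pipeline, in order.
    PipelineBuilder add(PipelineDescription description) {
        components.addAll(description.components());
        return this;
    }

    PipelineDescription build() {
        return new PipelineDescription(components);
    }
}

// The proposed factory: static methods returning ready-made pipelines.
final class CTakesPipelines {
    private CTakesPipelines() {
        // static methods only, no instances
    }

    // Placeholder component names; the real list would mirror the XML descriptor.
    static PipelineDescription getStandardUMLSPipeline() {
        return new PipelineBuilder()
                .add("preprocessing")
                .add("umls-lookup")
                .build();
    }

    // The standard pipeline plus the heavier components, reusing the method above.
    static PipelineDescription getFullPipeline() {
        return new PipelineBuilder()
                .add(getStandardUMLSPipeline())
                .add("srl")
                .add("constituency-parser")
                .build();
    }
}

class PipelineDemo {
    public static void main(String[] args) {
        // The one-call usage described above.
        PipelineBuilder builder = new PipelineBuilder();
        builder.add(CTakesPipelines.getStandardUMLSPipeline());
        System.out.println(builder.build().components());
    }
}
```

Building getFullPipeline() on top of getStandardUMLSPipeline() keeps the two definitions from drifting apart, which is one way the entry points could share a single source of truth.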


I think this would actually be pretty easy to implement, but I am hoping to get some feedback
on whether this is a good direction.

Tim



