ctakes-dev mailing list archives

From "Miller, Timothy" <Timothy.Mil...@childrens.harvard.edu>
Subject RE: suggestion for default pipelines
Date Sat, 26 Apr 2014 22:18:11 GMT
Please take a look at what I've done so far if you're interested:

I don't have the full pipeline supported yet, but every component up to the dictionary now has
its own static method for creating the primitive, and there are some methods for getting common
aggregates. If anyone has any concerns at this point, please let me know; otherwise I'll keep
going along this track.


PS Sean Finan and I had an offline discussion about interesting next steps. The first is self-building
pipelines, where you get aggregates from primitives by having them build pipelines with
their own prerequisites. Specifically, a method in the dictionary annotator would build a
pipeline with the lookup window annotator and add itself at the end; the lookup window
annotator would build itself in a similar way, and so on recursively. We also thought it would
be cool to have the API programmer simply specify what types they want (EventMention,
EntityMention) and have the pipeline built to produce those types. That requires a bit more infrastructure,
but would be really cool!
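
The self-building idea can be sketched in plain Java. This is only a toy model under stated assumptions: `Component` is a hypothetical stand-in for an annotator, not a uimaFIT or cTAKES class, and the prerequisite chain shown (including the `Tokenizer` step) is made up for illustration. Each component lists its prerequisites, and building a pipeline means building the prerequisites first and then appending the component itself.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class SelfBuildingPipelineSketch {

    /** Hypothetical stand-in for an annotator that knows its own prerequisites. */
    static final class Component {
        final String name;
        final List<Component> prerequisites;

        Component(String name, Component... prerequisites) {
            this.name = name;
            this.prerequisites = List.of(prerequisites);
        }

        /** Returns the names of all prerequisites (recursively), followed by this component. */
        List<String> buildPipeline() {
            Set<String> ordered = new LinkedHashSet<>();
            collect(this, ordered);
            return new ArrayList<>(ordered);
        }

        private static void collect(Component component, Set<String> ordered) {
            for (Component prerequisite : component.prerequisites) {
                collect(prerequisite, ordered);
            }
            // A LinkedHashSet keeps the first insertion, so a shared prerequisite is added once.
            ordered.add(component.name);
        }
    }

    public static void main(String[] args) {
        // Assumed dependency chain, for illustration only.
        Component sentences = new Component("SentenceDetector");
        Component tokens = new Component("Tokenizer", sentences);
        Component window = new Component("LookupWindowAnnotator", tokens);
        Component dictionary = new Component("DictionaryAnnotator", window);

        System.out.println(dictionary.buildPipeline());
    }
}
```

A type-driven version would presumably add a lookup from each requested output type to the component that produces it, and then call the same recursive build on that component.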

From: Miller, Timothy [Timothy.Miller@childrens.harvard.edu]
Sent: Thursday, April 24, 2014 5:48 PM
To: dev@ctakes.apache.org
Subject: Re: suggestion for default pipelines

Does anyone have a preference between these two options?

Separate factory classes:

    class SentenceDetectorAnnotatorFactory:
        static AnalysisEngineDescription getSentenceDetectorAnnotator()

Static methods added to the primitive annotators:

    class SentenceDetector (existing):
        static AnalysisEngineDescription getSentenceDetectorAnnotator()

The former can clutter up the class space, while the latter lengthens the
classes, especially if there are multiple versions
(getUMLSDictionaryAnnotator(), getICD9DictionaryAnnotator(),
getMeshDictionaryAnnotator(), etc.)
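
To make the two placements concrete, here is a minimal, dependency-free sketch. `Description` is a hypothetical stand-in for `AnalysisEngineDescription`, and the nested classes only mimic the names used above. In real code the method bodies would more likely call something like uimaFIT's `AnalysisEngineFactory.createEngineDescription(SentenceDetector.class)`, which this sketch does not attempt.

```java
public class FactoryPlacementSketch {

    /** Hypothetical stand-in for AnalysisEngineDescription. */
    static final class Description {
        final String componentName;

        Description(String componentName) {
            this.componentName = componentName;
        }
    }

    /** Option 1: a separate factory class next to the annotator. */
    static final class SentenceDetectorAnnotatorFactory {
        static Description getSentenceDetectorAnnotator() {
            return new Description("SentenceDetector");
        }
    }

    /** Option 2: the annotator class itself carries the static method. */
    static final class SentenceDetector {
        static Description getSentenceDetectorAnnotator() {
            return new Description(SentenceDetector.class.getSimpleName());
        }
    }

    public static void main(String[] args) {
        Description fromFactory = SentenceDetectorAnnotatorFactory.getSentenceDetectorAnnotator();
        Description fromAnnotator = SentenceDetector.getSentenceDetectorAnnotator();

        // Both placements describe the same component; only the call site differs.
        System.out.println(fromFactory.componentName);
        System.out.println(fromAnnotator.componentName);
    }
}
```

Callers see the same return type either way, so the choice mostly affects where people look for the method and how large each class becomes.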


On 04/16/2014 04:48 AM, Richard Eckart de Castilho wrote:
> It would be nice if uimaFIT provided a Maven plugin to automatically
> generate descriptors for aggregates. Maybe if we come up with a
> convention for factories, e.g. a "class with static methods that do
> not take any parameters and that return descriptors", or "methods
> that bear a specific Java annotation (e.g. @AutoGenerateDescriptor)",
> it should be possible to implement such a Maven plugin.
> Cheers,
> -- Richard
> On 16.04.2014, at 05:21, Steven Bethard <steven.bethard@gmail.com> wrote:
>> +1. And note that once you have a descriptor, you can generate the
>> XML, so we should arrange to replace the current XML descriptors with
>> ones generated automatically from the uimaFIT code. That should reduce
>> some synchronization problems when the Java code was changed but the
>> XML descriptor was not.
>> Steve
>> On Tue, Apr 15, 2014 at 8:52 AM, Miller, Timothy
>> <Timothy.Miller@childrens.harvard.edu> wrote:
>>> The discussion in the other thread with Abraham Tom gave me an idea I
>>> wanted to float to the list. We have been using some uimaFIT pipeline
>>> builders in the temporal project that maybe could be moved into
>>> clinical-pipeline. For example, look at this file:
>>> http://svn.apache.org/viewvc/ctakes/trunk/ctakes-temporal/src/main/java/org/apache/ctakes/temporal/pipelines/TemporalExtractionPipeline_ImplBase.java?view=markup
>>> with the static methods getPreprocessorAggregateBuilder() and
>>> getLightweightPreprocessorAggregateBuilder() (no UMLS).
>>> So my idea would be to create a class in clinical-pipeline
>>> (CTakesPipelines) with static methods for some standard pipelines (to
>>> return AnalysisEngineDescriptions instead of AggregateBuilders?):
>>> getStandardUMLSPipeline()  -- builds pipeline currently in
>>> AggregatePlaintextUMLSProcessor.xml
>>> getFullPipeline() -- same as above but with SRL, constituency parsing,
>>> etc., every component in cTAKES
>>> We could then potentially merge our entry points -- I think Abraham's
>>> experience points out that this is currently confusing, as well as
>>> probably not implemented optimally. For example, either
>>> ClinicalPipelineWithUmls or BagOfCUIsGenerator would use that static
>>> method to run a uimaFIT-style pipeline. Maybe we can slowly deprecate
>>> our XML descriptors too, unless people feel strongly about keeping those
>>> around.
>>> Another benefit is that the cTAKES API is then trivial -- if you import
>>> cTAKES in your pom file, getting a UIMA pipeline is one uimaFIT call:
>>> builder.add(CTakesPipelines.getStandardUMLSPipeline());
>>> I think this would actually be pretty easy to implement, but hoping to
>>> get some feedback on whether this is a good direction.
>>> Tim

Tim Miller
Boston Children's Hospital and Harvard Medical School
