ctakes-dev mailing list archives

From Richard Eckart de Castilho <...@apache.org>
Subject Re: ytex DBconsumer and groovy parser
Date Wed, 02 Jul 2014 05:11:26 GMT
Hi John,

there is actually no big difference between analysis engines and consumers.

By default, a UIMA runtime may create multiple instances of an analysis engine and run them
in parallel (if the runtime supports that),
but a "consumer" must see all data going through the pipeline, so there can only be one instance.

The default value of the flag that controls whether multiple instances are allowed is the only
real difference.

Basically, any analysis engine that only reads annotations from the CAS and does not add or
change anything is a consumer. Consequently, a consumer can be added anywhere in the pipeline,
not only at the end (I sometimes do that to see intermediate results).

If a component has the "allow multiple instances" flag set to "false" (which is usually what
you want), then runtimes may react to that differently. E.g. the Collection Processing Engine
(CPE) will single-thread all components (analysis engines or consumers) after it hits the
first component with "allow multiple instances" set to false (which is typically a consumer).
So to make optimal use of the CPE's multi-threading capabilities, such components should be
placed towards the end of the CPE pipeline.
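
As a rough illustration of the CPE rule described above, here is a toy Java model. It does not
use any UIMA API: the Component class, the method names, and the component names (tokenizer,
dictionaryLookup, dbWriter) are all hypothetical, and the only thing it encodes is "everything
from the first component with 'allow multiple instances' = false onwards is single-threaded".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only; none of these types come from UIMA or uimaFIT.
final class CpeThreadingSketch {

    /** Hypothetical stand-in for a pipeline component and its flag. */
    static final class Component {
        final String name;
        final boolean multipleInstancesAllowed;

        Component(String name, boolean multipleInstancesAllowed) {
            this.name = name;
            this.multipleInstancesAllowed = multipleInstancesAllowed;
        }
    }

    /**
     * Index of the first component that forbids multiple instances,
     * or pipeline.size() if every component allows them.
     */
    static int firstSingleThreadedIndex(List<Component> pipeline) {
        for (int i = 0; i < pipeline.size(); i++) {
            if (!pipeline.get(i).multipleInstancesAllowed) {
                return i;
            }
        }
        return pipeline.size();
    }

    /** Names of the components that come before that cut-off. */
    static List<String> parallelStage(List<Component> pipeline) {
        List<String> names = new ArrayList<>();
        for (Component c : pipeline.subList(0, firstSingleThreadedIndex(pipeline))) {
            names.add(c.name);
        }
        return names;
    }

    public static void main(String[] args) {
        // Writer at the end: both annotators stay in the parallel stage.
        List<Component> writerLast = Arrays.asList(
                new Component("tokenizer", true),
                new Component("dictionaryLookup", true),
                new Component("dbWriter", false));

        // Writer in the middle: everything after it is single-threaded too.
        List<Component> writerMiddle = Arrays.asList(
                new Component("tokenizer", true),
                new Component("dbWriter", false),
                new Component("dictionaryLookup", true));

        System.out.println(parallelStage(writerLast));   // [tokenizer, dictionaryLookup]
        System.out.println(parallelStage(writerMiddle)); // [tokenizer]
    }
}
```

In this model, moving the writer from the end to the middle shrinks the parallel stage from two
components to one, which is why such components are best kept at the end.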

I believe there is a Java interface declaration and base classes for "CasConsumers" in UIMA
- I haven't used these in years. The uimaFIT API doesn't even support these because everything
can also be (and is within uimaFIT) nicely modeled using analysis engines and the "allow multiple
instances" flag.

Cheers,

-- Richard

On 02.07.2014, at 04:01, Masanz, James J. <Masanz.James@mayo.edu> wrote:

> Hi John,
> 
> Not positive this is the line you are referring to, but there is a line in
> cTAKES_clinical_pipeline.groovy (which is not in sandbox, btw) that has a comment about
> 
> "createAnalysisEngineDescription expects name to not end in .xml even though filename
> actually does"
> 
> I am guessing the comment you see is trying to say the same thing. 
> 
> cTAKES_clinical_pipeline.groovy is in  ctakes-core/scripts/groovy
> 
> In that script, line 321 is where the writer is specified. There is no separately defined
> "consumer" in the same sense that the CPE GUI has consumers that are separate from annotators.
> The script just uses the last "annotator" as a consumer and convention is AFAIK to call them
> writers in this case.
> 
> Hope that helps,
> -- James
> 
> -----Original Message-----
> From: John Green [mailto:john.travis.green@gmail.com] 
> Sent: Tuesday, July 01, 2014 7:15 PM
> To: dev@ctakes.apache.org
> Subject: ytex DBconsumer and groovy parser
> 
> If someone has a free minute, which, judging from my own life is probably
> not the case - where in the groovy scripts in sandbox do you define the
> consumer to use? There is one comment that says "dont put the .xml here"
> then there is a path to the dictionary ae. I'm working by ssh from the
> hospital a lot in my "free time" in the ICU and running gui CPEs isn't
> gonna cut it.
> 
> Apropos the ytex dbconsumer - I should be able to just tack this on to the
> end of the ytex aggregate pipeline?
> 
> I'm probably still asking very naive questions but to date I still haven't
> had the time to dive into UIMA's base very well, so I apologize.
> 
> My goal is to run the full ytex pipeline from the command line with the
> ytex dbconsumer ...
> 
> Thanks for everyone's patience,
> John

