ctakes-user mailing list archives

From "Miller, Timothy" <Timothy.Mil...@childrens.harvard.edu>
Subject Re: Running cTAKES using all cores available
Date Sun, 11 Oct 2015 12:12:10 GMT
Hi Frank,

I've only been able to do so using UIMA-AS (Asynchronous Scaleout). I was thinking of writing
up a quickstart/tutorial on this, but here is the brief list of steps. A near-future release
may offer an easier way: until now, one of our external dependencies (LVG) was not thread-safe,
but it has since been improved, so local multithreading could potentially be made easier.
What I am about to describe could also be extended to multiple machines.

1) Download the UIMA-AS binaries for your desired version and unpack them.

2) Go into that directory and start the ActiveMQ (AMQ) broker with bin/startBroker.{sh,bat}. UIMA-AS uses
this to set up all the necessary queues between readers and pipelines, and between pipelines.

3) Create a deployment descriptor file. Yes, another UIMA XML descriptor type, but luckily
this is really easy to do in Eclipse with the UIMA and UIMA-AS tooling. Otherwise, you
can consult the UIMA documentation [1]. For the simplest case of replicating a single pipeline,
it basically just has to point to an analysis engine descriptor and specify a number of CASes,
which will correspond to the number of pipelines you want to run. You can also get more sophisticated
and scale out only individual analysis engines within your pipeline, but I will leave that for you
to explore.

4) Set up your environment variables: UIMA_HOME should point at the UIMA-AS download directory, and UIMA_CLASSPATH
needs to include all the jars/directories that your analysis engines need.

5) Start up your pipelines with bin/deployAsyncService.{sh,bat}, passing it your deployment descriptor.

6) To debug, look in uima.log for error messages.
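For step 3, the simplest case (one pipeline replicated N times on one machine) can be sketched by generating the descriptor in plain Java, with N taken from `Runtime.getRuntime().availableProcessors()` so that every core gets a pipeline. This is a minimal sketch under assumptions, not an official template: the class and method names, the queue name, the pipeline descriptor path, and the broker URL `tcp://localhost:61616` (ActiveMQ's default port) are placeholders, and keeping `casPool numberOfCASes` equal to `scaleout numberOfInstances` is a simplifying choice. Check the element details against the deployment descriptor documentation [1] before relying on it.

```java
/**
 * Minimal sketch: builds a UIMA-AS deployment descriptor that replicates one
 * aggregate pipeline N times. Class/method names are hypothetical.
 */
public class DeploymentDescriptorSketch {

    // Minimal attribute escaping so paths/URLs containing & or quotes stay well-formed XML.
    static String attr(String value) {
        return value.replace("&", "&amp;").replace("\"", "&quot;").replace("<", "&lt;");
    }

    /** Returns descriptor XML that runs `instances` copies of the given pipeline. */
    public static String buildDescriptor(String aeDescriptorPath, String queueName,
                                         String brokerUrl, int instances) {
        if (instances < 1) {
            throw new IllegalArgumentException("instances must be >= 1");
        }
        return String.join("\n",
            "<?xml version=\"1.0\" encoding=\"UTF-8\"?>",
            "<analysisEngineDeploymentDescription xmlns=\"http://uima.apache.org/resourceSpecifier\">",
            "  <name>cTAKES scaleout</name>",
            "  <deployment protocol=\"jms\" provider=\"activemq\">",
            // Assumption: one CAS per replica, so the pool size matches the scaleout count.
            "    <casPool numberOfCASes=\"" + instances + "\"/>",
            "    <service>",
            "      <inputQueue endpoint=\"" + attr(queueName) + "\" brokerURL=\"" + attr(brokerUrl) + "\"/>",
            "      <topDescriptor>",
            "        <import location=\"" + attr(aeDescriptorPath) + "\"/>",
            "      </topDescriptor>",
            // async="false": the whole pipeline is replicated as a single unit.
            "      <analysisEngine async=\"false\">",
            "        <scaleout numberOfInstances=\"" + instances + "\"/>",
            "      </analysisEngine>",
            "    </service>",
            "  </deployment>",
            "</analysisEngineDeploymentDescription>");
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println(buildDescriptor(
            "path/to/YourAggregatePipeline.xml", // placeholder: your cTAKES aggregate descriptor
            "ctakesQueue",                       // placeholder queue name
            "tcp://localhost:61616",             // assumed local broker from step 2
            cores));
    }
}
```

The printed XML can be saved to a file and passed to bin/deployAsyncService.{sh,bat} in step 5; the client in the next paragraph then has to use the same broker URL and queue name.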

Now, to get documents to your pipelines, see the API docs [2] for how to set up the engine information
(such as the broker URL and queue name) for your pipelines. Then, instead of creating a CAS and calling sendCAS(), you can create
a collection reader with uimaFIT and call setCollectionReader() on the UIMA-AS engine object, followed by process().
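The resulting layout is one reader feeding a queue on the broker that several pipeline replicas drain. To make that data flow concrete without any UIMA-AS dependencies, the following Java sketch mimics it inside a single JVM with a `BlockingQueue` and a fixed thread pool. It is an analogy, not the UIMA-AS client API: `ScaleoutSketch`, `run`, and `processDocument` are hypothetical names, `processDocument` merely stands in for running a cTAKES pipeline, and running real cTAKES engines this way would additionally require every component to be thread-safe (the LVG issue mentioned above).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;

/** In-process analogy of the UIMA-AS layout; not UIMA-AS or cTAKES code. */
public class ScaleoutSketch {

    private static final int STOP = -1; // end-of-input marker, one per replica

    // Hypothetical stand-in for running one cTAKES pipeline on one document.
    static String processDocument(String doc) {
        return "[processed] " + doc;
    }

    /** Feeds docs through a shared queue to `replicas` workers; results keep input order. */
    public static List<String> run(List<String> docs, int replicas)
            throws InterruptedException, ExecutionException {
        BlockingQueue<Integer> queue = new LinkedBlockingQueue<>(); // plays the broker's input queue
        Map<Integer, String> results = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(replicas);
        List<Future<?>> workers = new ArrayList<>();
        try {
            // Each worker plays one pipeline replica: take a document index, process it, repeat.
            for (int r = 0; r < replicas; r++) {
                workers.add(pool.submit(() -> {
                    try {
                        while (true) {
                            int i = queue.take();
                            if (i == STOP) {
                                return;
                            }
                            results.put(i, processDocument(docs.get(i)));
                        }
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                }));
            }
            // The "collection reader": enqueue every document, then one STOP per replica.
            for (int i = 0; i < docs.size(); i++) {
                queue.put(i);
            }
            for (int r = 0; r < replicas; r++) {
                queue.put(STOP);
            }
            for (Future<?> w : workers) {
                w.get(); // waits for the replica to finish and rethrows any failure
            }
        } finally {
            pool.shutdownNow();
        }
        List<String> ordered = new ArrayList<>();
        for (int i = 0; i < docs.size(); i++) {
            ordered.add(results.get(i));
        }
        return ordered;
    }

    public static void main(String[] args) throws Exception {
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println(run(List.of("note 1", "note 2", "note 3"), cores));
    }
}
```

Sending one stop marker per replica lets each worker exit cleanly once the reader is done. In the real setup the broker does the queuing and the instances deployed in step 5 play the replicas, which is also what lets the same design spread across machines.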

You may or may not want to go this route: it's more complicated than just saying "use 8
cores," but it is nice if you want to eventually set up a bunch of pipelines on a cluster or


[1] UIMA-AS deployment descriptor documentation: https://uima.apache.org/d/uima-as-2.6.0/uima_async_scaleout.html#ugr.ref.async.deploy

[2] UIMA-AS client API documentation: https://uima.apache.org/d/uima-as-2.6.0/uima_async_scaleout.html#ugr.ref.async.api.usage

From: Franck Dernoncourt <franck.dernoncourt@gmail.com>
Sent: Saturday, October 10, 2015 7:37 PM
To: user@ctakes.apache.org
Subject: Running cTAKES using all cores available


When processing several text documents with cTAKES, is there any way to use all the cores
available on the machine? When I batch process documents using the CPE Configurator, cTAKES
only uses one core. I read that it is possible to have cTAKES use all available cores programmatically
(e.g. http://ctakes.markmail.org/search/?q=list%3Aorg.apache.incubator.ctakes-user+multi#query:list%3Aorg.apache.incubator.ctakes-user%20multi+page:1+mid:7xancosdfbnmm67d+state:results),
but I wonder whether it's possible to do so through the GUI or the config files.


Franck Dernoncourt
