uima-user mailing list archives

From Dr. Raja M. Suleman <raja.m.sulai...@gmail.com>
Subject Re: UIMA DUCC slow processing
Date Thu, 11 Jun 2020 00:30:13 GMT
Hi,
Thank you for your reply and I'm sorry I couldn't get back to this earlier. 

To get a better picture of DUCC's processing speed, I built a dummy pipeline in which the
CollectionReader runs a for loop to generate 100k work items (so no disk reads). Each work item
contains only a simple string. The work items are passed to the CasMultiplier, which creates
a new CAS with a DocumentInfo (again holding only a simple string value) for each work item
and passes it as a new CAS to the CasConsumer. The CasConsumer does nothing except write
the document received in the CAS to the logger. So this pipeline does essentially no work:
there are no input reads, and the only output is the information written to the logger. Running it on
the cluster with 2 slave nodes, each with 8 CPUs and 32GB RAM, still takes more than 30
minutes. I don't understand how this is possible, since no heavy I/O processing is
happening in the code.
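
To put those numbers in perspective, here is a rough back-of-the-envelope sketch in Python. It uses only the figures quoted above (100k work items, 30 minutes, 2 nodes with 8 CPUs each). The function names are made up for illustration, and the per-CPU budget assumes all 16 CPUs process work items in parallel, which the actual job configuration may not do. Because the job took more than 30 minutes, the rate is an upper bound and the per-item times are lower bounds.

```python
# Rough throughput estimate for the dummy job described above.
# All inputs are the figures from the message; helper names are illustrative.


def items_per_second(work_items: int, minutes: float) -> float:
    """Average number of work items completed per second of wall-clock time."""
    return work_items / (minutes * 60.0)


def wall_ms_per_item(work_items: int, minutes: float) -> float:
    """Average wall-clock milliseconds elapsed per work item."""
    return minutes * 60.0 * 1000.0 / work_items


WORK_ITEMS = 100_000      # generated by the for loop in the CollectionReader
RUNTIME_MINUTES = 30.0    # "more than 30 minutes", so treat this as a lower bound
TOTAL_CPUS = 2 * 8        # 2 nodes x 8 CPUs; assumes every CPU is used in parallel

rate = items_per_second(WORK_ITEMS, RUNTIME_MINUTES)
per_item = wall_ms_per_item(WORK_ITEMS, RUNTIME_MINUTES)

print(f"throughput: at most {rate:.1f} work items/s")
print(f"wall-clock time per item: at least {per_item:.0f} ms")
# If all CPUs were busy in parallel, each one would be spending this long per item:
print(f"per-CPU time per item: at least {per_item * TOTAL_CPUS:.0f} ms")
```

Tens of milliseconds of wall-clock time per item is a lot for a pipeline that only copies a string and logs it, which suggests the time is going into per-item overhead rather than into the pipeline code itself.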

Any ideas please?

Thank you.

On 2020/05/18 12:47:41, Eddie Epstein <eaepstein@gmail.com> wrote: 
> Hi,
> 
> Removing the AE from the pipeline was a good idea to help isolate the
> bottleneck. The other two most likely possibilities are the collection
> reader pulling from Elasticsearch or the CAS consumer writing the
> processing output.
> 
> DUCC Jobs are a simple way to scale out compute bottlenecks across a
> cluster. Scaleout may be of limited or no value for I/O bound jobs.
> Please give a more complete picture of the processing scenario on DUCC.
> 
> Regards,
> Eddie
> 
> 
> On Sat, May 16, 2020 at 1:29 AM Raja Muhammad Suleman <
> Sulemanr@edgehill.ac.uk> wrote:
> 
> > Hi,
> > I've been trying to run a very small UIMA DUCC cluster with 2 slave nodes
> > having 32GB of RAM each. I wrote a custom Collection Reader to read data
> > from an Elasticsearch index and dump it into a new index after certain
> > analysis engine processing. The Analysis Engine runs simple sentiment
> > analysis code. Throughput is very low: the job processes only
> > ~150 documents/minute.
> > To test the performance without the analysis engine, I removed the AE from
> > the pipeline but still I did not get any improvement in the processing
> > speeds. Can you please guide me as to where I might be going wrong or what
> > I can do to improve the processing speeds?
> >
> > Thank you.
> >
> 
