uima-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Lou DeGenaro <lou.degen...@gmail.com>
Subject Re: UIMA DUCC slow processing
Date Fri, 12 Jun 2020 17:56:07 GMT
Have you examined various DUCC and system log files for possible clues?

On Fri, Jun 12, 2020 at 1:16 PM Dr. Raja M. Suleman <
raja.m.sulaiman@gmail.com> wrote:

> Hi,
> Thank you for your reply and I'm sorry I couldn't get back to this
> earlier.
>
> To get a better picture of the processing speed of DUCC, I made a dummy
> pipeline where the CollectionReader runs a for loop to generate 100k
> workitems (so no disk reads). each workitem only has a simple string in it.
> These are then passed on to the CasMultiplier where for each workitem I'm
> creating a new CAS with DocumentInfo (again only having a simple string
> value) and pass it as a newcas to the CasConsumer. The CasConsumer doesn't
> do anything except add the Document received in the CAS to the logger. So
> basically this pipeline isn't doing anything, no Input reads and the only
> output is the information added to the logger. Running this on the cluster
> with 2 slave nodes with 8-CPUs and 32GB RAM each is still taking more than
> 30 minutes. I don't understand how is this possible since there's no heavy
> I/O processing is happening in the code.
>
> Any ideas please?
>
> Thank you.
>
> On 2020/05/18 12:47:41, Eddie Epstein <eaepstein@gmail.com> wrote:
> > Hi,
> >
> > Removing the AE from the pipeline was a good idea to help isolate the
> > bottleneck. The other two most likely possibilities are the collection
> > reader pulling from elastic search or the CAS consumer writing the
> > processing output.
> >
> > DUCC Jobs are a simple way to scale out compute bottlenecks across a
> > cluster. Scaleout may be of limited or no value for I/O bound jobs.
> > Please give a more complete picture of the processing scenario on DUCC.
> >
> > Regards,
> > Eddie
> >
> >
> > On Sat, May 16, 2020 at 1:29 AM Raja Muhammad Suleman <
> > Sulemanr@edgehill.ac.uk> wrote:
> >
> > > Hi,
> > > I've been trying to run a very small UIMA DUCC cluster with 2 slave
> nodes
> > > having 32GB of RAM each. I wrote a custom Collection Reader to read
> data
> > > from an Elasticsearch index and dump it into a new index after certain
> > > analysis engine processing. The Analysis Engine is a simple sentiment
> > > analysis code. The performance I'm getting is very slow as it is only
> able
> > > to process ~150 documents/minute.
> > > To test the performance without the analysis engine, I removed the AE
> from
> > > the pipeline but still I did not get any improvement in the processing
> > > speeds. Can you please guide me as to where I might be going wrong or
> what
> > > I can do to improve the processing speeds?
> > >
> > > Thank you.
> > > ________________________________
> > > Edge Hill University<http://ehu.ac.uk/home/emailfooter>
> > > Teaching Excellence Framework Gold Award<
> http://ehu.ac.uk/tef/emailfooter>
> > > ________________________________
> > > This message is private and confidential. If you have received this
> > > message in error, please notify the sender and remove it from your
> system.
> > > Any views or opinions presented are solely those of the author and do
> not
> > > necessarily represent those of Edge Hill or associated companies. Edge
> Hill
> > > University may monitor email traffic data and also the content of
> email for
> > > the purposes of security and business communications during staff
> absence.<
> > > http://ehu.ac.uk/itspolicies/emailfooter>
> > >
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message