oodt-dev mailing list archives

From Poojit Sharma <pooji...@usc.edu>
Subject Utilizing all available cores
Date Mon, 24 Nov 2014 21:20:03 GMT
Hi everyone,

I am using OODT RADiX v0.7 and need some help fine-tuning my system. Let
me give you an overview of my setup.

I am crawling files in a directory with the 'crawler' and ingesting them
into the 'file manager'. I have a PGE task set up that is triggered after
each successful ingestion into the 'file manager'. The PGE task then posts
the file to Solr.

Everything works great, but I would like to get the most out of the
available resources. Currently I am running this on a c3.8xlarge AWS EC2
instance, which has 32 vCPUs. Since I have 2 million files, I have divided
them into 32 folders and am running 32 instances of 'crawler_launcher'.
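For reference, this is roughly how I launch the 32 instances. The folder
layout (part-00 .. part-31) and DATA_ROOT path are illustrative; the real
crawler_launcher flags are whatever my working single-instance command uses:

```shell
#!/bin/sh
# Launch one crawler per folder -- one per vCPU.
# Folder names and DATA_ROOT are illustrative, not my real paths.
DATA_ROOT=${DATA_ROOT:-/data/staging}
DRY_RUN=${DRY_RUN:-yes}   # set DRY_RUN= (empty) to actually launch
launched=0
for i in $(seq -w 0 31); do
  if [ -n "$DRY_RUN" ]; then
    echo "would launch: crawler_launcher --productPath $DATA_ROOT/part-$i"
  else
    ./crawler_launcher --operation --launchMetCrawler \
        --productPath "$DATA_ROOT/part-$i" &   # other flags omitted here
  fi
  launched=$((launched + 1))
done
[ -n "$DRY_RUN" ] || wait   # block until all background crawlers exit
echo "prepared $launched crawler instances"
```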
When I monitor the system with 'htop' I don't see full CPU utilization,
and in PCS Status via OPSUI I notice that a number of files are queued. I
also tried setting org.apache.oodt.cas.workflow.engine.minPoolSize and
maxPoolSize to 32, as well as Solr's maxIndexingThreads to 32, but there
still seems to be a bottleneck somewhere.
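These are the pool-size lines I changed in workflow.properties (the values
are what I set; whether 32 is the right number is exactly my question):

```properties
# workflow.properties -- workflow engine thread pool bounds
org.apache.oodt.cas.workflow.engine.minPoolSize=32
org.apache.oodt.cas.workflow.engine.maxPoolSize=32
```

The maxIndexingThreads=32 change went into solrconfig.xml, under the
indexConfig section.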

Is there an option to set the number of threads for the 'file manager'?
Any help will be appreciated.


Poojit Sharma.
