Hi everyone,
I am using OODT Radix v0.7 and need some help fine-tuning my system. Let
me give you an overview of my setup.
I am crawling files in a directory using 'crawler' and ingesting them into
the 'file manager'. I have a PGE task set up that is triggered after each
successful ingestion into the 'file manager'. The PGE task then posts the
file to Solr.
Everything works great, but I would like to get the most out of the
available resources. Currently, I am running this on a c3.8xlarge AWS EC2
instance, which has 32 vCPUs. Since I have 2 million files, I have divided
them into 32 folders and I am running 32 instances of 'crawler_launcher'.
When I monitor the system using 'htop', I don't see the CPUs fully
utilized. I also notice in PCS Status via OPSUI that a number of files are
queued. I also tried setting org.apache.oodt.cas.workflow.engine.minPoolSize
and maxPoolSize to 32, as well as Solr's maxIndexingThreads to 32, but I
think there is still a bottleneck somewhere.
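For context, the split into 32 folders can be sketched roughly as follows.
This is only an illustration under stated assumptions, not the exact script
used: the function name shard_files, the shard_NN folder naming, and the
round-robin assignment are all hypothetical, and the sketch only prepares
the folders (it does not launch any crawler).

```python
import shutil
from pathlib import Path


def shard_files(src_dir, dest_root, num_shards=32):
    """Move the regular files in src_dir into num_shards sub-folders.

    Hypothetical helper: files are assigned round-robin so that each
    folder (and therefore each crawler instance) gets a similar count.
    """
    src = Path(src_dir)
    root = Path(dest_root)

    # Sort so the file-to-folder assignment is deterministic.
    files = sorted(p for p in src.iterdir() if p.is_file())

    # Assumed naming scheme: shard_00, shard_01, ...
    shard_dirs = [root / f"shard_{i:02d}" for i in range(num_shards)]
    for d in shard_dirs:
        d.mkdir(parents=True, exist_ok=True)

    for i, path in enumerate(files):
        target = shard_dirs[i % num_shards] / path.name
        shutil.move(str(path), str(target))

    return shard_dirs
```

Each returned folder would then be handed to its own 'crawler_launcher'
instance. Round-robin keeps the file counts balanced, but it does not
balance by file size, so the crawlers may still finish at different times.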
Is there an option to set the number of threads used by the 'file manager'?
Any help will be appreciated.
Thanks,
Poojit Sharma.