oodt-dev mailing list archives

From "Mattmann, Chris A (3980)" <chris.a.mattm...@jpl.nasa.gov>
Subject Re: Utilizing all available cores
Date Tue, 25 Nov 2014 19:59:18 GMT
Dear Poojit,

Thanks for the email and the detailed description. Some
thoughts below:



-----Original Message-----
From: Poojit Sharma <poojitsh@usc.edu>
Reply-To: "dev@oodt.apache.org" <dev@oodt.apache.org>
Date: Monday, November 24, 2014 at 10:20 PM
To: "dev@oodt.apache.org" <dev@oodt.apache.org>
Subject: Utilizing all available cores

>Hi everyone,
>
>I am using OODT Radix v0.7 and need some help in fine-tuning my system.
>Let me give you all an overview of my setup.
>
>I am crawling files in a directory using 'crawler' and ingesting them into
>the 'file manager'. I have a PGE task set up which is triggered after
>successful ingestion into the 'file manager'. The PGE task then posts the
>file to Solr.
>
>Everything works great but I would like to get the most out of the
>available resources. Currently, I am running this on a c3.8xlarge AWS EC2
>instance which has 32 vCPUs. Since I have 2 million files, I have divided
>those files into 32 folders and I am running 32 instances of
>'crawler_launcher'. When I monitor the system using 'htop' I don't see max
>CPU utilization. I also notice in PCS Status via OPSUI that a number of
>files are queued.

Which files are queued, and what do you mean by queued? Does PCS status
show current ingests?

One thing you may want to do is increase the number of File Managers to
achieve more throughput. For example, you could run 32 File Managers as
well, though that is probably too many; something like # crawlers / 3, or
~10, may be enough. Seed these File Managers with the same config, but run
them on different ports.
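
As a rough sketch of that layout (not an OODT utility), the Python below
spreads 32 crawlers over # crawlers / 3 File Managers, round-robin. The
base port 9000, the host name, the URL form, and the helper name
assign_file_managers are all assumptions made for illustration; adjust
them to your deployment.

```python
def assign_file_managers(num_crawlers, crawlers_per_fm=3,
                         base_port=9000, host="localhost"):
    """Map each crawler index to a File Manager URL, round-robin.

    base_port, host, and the URL form are illustrative assumptions,
    not values read from any OODT configuration.
    """
    # Roughly one File Manager per `crawlers_per_fm` crawlers, at least one.
    num_fms = max(1, num_crawlers // crawlers_per_fm)
    # Same config for every File Manager, but a distinct port for each.
    urls = [f"http://{host}:{base_port + i}" for i in range(num_fms)]
    return {crawler: urls[crawler % num_fms]
            for crawler in range(num_crawlers)}


assignment = assign_file_managers(32)
for crawler, url in sorted(assignment.items()):
    print(f"crawler {crawler:2d} -> {url}")
```

Each crawler would then be pointed at the File Manager URL it was assigned,
so no single File Manager serves more than four of the 32 crawlers.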

> I
>also tried to set org.apache.oodt.cas.workflow.engine.minPoolSize and
>maxPoolSize to 32, as well as Solr's maxIndexingThreads to 32, but I think
>there is some bottleneck.

This depends on where you are running ingest. If it’s in the crawler and
FM, then more FMs and crawlers will help. If you are ingesting from
pipeline processing, more FMs will help (and crawler load is already
distributed since CAS-PGE tasks are distributed).

>
>Is there an option to set number of threads of the 'file manager'?
>Any help will be appreciated.

See above - more file managers will help.

Cheers,
Chris

>
>Thanks,
>
>Poojit Sharma.

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattmann@nasa.gov
WWW:  http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++


