manifoldcf-user mailing list archives

From Karl Wright <daddy...@gmail.com>
Subject Re: ManifoldCF slow documentum indexing performance
Date Wed, 05 Jul 2017 12:48:49 GMT
Hi Tamizh,

The likely culprit is Documentum itself.  In my experience it can be quite
slow, depending on how it is configured.  But you can confirm that by
monitoring the CPU usage of PostgreSQL, the agents process, and the
documentum server process.  If none of these is CPU bound, then Documentum
itself is the problem.
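
A rough way to do the CPU check described above on a Linux box is to sample
/proc/<pid>/stat twice and compare the accumulated CPU ticks. This is only a
sketch: the script name, the function names, and the 5-second interval are
illustrative assumptions and not part of ManifoldCF, and you have to look up
the PIDs yourself (for example with ps).

```python
import os
import sys
import time


def parse_cpu_ticks(stat_line: str) -> int:
    """Return utime + stime (in clock ticks) from one /proc/<pid>/stat line."""
    # The command name (field 2) is wrapped in parentheses and may itself
    # contain spaces or ')', so cut at the last ')' before splitting.
    fields = stat_line.rsplit(")", 1)[1].split()
    # fields[0] is field 3 (state); utime and stime are fields 14 and 15.
    return int(fields[11]) + int(fields[12])


def cpu_percent(ticks_before: int, ticks_after: int,
                interval_s: float, ticks_per_s: int) -> float:
    """CPU used over the interval, as a percentage of one core."""
    return 100.0 * (ticks_after - ticks_before) / (ticks_per_s * interval_s)


def read_ticks(pid: int) -> int:
    with open(f"/proc/{pid}/stat") as f:
        return parse_cpu_ticks(f.read())


def sample(pids, interval_s=5.0):
    """Sample each PID twice and return {pid: percent of one core}."""
    ticks_per_s = os.sysconf("SC_CLK_TCK")
    before = {pid: read_ticks(pid) for pid in pids}
    time.sleep(interval_s)
    return {
        pid: cpu_percent(before[pid], read_ticks(pid), interval_s, ticks_per_s)
        for pid in pids
    }


if __name__ == "__main__":
    # Hypothetical usage: python cpu_sample.py <pid> [<pid> ...]
    pids = [int(arg) for arg in sys.argv[1:]]
    if pids:
        for pid, pct in sample(pids).items():
            print(f"pid {pid}: {pct:.1f}% of one core")
```

The figure is relative to a single core, so a multithreaded JVM such as the
agents process can report more than 100%. PostgreSQL runs one backend process
per connection, so pass all of its busy backend PIDs and add the results up.
If everything stays well below saturation while the crawl is running, the
crawler is most likely waiting on the repository.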

Thanks,
Karl


On Wed, Jul 5, 2017 at 8:24 AM, Tamizh Kumaran Thamizharasan <
tthamizharasan@worldbankgroup.org> wrote:

> Hi Team,
>
>
>
> PostgreSQL 9.2, Solr 5.3.2 and ManifoldCF 2.7.1 are installed on the
> same Linux box. The Documentum server sits on a different Linux box.
> Indexing with the Documentum crawler is slow (approx. 1000 documents per
> hour). The properties file in use is included below for reference.
>
>
>
> <configuration>
>   <!-- Version string for UI -->
>   <!-- Point to a specific (common) logging file -->
>   <property name="org.apache.manifoldcf.logconfigfile" value="./logging.ini"/>
>   <!-- Specify the connectors to be loaded -->
>   <property name="org.apache.manifoldcf.connectorsconfigurationfile" value="../connectors.xml"/>
>   <!-- Specify the path to the file resources directory -->
>   <property name="org.apache.manifoldcf.fileresources" value="../file-resources"/>
>   <property name="org.apache.manifoldcf.databaseimplementationclass" value="org.apache.manifoldcf.core.database.DBInterfacePostgreSQL"/>
>   <property name="org.apache.manifoldcf.postgresql.hostname" value="localhost"/>
>   <property name="org.apache.manifoldcf.postgresql.port" value="5432"/>
>   <property name="org.apache.manifoldcf.dbsuperusername" value="postgres"/>
>   <property name="org.apache.manifoldcf.dbsuperuserpassword" value=""/>
>   <property name="org.apache.manifoldcf.database.name" value="manifoldcf"/>
>   <property name="org.apache.manifoldcf.database.username" value="postgres"/>
>   <property name="org.apache.manifoldcf.database.password" value=""/>
>   <property name="org.apache.manifoldcf.database.maxhandles" value="100"/>
>   <property name="org.apache.manifoldcf.crawler.threads" value="15"/>
>   <property name="org.apache.manifoldcf.crawler.repository.store_history" value="false"/>
>
>   <property name="org.apache.manifoldcf.zookeeper.connectstring" value="***********:8349"/>
>   <property name="org.apache.manifoldcf.zookeeper.sessiontimeout" value="5000"/>
>   <!-- Tell MCF where to find the connector jars -->
>   <libdir path="../connector-lib"/>
>   <libdir path="../connector-common-lib"/>
>   <libdir path="../connector-lib-proprietary"/>
>   <!-- Any additional local properties go here -->
> </configuration>
>
>
>
> Initially org.apache.manifoldcf.crawler.threads was set to 45, and we
> observed a long gap between each batch of 45 documents during
> processing.
>
> Can you please point out any changes or recommendations that would speed
> up the indexing?
>
>
>
> Regards,
>
> Tamizh Kumaran Thamizharasan
>
>
>
