manifoldcf-user mailing list archives

From Furkan KAMACI <furkankam...@gmail.com>
Subject Re: ManifoldCF slow documentum indexing performance
Date Thu, 06 Jul 2017 09:52:00 GMT
Hi Tamizh,

Set Xmx and Xms to the same value for better performance.
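
As a rough sketch only: the flags quoted below are -Xmx512m -Xms32m, so equal values would look something like the following. The variable name HEAP_OPTS is hypothetical (the actual run script may pass the flags differently), and 512m simply reuses the existing maximum; whether a larger heap helps is not something this thread establishes.

```shell
# Hypothetical variable name; adapt to however the run script passes JVM flags.
# Flags quoted in the thread: -Xmx512m -Xms32m
# Same initial and maximum heap, so the JVM does not resize the heap at runtime.
HEAP_OPTS="-Xms512m -Xmx512m"
echo "$HEAP_OPTS"
```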

Kind Regards,
Furkan KAMACI

On Thu, Jul 6, 2017 at 9:10 AM, Karl Wright <daddywri@gmail.com> wrote:

> Hi Tamizh,
>
> The Documentum Server Process is a thin shell around DFC and its
> dependencies.  In order to get helpful suggestions, you will need to
> contact Documentum, I'm afraid.
>
> Thanks,
> Karl
>
>
>
> On Thu, Jul 6, 2017 at 1:57 AM, Tamizh Kumaran Thamizharasan <
> tthamizharasan@worldbankgroup.org> wrote:
>
>> Thanks Karl!!
>>
>>
>>
>> After monitoring the CPU usage of PostgreSQL, the agents process, and the
>> Documentum server process, we found that the Documentum server process
>> consumes the most CPU and the agents process is the second-largest consumer.
>>
>>
>>
>> In the Documentum server run script, the Java heap is set as below.
>>
>> *-Xmx512m -Xms32m*
>>
>>
>>
>> Is there any way to speed up the indexing through heap configuration or
>> by adding hardware?
>>
>> If so, kindly share the details with us.
>>
>>
>>
>> Regards,
>>
>> Tamizh Kumaran
>>
>>
>>
>> *From:* Karl Wright [mailto:daddywri@gmail.com]
>> *Sent:* Wednesday, July 05, 2017 6:19 PM
>> *To:* user@manifoldcf.apache.org
>> *Cc:* Sharnel Merdeck Pereira; Sundarapandian Arumaidurai Vethasigamani
>> *Subject:* Re: ManifoldCF slow documentum indexing performance
>>
>>
>>
>> Hi Tamizh,
>>
>>
>>
>> The likely culprit is Documentum itself.  In my experience it can be
>> quite slow, depending on how it is configured.  But you can confirm that by
>> monitoring the CPU usage of Postgresql, the agents process, and the
>> documentum server process.  If none of these are CPU bound, then Documentum
>> itself is the problem.
>>
>>
>>
>> Thanks,
>>
>> Karl
>>
>>
>>
>>
>>
>> On Wed, Jul 5, 2017 at 8:24 AM, Tamizh Kumaran Thamizharasan <
>> tthamizharasan@worldbankgroup.org> wrote:
>>
>> Hi Team,
>>
>>
>>
>> PostgreSQL 9.2, Solr 5.3.2 and ManifoldCF 2.7.1 are installed on the
>> same Linux box. The Documentum server sits on a different Linux box.
>> Indexing performance is slow (approx. 1000 docs per hour) with the Documentum
>> crawler. The properties file in use is below for reference.
>>
>>
>>
>> <configuration>
>>
>>   <!-- Version string for UI -->
>>
>>   <!-- Point to a specific (common) logging file -->
>>
>>   <property name="org.apache.manifoldcf.logconfigfile"
>> value="./logging.ini"/>
>>
>>   <!-- Specify the connectors to be loaded -->
>>
>>   <property name="org.apache.manifoldcf.connectorsconfigurationfile"
>> value="../connectors.xml"/>
>>
>>   <!-- Specify the path to the file resources directory -->
>>
>>   <property name="org.apache.manifoldcf.fileresources"
>> value="../file-resources"/>
>>
>>   <property name="org.apache.manifoldcf.databaseimplementationclass"
>> value="org.apache.manifoldcf.core.database.DBInterfacePostgreSQL"/>
>>
>>   <property name="org.apache.manifoldcf.postgresql.hostname"
>> value="localhost"/>
>>
>>   <property name="org.apache.manifoldcf.postgresql.port" value="5432"/>
>>
>>   <property name="org.apache.manifoldcf.dbsuperusername"
>> value="postgres"/>
>>
>>   <property name="org.apache.manifoldcf.dbsuperuserpassword" value=""/>
>>
>>   <property name="org.apache.manifoldcf.database.name"
>> value="manifoldcf"/>
>>
>>   <property name="org.apache.manifoldcf.database.username"
>> value="postgres"/>
>>
>>   <property name="org.apache.manifoldcf.database.password" value=""/>
>>
>>   <property name="org.apache.manifoldcf.database.maxhandles"
>> value="100"/>
>>
>>   <property name="org.apache.manifoldcf.crawler.threads" value="15"/>
>>
>>   <property name="org.apache.manifoldcf.crawler.repository.store_history"
>> value="false"/>
>>
>>
>>
>>   <property name="org.apache.manifoldcf.zookeeper.connectstring"
>> value="***********:8349"/>
>>
>>   <property name="org.apache.manifoldcf.zookeeper.sessiontimeout"
>> value="5000"/>
>>
>> <!-- Tell MCF where to find the connector jars -->
>>
>>   <libdir path="../connector-lib"/>
>>
>>   <libdir path="../connector-common-lib"/>
>>
>>   <libdir path="../connector-lib-proprietary"/>
>>
>>   <!-- Any additional local properties go here -->
>>
>> </configuration>
>>
>>
>>
>> Initially org.apache.manifoldcf.crawler.threads was set to 45, and we
>> observed a long gap between each batch of 45 documents during
>> processing.
>>
>> Can you please point out any changes or recommendations that would speed up
>> the indexing?
>>
>>
>>
>> Regards,
>>
>> Tamizh Kumaran Thamizharasan
>>
>>
>>
>>
>>
>
>
