manifoldcf-user mailing list archives

From Tamizh Kumaran Thamizharasan <tthamizhara...@worldbankgroup.org>
Subject ManifoldCF documentum indexing issue
Date Tue, 13 Jun 2017 07:20:03 GMT
Hi,

ManifoldCF 2.7.1 is running in the multiprocess ZooKeeper (ZK) model, backed by PostgreSQL
9.3. The goal is to crawl Documentum content and push it to a Solr 5.3.2 output. The
crawler-ui app is deployed on Tomcat, and the Tomcat startup script points to the ManifoldCF
properties.xml. ManifoldCF, the bundled ZK, and Tomcat all run on the same host (Red Hat
Enterprise Linux Server release 6.9 (Santiago)). The database runs on a separate Windows machine.
The ZK and database settings are configured through properties.xml and properties-global.xml.
ZK and the Documentum processes (registry and server) are up, and two agents processes (started
with start-agents.sh and start-agents-2.sh) are running. Their worker threads index the
Documentum content into Solr through ManifoldCF.

The current connection settings in ManifoldCF are:
Solr output connection, max connections: 25
Documentum repository connection, max connections: 25
properties.xml:
<property name="org.apache.manifoldcf.database.maxhandles" value="50"/>
<property name="org.apache.manifoldcf.crawler.threads" value="25"/>
Total Documentum document count: 0.5 million
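To make the concurrency figures above easier to compare, here is a minimal Python sketch that reads the two quoted properties and works out the totals. It assumes that properties.xml wraps its <property> elements in a <configuration> root, and that both agents processes read the same file and each run their own pool of crawler threads. The helper names (read_properties, summarize) are hypothetical, and the ratios it prints are plain arithmetic, not a ManifoldCF tuning rule.

```python
import xml.etree.ElementTree as ET

# Fragment mirroring the two properties quoted above. The <configuration>
# root element is an assumption about the layout of properties.xml.
PROPERTIES_XML = """<configuration>
  <property name="org.apache.manifoldcf.database.maxhandles" value="50"/>
  <property name="org.apache.manifoldcf.crawler.threads" value="25"/>
</configuration>"""


def read_properties(xml_text):
    """Return a dict mapping each <property> name to its value string."""
    root = ET.fromstring(xml_text)
    return {p.get("name"): p.get("value") for p in root.iter("property")}


def summarize(props, agent_processes, output_max_conn, repo_max_conn):
    """Hypothetical helper: combine per-process settings into totals.

    Assumes every agents process reads the same properties.xml and runs
    its own pool of crawler threads and database handles.
    """
    threads = int(props["org.apache.manifoldcf.crawler.threads"])
    handles = int(props["org.apache.manifoldcf.database.maxhandles"])
    total_threads = threads * agent_processes
    return {
        "threads_per_process": threads,
        "db_handles_per_process": handles,
        "total_worker_threads": total_threads,
        # How many worker threads could compete for each connector slot.
        "threads_per_repo_connection": total_threads / repo_max_conn,
        "threads_per_output_connection": total_threads / output_max_conn,
    }


if __name__ == "__main__":
    props = read_properties(PROPERTIES_XML)
    summary = summarize(props, agent_processes=2,
                        output_max_conn=25, repo_max_conn=25)
    for key, value in summary.items():
        print(f"{key}: {value}")
```

With the values quoted in this message it reports 50 worker threads in total, that is, two threads for each of the 25 repository connections and each of the 25 Solr output connections.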

After the job starts, it indexes about 20,000 documents and then terminates with the following
error in the ManifoldCF job status:
Error: Repeated service interruptions - failure processing document: Error from server at http://localhost:8983/solr/documentum_manifoldcf_stg: String index out of range: -188

Please find attached the ManifoldCF error log and the agent log.

Please share your observations on the cause of this issue, and on whether the thread
configuration used for crawling is appropriate.

Regards,
Tamizh Kumaran

