manifoldcf-user mailing list archives

From Karl Wright <daddy...@gmail.com>
Subject Re: ManifoldCF documentum indexing issue
Date Tue, 13 Jun 2017 08:53:18 GMT
Hi Tamizh,

The reported error is 'Error from server at
http://localhost:8983/solr/documentum_manifoldcf_stg: String index out of
range: -188'.  The message seemingly indicates that the error was
*received* from the Solr server for one specific document.  ManifoldCF does
not recognize the error as innocuous, so it retries for a while until it
eventually gives up and halts the job.  However, I cannot find that exact
text anywhere in the Solr output connector code, so I wonder if you
transcribed it correctly?

There should also be the following:
(1) A record of the attempts in the manifoldcf.log file, with an MCF stack
trace attached to each one;
(2) Simple history records for that document that are of the type
INGESTDOCUMENT;
(3) Solr log entries that have a Solr stack trace.

The last of these would be the most helpful.  It is possible that you are
seeing a problem in Solr Cell (Tika) that is manifesting itself in this
way.  You can (and should) configure your Solr to ignore Tika errors.
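
As a rough sketch of that last suggestion: Solr Cell's extracting request
handler accepts an ignoreTikaException parameter, which can be set as a
default in solrconfig.xml.  The fragment below assumes the ManifoldCF Solr
output connection posts to the stock /update/extract handler in the
documentum_manifoldcf_stg core; the handler name and the surrounding
attributes are assumptions and should be matched to whatever your
solrconfig.xml already contains.

```xml
<!-- solrconfig.xml for the target core (handler name assumed to be the
     default /update/extract; adjust to match your existing definition) -->
<requestHandler name="/update/extract"
                startup="lazy"
                class="solr.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <!-- Skip Tika parsing failures instead of failing the request;
         any metadata that was extracted is still indexed -->
    <bool name="ignoreTikaException">true</bool>
  </lst>
</requestHandler>
```

The core has to be reloaded (or Solr restarted) for the change to take
effect.  This would only help if the Solr stack trace confirms that the
exception originates in Tika.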

Thanks,
Karl




On Tue, Jun 13, 2017 at 3:20 AM, Tamizh Kumaran Thamizharasan <
tthamizharasan@worldbankgroup.org> wrote:

> Hi,
>
>
>
> ManifoldCF 2.7.1 is running in the multiprocess ZooKeeper (ZK) model and
> is integrated with PostgreSQL 9.3. The intended setup is to crawl the
> Documentum content and push it to the Solr 5.3.2 output. The crawler-ui
> app is installed on Tomcat, and the startup script points to the
> ManifoldCF properties.xml during server startup. ManifoldCF, the bundled
> ZK, and Tomcat all run on the same host, whose OS is Red Hat Enterprise
> Linux Server release 6.9 (Santiago). The database runs on a Windows box.
>
> ZK is integrated with the database through properties.xml and
> properties-global.xml.
>
> ZK and the Documentum-related processes (registry and server) are up, and
> the two agents (start-agents.sh and start-agents-2.sh) have been started;
> they spawn multiple threads to index the Documentum content into Solr
> through ManifoldCF.
>
>
>
> The connection limits currently configured in ManifoldCF are as follows:
>
> Solr output max connections: 25
>
> Document repository max connections: 25
>
> Properties.xml:
>
> <property name="org.apache.manifoldcf.database.maxhandles" value="50"/>
>
> <property name="org.apache.manifoldcf.crawler.threads" value="25"/>
>
> Total Documentum document count: 0.5 million
>
>
>
> After the job is started, it indexes some 20,000+ documents and is then
> terminated with the following error on the ManifoldCF job:
>
> Error: Repeated service interruptions - failure processing document: Error
> from server at http://localhost:8983/solr/documentum_manifoldcf_stg:
> String index out of range: -188
>
>
>
> Please find the attached manifoldCF error log and agent log.
>
>
>
> Please let me know your observations on the cause of the issue and on the
> thread configuration used for crawling.
>
>
>
> Regards,
>
> Tamizh Kumaran
>
>
>
