manifoldcf-user mailing list archives

From Ronny Heylen <securaqbere...@gmail.com>
Subject Error in Manifoldcf, what's the first step?
Date Tue, 29 Oct 2013 10:51:55 GMT
Hi,

Solr is 4.4; ManifoldCF is 1.3.

We are indexing a shared Windows network drive, filtering on *.doc*,
*.xls*, *.pdf ... There are about 650,000 files to index, giving a Solr
index of 35 GB.

The result is great, except that the ManifoldCF job crashes before it finishes.

Note that:
- ignoreTikaException is true in solrconfig.xml (otherwise the ManifoldCF
job stops very early).
- Tomcat has been given 24 GB of memory (it uses 15 GB).
- There are 8 cores.
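For reference, the ignoreTikaException setting lives in the defaults of Solr's extracting request handler. The fragment below is a sketch based on the stock Solr 4.x example solrconfig.xml. Apart from ignoreTikaException itself, the handler name and the other values are assumptions copied from that example and may differ from an actual setup.

```xml
<!-- Sketch of the Solr 4.x extracting request handler in solrconfig.xml.
     Only ignoreTikaException reflects the setup described here; the other
     entries mirror the stock example config and are assumptions. -->
<requestHandler name="/update/extract"
                startup="lazy"
                class="solr.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <!-- Ignore Tika parsing errors instead of failing the whole update request -->
    <str name="ignoreTikaException">true</str>
    <str name="lowernames">true</str>
    <str name="uprefix">ignored_</str>
  </lst>
</requestHandler>
```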

The message shown in http://localhost:8080/mcf-crawler-ui/showjobstatus.jsp is:
Error: Repeated service interruptions - failure processing document: Server
at http://localhost:8080/solr/collection1 returned non ok status:500,
message:Internal Server Error

We then stopped indexing the full drive in one job and defined one job
per subfolder instead.

Almost all of the subfolder jobs finish successfully. Two or three fail
with the same message, and two or three others fail with a different one:

Error: Repeated service interruptions - failure processing document: Read
timed out

If we go one level deeper (one job for each subfolder of a failed
subfolder), the same thing happens: almost all of them succeed, except 1
or 2.

What should be the first step to solve this problem?

Thanks.
