Hi,

We are running Solr 4.4 and ManifoldCF 1.3.

We are indexing a shared Windows network drive, filtering on *.doc*, *.xls*, *.pdf ..., with about 650,000 files to index. The resulting Solr index is 35 GB in size.

The results are good, except that the ManifoldCF job crashes before it finishes.

Note that:
- ignoreTikaException is set to true in solrconfig.xml (otherwise the ManifoldCF job stops very early).
- Tomcat has been given 24 GB of memory (it uses 15 GB).
- there are 8 cores

The message shown in http://localhost:8080/mcf-crawler-ui/showjobstatus.jsp is:
Error: Repeated service interruptions - failure processing document: Server at http://localhost:8080/solr/collection1 returned non ok status:500, message:Internal Server Error

We then defined one job per subfolder instead of indexing the whole drive in a single job.

Almost all of the subfolder jobs finish successfully. Two or three fail with the same message as above, and another two or three fail with a different message:

Error: Repeated service interruptions - failure processing document: Read timed out

If we go one level further (defining one job for each subfolder of a failing subfolder), the same thing happens: almost all jobs succeed, except one or two.
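
To make the splitting concrete, here is a minimal Python sketch of the kind of per-subfolder breakdown described above. It is only an illustration, not how the jobs were actually defined: the function names are hypothetical, and since the filter list above is abbreviated, only the three patterns named there are used. It counts the matching files under each immediate subfolder of a root, which may help in deciding which failing subfolder to split further.

```python
import fnmatch
import os
from collections import Counter

# Patterns taken from the filter described above. The original list is
# abbreviated ("..."), so only the three named patterns appear here.
PATTERNS = ["*.doc*", "*.xls*", "*.pdf"]


def matches(filename, patterns=PATTERNS):
    """Return True if the file name matches any pattern, ignoring case."""
    name = filename.lower()
    return any(fnmatch.fnmatch(name, p) for p in patterns)


def count_per_subfolder(root):
    """Map each immediate subfolder of `root` to its number of matching files.

    Files are counted recursively below each subfolder. Files that sit
    directly in `root` are not counted, because they belong to no subfolder.
    """
    counts = Counter()
    for entry in sorted(os.listdir(root)):
        top = os.path.join(root, entry)
        if not os.path.isdir(top):
            continue
        counts[entry] = 0
        for _dirpath, _dirnames, filenames in os.walk(top):
            counts[entry] += sum(1 for f in filenames if matches(f))
    return counts
```

Calling count_per_subfolder on the share root (or on a failing subfolder) returns a Counter, so counts.most_common() lists the largest subfolders first.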

What should be the first step in solving this problem?

Thanks.