manifoldcf-user mailing list archives

From Karl Wright <daddy...@gmail.com>
Subject Re: Error: Repeated service interruptions - failure processing document: Read timed out
Date Wed, 06 Nov 2013 20:28:41 GMT
Hi Ronny,

One minor thing: you should only need to set throttling to 2 for the
Windows repository connection, not for AD or Solr.


As for how to debug this issue, first off you should be looking in the
manifoldcf.log file (or the equivalent).  You should see WARN messages from
the shared file connector under most conditions when there's a service
interruption.  You would probably see "Read timed out" warnings if you
looked there, since that is what aborted the job run, along with a stack
trace.  However, that's not going to add much information to the analysis
at this point.
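
For anyone who wants to pull those warnings out of the log quickly, here is a
minimal sketch in Python. It assumes the log is plain text, that the level
token (WARN or ERROR) appears on the same line as the "Read timed out" text,
and that the stack trace follows on the next few lines. The function name
find_timeouts and the context size are hypothetical choices for this
illustration; the actual manifoldcf.log layout depends on your logging
configuration and may need a different pattern.

```python
import re
import sys
from typing import Iterable, List

# Assumption: the level appears as a standalone token on the entry's first line.
LEVEL_RE = re.compile(r"\b(WARN|ERROR)\b")


def find_timeouts(lines: Iterable[str],
                  needle: str = "Read timed out",
                  context: int = 5) -> List[List[str]]:
    """Return each WARN/ERROR line containing `needle`, together with up to
    `context` following lines (typically the start of the stack trace)."""
    buffered = list(lines)
    hits = []
    for i, line in enumerate(buffered):
        if LEVEL_RE.search(line) and needle in line:
            block = buffered[i:i + 1 + context]
            hits.append([entry.rstrip("\n") for entry in block])
    return hits


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python find_timeouts.py /path/to/manifoldcf.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        for hit in find_timeouts(fh):
            print("\n".join(hit))
            print("-" * 40)
```

If this prints nothing, either the warnings are in a different log file or the
level and message are formatted differently from what the sketch assumes.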

What might be valuable is to determine whether the problem is happening on
the Windows side or on the Solr side.  At this point I can't tell.  You
could, however, create a null output connection, and create a similar job
that sends its output there, and see if it completes.  Can you do this and
get back to me?

Thanks,
Karl





On Wed, Nov 6, 2013 at 3:17 PM, Ronny Heylen <securaqbereusr@gmail.com> wrote:

> Hi,
> We use ManifoldCF 1.3 and Solr 4.4 to index a shared network drive with
> several hundred thousand documents.
> Running a single ManifoldCF job to index the whole drive always gave
> some kind of error, so to better understand where the problem might
> be, we made one job to index all *.doc*, another one for *.xls*, another
> one for *.pdf ...
> Using the help from the list (thanks!) we set the size limit to 100MB and
> all jobs succeed (great) except the one for *.pptx
> The message is
> Error: Repeated service interruptions - failure processing document: Read
> timed out
> We don't find any error in the logs we have searched: solr.log, ...
> Based on some indications found on the Internet, we have set the Throttling
> max connections setting to 2 (instead of 10) in 3 places:
> output connection to SOLR
> authority connection to the Active Directory
> repository connection to the windows file share
> But the problem stays the same.
> We have tried on another machine with Solr 4.5 and ManifoldCF 1.4, same
> problem.
> We can run the job for all *.PDF, or all *.DOC*, or all *.XLS* without
> problem, but the same message always comes for *.PPTX.
> The last time the job stopped with the message, it displayed (not the same
> numbers for each run as the windows drive is changing) 56311 documents,
> with 17466 busy and 38847 processed.
> As we don't find anything in the log (but we are probably not looking in
> the correct place), we don't know what to do.
> Thanks for your help,
> Ronny and Frédéric
>
