Date: Thu, 7 Nov 2013 19:36:29 +0100
Subject: Re: Error: Repeated service interruptions - failure processing document: Read timed out
From: Ronny Heylen
To: "user@manifoldcf.apache.org"
Reply-To: user@manifoldcf.apache.org

Karl,
I don't know where you live, but if you come to Belgium, stop in Brussels for a good Belgian beer ;-)
In other words, raising the socket timeout from 900 to 2000 has solved the problem.
It has indexed about 160,000 documents in 2 hours.
On the other hand, the ManifoldCF/Solr machine (everything runs in the same Windows VM) has been allocated 8 CPUs at 3.6 GHz and 32 GB of memory, and is used only for the indexing test, with no searches on Solr.
So the fact that a timeout of 900 seconds was not enough looks strange: is it possible that some of these 160,000 documents take more than 15 minutes to be handled by Solr?
Ronny & Frédéric
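
A rough back-of-the-envelope check of the figures in the message above may help frame the question. The sketch below uses only the numbers quoted (160,000 documents, 2 hours, a 900-second timeout); the assumption that 10 Solr connections were busy throughout is ours, taken from the throttling value mentioned later in the thread, and the function name is purely illustrative.

```python
def indexing_rate(num_docs, elapsed_s, connections):
    """Return (documents per second, average seconds per request).

    Assumes `connections` requests are in flight at all times, which is
    an idealisation and not something the thread confirms.
    """
    docs_per_s = num_docs / elapsed_s
    avg_request_s = connections / docs_per_s
    return docs_per_s, avg_request_s


# Figures quoted in the message; connections=10 is an assumption.
docs_per_s, avg_request_s = indexing_rate(160_000, 2 * 3600, connections=10)

# How many times longer than an average request the old timeout was.
ratio = 900 / avg_request_s

print(f"{docs_per_s:.1f} docs/s, {avg_request_s:.2f} s per request on average")
print(f"a 900 s timeout is about {ratio:.0f} times the average request time")
```

Under these assumptions the average request takes well under a second, so a document that exceeds 900 seconds would be an extreme outlier rather than typical behaviour.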


On Thu, Nov 7, 2013 at 4:30 PM, Karl Wright <daddywri@gmail.com> wrote:
Hi Ronny,

The failure occurs because, for some documents, the time spent transferring data to Solr exceeds the socket timeout you have set for the Solr connection.

This is probably due to excessive load on the Solr instance. My suggestion is to increase the socket timeout on your Solr connection to at least 30 minutes to see if this resolves the problem.

Thanks,
Karl
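
The mechanism described in the reply above (a client giving up because the server has not answered within the socket timeout) can be reproduced in miniature. This is only a sketch using Python's standard `socket` module against a local stand-in server; it is not ManifoldCF or SolrJ code, and the helper names `slow_server` and `post_with_timeout` are hypothetical.

```python
import socket
import threading
import time


def slow_server(listener, delay_s):
    """Accept one connection, wait delay_s seconds, then send a reply."""
    conn, _ = listener.accept()
    with conn:
        conn.recv(1024)        # read the "document" the client posted
        time.sleep(delay_s)    # simulate slow server-side processing
        try:
            conn.sendall(b"HTTP/1.1 200 OK\r\n\r\n")
        except OSError:
            pass               # the client may already have given up


def post_with_timeout(delay_s, timeout_s):
    """Return 'ok' if the reply arrives in time, else 'read timed out'."""
    listener = socket.socket()
    listener.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]
    server = threading.Thread(
        target=slow_server, args=(listener, delay_s), daemon=True
    )
    server.start()
    client = socket.create_connection(("127.0.0.1", port))
    client.settimeout(timeout_s)      # the read timeout under test
    try:
        client.sendall(b"document body")
        client.recv(1024)             # blocks until reply or timeout
        return "ok"
    except socket.timeout:
        return "read timed out"
    finally:
        client.close()
        server.join()
        listener.close()


if __name__ == "__main__":
    print(post_with_timeout(delay_s=0.05, timeout_s=1.0))
    print(post_with_timeout(delay_s=0.5, timeout_s=0.1))
```

The server itself is healthy in both calls; only the ratio between processing time and the client's timeout changes, which is why raising the timeout can make the error disappear without any change on the server.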



On Thu, Nov 7, 2013 at 9:30 AM, Ronny Heylen <securaqbereusr@gmail.com> wrote:
Hi,
We have reset throttling to 10 for AD and Solr (2 for the Windows repository).
The job indexing all pptx to the null output has run successfully (162733 documents).
The job indexing all pptx to Solr still fails; manifoldcf.log contains:
 WARN 2013-11-07 14:34:06,502 (Worker thread '29') - JCIFS: Possibly transient exception detected on attempt 1 while getting share security: All pipe instances are busy.
jcifs.smb.SmbException: All pipe instances are busy.
    at jcifs.smb.SmbTransport.checkStatus(SmbTransport.java:563)
    at jcifs.smb.SmbTransport.send(SmbTransport.java:663)
    at jcifs.smb.SmbSession.send(SmbSession.java:238)
    at jcifs.smb.SmbTree.send(SmbTree.java:119)
    at jcifs.smb.SmbFile.send(SmbFile.java:775)
    at jcifs.smb.SmbFile.open0(SmbFile.java:989)
    at jcifs.smb.SmbFile.open(SmbFile.java:1006)
    at jcifs.smb.SmbFileOutputStream.<init>(SmbFileOutputStream.java:142)
    at jcifs.smb.TransactNamedPipeOutputStream.<init>(TransactNamedPipeOutputStream.java:32)
    at jcifs.smb.SmbNamedPipe.getNamedPipeOutputStream(SmbNamedPipe.java:187)
    at jcifs.dcerpc.DcerpcPipeHandle.doSendFragment(DcerpcPipeHandle.java:68)
    at jcifs.dcerpc.DcerpcHandle.sendrecv(DcerpcHandle.java:190)
    at jcifs.dcerpc.DcerpcHandle.bind(DcerpcHandle.java:126)
    at jcifs.dcerpc.DcerpcHandle.sendrecv(DcerpcHandle.java:140)
    at jcifs.smb.SmbFile.getShareSecurity(SmbFile.java:2943)
    at org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.getFileShareSecurity(SharedDriveConnector.java:2393)
    at org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.describeDocumentSecurity(SharedDriveConnector.java:1045)
    at org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.getDocumentVersions(SharedDriveConnector.java:554)
    at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:322)
 WARN 2013-11-07 14:55:45,257 (Worker thread '30') - IO exception during indexing: Read timed out
java.net.SocketTimeoutException: Read timed out
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.read(SocketInputStream.java:152)
    at java.net.SocketInputStream.read(SocketInputStream.java:122)
    at org.apache.http.impl.io.AbstractSessionInputBuffer.fillBuffer(AbstractSessionInputBuffer.java:166)
    at org.apache.http.impl.io.SocketInputBuffer.fillBuffer(SocketInputBuffer.java:90)
    at org.apache.http.impl.io.AbstractSessionInputBuffer.readLine(AbstractSessionInputBuffer.java:281)
    at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:92)
    at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:62)
    at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:254)
    at org.apache.http.impl.AbstractHttpClientConnection.receiveResponseHeader(AbstractHttpClientConnection.java:289)
    at org.apache.http.impl.conn.DefaultClientConnection.receiveResponseHeader(DefaultClientConnection.java:252)
    at org.apache.http.impl.conn.ManagedClientConnectionImpl.receiveResponseHeader(ManagedClientConnectionImpl.java:191)
    at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:300)
    at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:127)
    at org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:715)
    at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:520)
    at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906)
    at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805)
    at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:784)
    at org.apache.manifoldcf.agents.output.solr.ModifiedHttpSolrServer.request(ModifiedHttpSolrServer.java:291)
    at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:180)
    at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117)
    at org.apache.manifoldcf.agents.output.solr.HttpPoster$IngestThread.run(HttpPoster.java:919)
 WARN 2013-11-07 14:55:45,273 (Worker thread '30') - Service interruption reported for job 1383765534700 connection 'Filesharesrv1': IO exception during indexing: Read timed out
ERROR 2013-11-07 14:55:45,304 (Worker thread '30') - Exception tossed: Repeated service interruptions - failure processing document: Read timed out
org.apache.manifoldcf.core.interfaces.ManifoldCFException: Repeated service interruptions - failure processing document: Read timed out
    at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:586)
Caused by: java.net.SocketTimeoutException: Read timed out
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.read(SocketInputStream.java:152)
    at java.net.SocketInputStream.read(SocketInputStream.java:122)
    at org.apache.http.impl.io.AbstractSessionInputBuffer.fillBuffer(AbstractSessionInputBuffer.java:166)
    at org.apache.http.impl.io.SocketInputBuffer.fillBuffer(SocketInputBuffer.java:90)
    at org.apache.http.impl.io.AbstractSessionInputBuffer.readLine(AbstractSessionInputBuffer.java:281)
    at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:92)
    at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:62)
    at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:254)
    at org.apache.http.impl.AbstractHttpClientConnection.receiveResponseHeader(AbstractHttpClientConnection.java:289)
    at org.apache.http.impl.conn.DefaultClientConnection.receiveResponseHeader(DefaultClientConnection.java:252)
    at org.apache.http.impl.conn.ManagedClientConnectionImpl.receiveResponseHeader(ManagedClientConnectionImpl.java:191)
    at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:300)
    at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:127)
    at org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:715)
    at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:520)
    at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906)
    at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805)
    at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:784)
    at org.apache.manifoldcf.agents.output.solr.ModifiedHttpSolrServer.request(ModifiedHttpSolrServer.java:291)
    at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:180)
    at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117)
    at org.apache.manifoldcf.agents.output.solr.HttpPoster$IngestThread.run(HttpPoster.java:919)
 WARN 2013-11-07 15:06:04,235 (Worker thread '9') - IO exception during indexing: Read timed out
java.net.SocketTimeoutException: Read timed out
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.read(SocketInputStream.java:152)
    at java.net.SocketInputStream.read(SocketInputStream.java:122)
    at org.apache.http.impl.io.AbstractSessionInputBuffer.fillBuffer(AbstractSessionInputBuffer.java:166)
    at org.apache.http.impl.io.SocketInputBuffer.fillBuffer(SocketInputBuffer.java:90)
    at org.apache.http.impl.io.AbstractSessionInputBuffer.readLine(AbstractSessionInputBuffer.java:281)
    at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:92)
    at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:62)
    at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:254)
    at org.apache.http.impl.AbstractHttpClientConnection.receiveResponseHeader(AbstractHttpClientConnection.java:289)
    at org.apache.http.impl.conn.DefaultClientConnection.receiveResponseHeader(DefaultClientConnection.java:252)
    at org.apache.http.impl.conn.ManagedClientConnectionImpl.receiveResponseHeader(ManagedClientConnectionImpl.java:191)
    at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:300)
    at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:127)
    at org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:715)
    at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:520)
    at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906)
    at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805)
    at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:784)
    at org.apache.manifoldcf.agents.output.solr.ModifiedHttpSolrServer.request(ModifiedHttpSolrServer.java:291)
    at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:180)
    at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117)
    at org.apache.manifoldcf.agents.output.solr.HttpPoster$IngestThread.run(HttpPoster.java:919)
 WARN 2013-11-07 15:06:04,235 (Worker thread '9') - Service interruption reported for job 1383765534700 connection 'Filesharesrv1': IO exception during indexing: Read timed out
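
When a log like the excerpt above runs to thousands of lines, a quick tally of WARN and ERROR messages can show which failure dominates. The following is a minimal sketch that assumes every entry follows the "LEVEL timestamp (thread) - message" layout seen in this excerpt; the regular expression and the `summarize` helper are illustrative and not part of ManifoldCF.

```python
import re
from collections import Counter

# Matches lines such as:
#  WARN 2013-11-07 14:55:45,257 (Worker thread '30') - IO exception ...
LOG_LINE = re.compile(
    r"^\s*(?P<level>WARN|ERROR)\s+"
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+"
    r"\((?P<thread>[^)]*)\)\s+-\s+(?P<msg>.*)$"
)


def summarize(lines):
    """Count WARN/ERROR entries, keyed by level and the message prefix
    before the first colon. Stack-trace lines are ignored."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match:
            prefix = match.group("msg").split(":")[0]
            counts[(match.group("level"), prefix)] += 1
    return counts


# Entries copied from the excerpt above (stack frames shortened to two).
SAMPLE = [
    " WARN 2013-11-07 14:34:06,502 (Worker thread '29') - JCIFS: Possibly transient exception detected on attempt 1 while getting share security: All pipe instances are busy.",
    "jcifs.smb.SmbException: All pipe instances are busy.",
    " WARN 2013-11-07 14:55:45,257 (Worker thread '30') - IO exception during indexing: Read timed out",
    "    at java.net.SocketInputStream.socketRead0(Native Method)",
    "ERROR 2013-11-07 14:55:45,304 (Worker thread '30') - Exception tossed: Repeated service interruptions - failure processing document: Read timed out",
    " WARN 2013-11-07 15:06:04,235 (Worker thread '9') - IO exception during indexing: Read timed out",
]

if __name__ == "__main__":
    for (level, prefix), n in summarize(SAMPLE).most_common():
        print(f"{n:3d}  {level:5s}  {prefix}")
```

To run it against a real file, pass an open file object instead of `SAMPLE`, for example `summarize(open("manifoldcf.log"))`; the path is whatever your installation uses.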



On Wed, Nov 6, 2013 at 9:28 PM, Karl Wright <daddywri@gmail.com> wrote:
Hi Ronny,

One minor thing: you should only need to set throttling to 2 for the Windows repository connection, not for AD or Solr.


As for how to debug this issue, first off you should be looking in the manifoldcf.log file (or the equivalent). You should see WARN messages from the shared file connector under most conditions when there's a service interruption. You would probably see "Read timed out" warnings if you looked there, since that is what aborted the job run, along with a stack trace. However, that's not going to add much information to the analysis at this point.

What might be valuable is to determine whether the problem is happening on the Windows side or on the Solr side. At this point I can't tell. You could, however, create a null output connection, create a similar job that sends its output there, and see if it completes. Can you do this and get back to me?

Thanks,
Karl





On Wed, Nov 6, 2013 at 3:17 PM, Ronny Heylen <securaqbereusr@gmail.com> wrote:
Hi,
We use ManifoldCF 1.3 and Solr 4.4 to index a shared network drive with several hundred thousand documents.
A single ManifoldCF job indexing the whole drive always gave some kind of error, so to better understand where the problem lies, we made one job to index all *.doc*, another one for *.xls*, another one for *.pdf ...
Using the help from the list (thanks!) we set the size limit to 100MB and all jobs succeed (great) except the one for *.pptx.
The message is:
Error: Repeated service interruptions - failure processing document: Read timed out
We don't find any error in the logs we have searched: solr.log, ...
Based on some indications found on the Internet, we have set the Throttling max connections setting to 2 (instead of 10) in 3 places:
output connection to Solr
authority connection to the Active Directory
repository connection to the Windows file share
But the problem stays the same.
We have tried on another machine with Solr 4.5 and ManifoldCF 1.4, same problem.
We can run the job for all *.PDF, or all *.DOC*, or all *.XLS* without problem, but the same message always comes for *.PPTX.
The last time the job stopped with the message, it displayed (the numbers differ for each run as the Windows drive is changing) 56311 documents, with 17466 busy and 38847 processed.
As we don't find anything in the logs (but we are probably not looking in the right place), we don't know what to do.
Thanks for your help,
Ronny and Frédéric
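
For readers unfamiliar with what a "max connections" throttle does, the idea can be sketched as a cap on how many workers may talk to a service at the same time. The example below is a generic illustration built on Python's `threading.BoundedSemaphore`; it is an assumption about the general technique, not a description of how ManifoldCF implements the setting, and `run_with_cap` is a hypothetical name.

```python
import threading
import time


def run_with_cap(num_tasks, max_connections, work_s=0.01):
    """Run num_tasks workers with at most max_connections active at once.

    Returns the highest number of workers observed inside the guarded
    section at the same time.
    """
    gate = threading.BoundedSemaphore(max_connections)
    lock = threading.Lock()
    state = {"active": 0, "peak": 0}

    def worker():
        with gate:                      # wait for a free "connection"
            with lock:
                state["active"] += 1
                state["peak"] = max(state["peak"], state["active"])
            time.sleep(work_s)          # stand-in for the remote call
            with lock:
                state["active"] -= 1

    threads = [threading.Thread(target=worker) for _ in range(num_tasks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["peak"]


if __name__ == "__main__":
    print("peak with cap 2:", run_with_cap(20, max_connections=2))
    print("peak with cap 10:", run_with_cap(20, max_connections=10))
```

A lower cap reduces the load placed on the remote system but also lengthens the total run, since fewer requests are in flight at once.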



