manifoldcf-user mailing list archives

From Shinichiro Abe <shinichiro.ab...@gmail.com>
Subject Re: Repeated service interruptions
Date Wed, 05 Sep 2012 09:17:38 GMT
Hi Shigeki-san,

I do not know the root cause, but when I looked at the Solr log,
I found exceptions that were raised while indexing certain files.
After I excluded those files from indexing,
the crawl completed successfully.
If you check your Solr log, you may find something similar.
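
As a rough sketch of that kind of check, the Python snippet below counts Java exception class names that appear in a log. It is only an illustration: the function name count_exceptions is hypothetical, and the assumption that exceptions appear as dotted Java class names ending in "Exception" or "Error" comes from the stack traces quoted in this thread, not from any particular Solr logging configuration.

```python
import re
from collections import Counter

# Assumption: exceptions appear as dotted Java class names that end in
# "Exception" or "Error", e.g. java.net.SocketTimeoutException.
EXC_RE = re.compile(r"\b((?:[a-zA-Z_]\w*\.)+[A-Z]\w*(?:Exception|Error))\b")


def count_exceptions(lines):
    """Return a Counter mapping exception class name to number of occurrences."""
    counts = Counter()
    for line in lines:
        for name in EXC_RE.findall(line):
            counts[name] += 1
    return counts


# Example usage (the path is a placeholder, not a real Solr default):
# with open("path/to/solr.log", encoding="utf-8", errors="replace") as fh:
#     for name, n in count_exceptions(fh).most_common():
#         print(n, name)
```

The most frequent class names are a reasonable starting point for finding which documents to look at or exclude.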

Regards,
Shinichiro Abe

On 2012/09/05, at 14:37, Shigeki Kobayashi wrote:

> Hi Abe-san
> 
> I have just run into the same problem as you did, and I am having trouble figuring out how to solve it.
> 
> Did you figure out how to get rid of this problem? If so, it would be nice if you could share how you did it.
> 
> 
> Regards,
> 
> Shigeki
> 
> 2012/8/2 Shinichiro Abe <shinichiro.abe.1@gmail.com>
> Thanks very much for the help!
> I understand.
> Shinichiro Abe
> 
> On 2012/08/01, at 19:35, Karl Wright wrote:
> 
> > On Wed, Aug 1, 2012 at 5:48 AM, Shinichiro Abe
> > <shinichiro.abe.1@gmail.com> wrote:
> >> Hi Karl,
> >>
> >> I still have a problem.
> >> I reduced the maximum number of connections to 2.
> >> I rebooted the file server, not the domain controller.
> >> When I configured the paths[1], the log showed no errors
> >> and the ShareDrive connector crawled the files successfully.
> >> When I reset the path configuration to the default (matching *),
> >> the log showed the "all pipe instances are busy" error.
> >> Both path configurations pointed to the same location.
> >>
> >> Also, when this error occurred, the ingest log showed that
> >> HttpPoster was waiting for the response stream,
> >> could not get a response from Solr,
> >> and threw SocketTimeoutException.
> >> I increased jcifs.smb.client.responseTimeout,
> >> but the exception was still thrown.
> >> On the Solr side, Jetty threw SocketException (socket write error).
> >> I am checking the Solr logs now.
> >> Solr may be doing something wrong when running /update/extract.
> >>
> >
> > If Solr threw the exception this sounds likely.
> >
> >> Have you seen anything like this?
> >> Does the path matching configuration affect these errors?
> >>
> >> [1]Paths Tab:
> >> Include  directory(s)  matching  /01*
> >>
> >
> > This should have nothing to do with socket exceptions, except possibly
> > that the crawler winds up trying to read a file that isn't actually a
> > file but is something else, like a named pipe or something.  This
> > typically doesn't happen if the server is a Windows machine but if it
> > is a Samba server I could imagine something like that happening.
> >
> > Karl
> >
> >> P.S.
> >> Thank you for fixing CONNECTORS-494.
> >> I checked the trunk code, and it worked well.
> >>
> >> Thank you,
> >> Shinichiro Abe
> >>
> >> On 2012/07/24, at 22:13, Karl Wright wrote:
> >>
> >>> Hi Abe-san,
> >>>
> >>> Did you figure out what the problem was?
> >>>
> >>> Karl
> >>>
> >>> On Thu, Jul 19, 2012 at 5:52 AM, Karl Wright <daddywri@gmail.com> wrote:
> >>>> Hi Abe-san,
> >>>>
> >>>> Sometimes what looks like a server error can actually be due to the
> >>>> domain controller.  I wonder if the domain controller needs to be
> >>>> rebooted?
> >>>>
> >>>> Karl
> >>>>
> >>>> On Thu, Jul 19, 2012 at 5:12 AM, Shinichiro Abe
> >>>> <shinichiro.abe.1@gmail.com> wrote:
> >>>>> Hi Karl,
> >>>>> Thank you for the reply.
> >>>>> I tried reducing the maximum number of connections from 10
> >>>>> to 5, but that did not avoid the busy error. I'll try reducing it further.
> >>>>> Thank you.
> >>>>> Shinichiro Abe
> >>>>>
> >>>>> On 2012/07/19, at 15:55, Karl Wright wrote:
> >>>>>
> >>>>>> Hi Abe-san,
> >>>>>>
> >>>>>> The "all pipe instances are busy" error is coming from the Windows
> >>>>>> server you are trying to crawl.  I don't know what is happening there
> >>>>>> but here are some possibilities:
> >>>>>>
> >>>>>> (1) The Windows server is just overloaded; you can try reducing the
> >>>>>> maximum number of connections to 2 or 3 to see if that helps.
> >>>>>> (2) The Windows server needs rebooting.
> >>>>>>
> >>>>>> Thanks,
> >>>>>> Karl
> >>>>>>
> >>>>>> On Wed, Jul 18, 2012 at 10:09 PM, Shinichiro Abe
> >>>>>> <shinichiro.abe.1@gmail.com> wrote:
> >>>>>>> Hi,
> >>>>>>>
> >>>>>>> I use the Windows shares connector and ran a job.
> >>>>>>> The job was aborted instead of finishing normally, and the job's status said:
> >>>>>>> Error: Repeated service interruptions - failure processing document: Read timed out
> >>>>>>>
> >>>>>>> Why was the job aborted? I use ManifoldCF 0.5.1 and the latest version of jcifs.jar.
> >>>>>>> Is the crawled server busy? The server where MCF is installed does not seem to be busy,
> >>>>>>> but the other servers that MCF crawls seem to be busy.
> >>>>>>> How can I run the job without errors? What is wrong?
> >>>>>>>
> >>>>>>>
> >>>>>>> The connector logs:
> >>>>>>>
> >>>>>>> WARN 2012-07-12 16:28:52,648 (Worker thread '19') - JCIFS: Possibly transient exception detected on attempt 1 while getting share security: All pipe instances are busy.
> >>>>>>>      at jcifs.smb.SmbTransport.checkStatus(SmbTransport.java:563)
> >>>>>>>      at jcifs.smb.SmbTransport.send(SmbTransport.java:663)
> >>>>>>> ..
> >>>>>>> WARN 2012-07-12 16:36:37,585 (Worker thread '19') - JCIFS: Possibly transient exception detected on attempt 3 while getting share security: All pipe instances are busy.
> >>>>>>> ..
> >>>>>>> WARN 2012-07-12 16:36:37,585 (Worker thread '19') - JCIFS: 'Busy' response when getting document version for smb://XX.XX.XX.XX/D$/abcde/1234/123456789/e123456789a.pdf: retrying...
> >>>>>>> ..
> >>>>>>> WARN 2012-07-12 16:36:37,585 (Worker thread '19') - Pre-ingest service interruption reported for job 1342076182624 connection 'Windows shares': Timeout or other service interruption: All pipe instances are busy.
> >>>>>>> ..
> >>>>>>> WARN 2012-07-12 19:14:30,335 (Worker thread '19') - Service interruption reported for job 1342076182624 connection 'Windows shares': Ingestion API socket timeout exception waiting for response code: Read timed out; ingestion will be retried again later
> >>>>>>> ..
> >>>>>>> WARN 2012-07-12 20:43:50,210 (Worker thread '19') - Service interruption reported for job 1342076182624 connection 'Windows shares': Ingestion API socket timeout exception waiting for response code: Read timed out; ingestion will be retried again later
> >>>>>>> ..
> >>>>>>> ERROR 2012-07-12 20:43:50,210 (Worker thread '19') - Exception tossed: Repeated service interruptions - failure processing document: Read timed out
> >>>>>>> org.apache.manifoldcf.core.interfaces.ManifoldCFException: Repeated service interruptions - failure processing document: Read timed out
> >>>>>>>      at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:606)
> >>>>>>> Caused by: java.net.SocketTimeoutException: Read timed out
> >>>>>>>      at java.net.SocketInputStream.socketRead0(Native Method)
> >>>>>>>      at java.net.SocketInputStream.read(Unknown Source)
> >>>>>>>      at java.net.SocketInputStream.read(Unknown Source)
> >>>>>>>      at org.apache.manifoldcf.agents.output.solr.HttpPoster.readLine(HttpPoster.java:571)
> >>>>>>>      at org.apache.manifoldcf.agents.output.solr.HttpPoster.getResponse(HttpPoster.java:598)
> >>>>>>>
> >>>>>>> Thanks in advance,
> >>>>>>> Shinichiro Abe
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>
> >>
> 
> 
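
The connector log excerpt quoted in this thread mixes two different symptoms: "All pipe instances are busy" from the crawled Windows server and "Read timed out" on the Solr side. To see how often each one occurs, a log in that layout could be summarized roughly as follows. This is a minimal sketch, not part of ManifoldCF: the entry layout (LEVEL, timestamp, thread in parentheses, " - ", message) is inferred only from the lines quoted above, and the function names and the cause labels "smb-busy" and "read-timeout" are hypothetical.

```python
import re
from collections import Counter

# Assumed entry layout, inferred from the excerpt in this thread:
#   LEVEL yyyy-mm-dd hh:mm:ss,mmm (Worker thread 'N') - message
ENTRY_RE = re.compile(
    r"^\s*(?P<level>[A-Z]+)\s+"
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+"
    r"\((?P<thread>[^)]*)\)\s+-\s+(?P<msg>.*)$"
)


def parse_entries(lines):
    """Group raw log lines into entries.

    A line that does not start a new entry (for example a stack frame)
    is appended to the message of the previous entry.
    """
    entries = []
    for line in lines:
        match = ENTRY_RE.match(line)
        if match:
            entries.append(match.groupdict())
        elif entries and line.strip():
            entries[-1]["msg"] += " " + line.strip()
    return entries


def summarize(entries):
    """Count entries by (level, cause), using hypothetical cause labels."""
    counts = Counter()
    for entry in entries:
        msg = entry["msg"]
        if "All pipe instances are busy" in msg:
            cause = "smb-busy"
        elif "Read timed out" in msg:
            cause = "read-timeout"
        else:
            cause = "other"
        counts[(entry["level"], cause)] += 1
    return counts
```

Separating the two causes this way may help show whether the job is failing because of the file server or because of the indexing side.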

