manifoldcf-user mailing list archives

From Erlend Garåsen <e.f.gara...@usit.uio.no>
Subject Re: IO exception during indexing: null
Date Thu, 07 Mar 2013 13:12:42 GMT

Thanks, Karl!

I will first try to apply the exact authentication restrictions from our
prod server to our test server. If I get the same errors on the test
server after changing the security settings, we can rule out some
other possibilities.

Then it might be a good idea to turn off the retries. I have played 
around with HttpClient before and enabled this, so I think I know how to 
proceed. I will notify you.

Erlend

On 07.03.13 14.00, Karl Wright wrote:
> FWIW, to clarify, I think you are going to be best served by trying to
> first turn off the retries (however that can be done, since the
> current code is apparently insufficient), and then posting what the
> real underlying problem seems to be.  Alternatively, it is possible
> that there's already another exception dumped into the log that you
> didn't include which would be helpful.  If you need to figure out why
> the retries are still happening you may wind up needing to build the
> httpclient jar yourself, after adding appropriate diagnostics around
> the retry logic.  I'd be happy to work with you on this but probably
> not until this evening Boston time.
>
> Karl
>
> On Thu, Mar 7, 2013 at 7:43 AM, Karl Wright <daddywri@gmail.com> wrote:
>> Hi Erlend,
>>
>> What is happening is the following.
>>
>> (1) Your indexing is failing
>> (2) HttpClient by default retries 3 times on failure
>> (3) Between each retry, it resets the input stream, but this is not a
>> resettable input stream, so that can't work.
>>
>> Because of (3), the Solr Connector explicitly disables retries, using this code:
>>
>>      // No retries
>>      localClient.setHttpRequestRetryHandler(new HttpRequestRetryHandler()
>>        {
>>          public boolean retryRequest(
>>            IOException exception,
>>            int executionCount,
>>            HttpContext context)
>>          {
>>            return false;
>>          }
>>
>>        });
>>
>>
>> I don't know why that isn't working - it certainly used to.  Perhaps
>> you could research it.
>>
>> Fundamentally, though, you have a problem upstream of that - you need
>> to figure out why the indexing request is failing in the first place.
>> It's likely to be a socket timeout or connection timeout underneath it
>> all.
>>
>> Karl
>>
>> On Thu, Mar 7, 2013 at 7:34 AM, Erlend Garåsen <e.f.garasen@usit.uio.no> wrote:
>>>
>>> Hello list,
>>>
>>> I'm getting the following error when the web crawler is trying to post
>>> documents to Solr 4: IO exception during indexing: null. This happens for
>>> all indexing attempts and just ends in the following:
>>>
>>> --8<--
>>>   WARN 2013-03-01 19:59:51,360 (Worker thread '0') - Service interruption
>>> reported for job 1362070726596 connection 'Web crawler': IO exception during
>>> indexing: null
>>> ERROR 2013-03-01 19:59:51,378 (Worker thread '0') - Exception tossed:
>>> Repeated service interruptions - failure processing document: null
>>> org.apache.manifoldcf.core.interfaces.ManifoldCFException: Repeated service
>>> interruptions - failure processing document: null
>>>          at
>>> org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:604)
>>> Caused by: org.apache.http.client.ClientProtocolException
>>>          at
>>> org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:909)
>>>          at
>>> org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805)
>>>          at
>>> org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:784)
>>>          at
>>> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:353)
>>>          at
>>> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
>>>          at
>>> org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117)
>>>          at
>>> org.apache.manifoldcf.agents.output.solr.HttpPoster$IngestThread.run(HttpPoster.java:833)
>>> Caused by: org.apache.http.client.NonRepeatableRequestException: Cannot
>>> retry request with a non-repeatable request entity.
>>>          at
>>> org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:695)
>>>          at
>>> org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:522)
>>> --8<--
>>>
>>> I'm running version 1.1.1 of MCF deployed on Resin. This does not happen on
>>> our test server, which is configured identically to our prod server except for
>>> some security restrictions. Basic auth is configured for both reading and
>>> writing on the Solr server.
>>>
>>> I *did* get the same error the first time I deployed version 1.1.1 of MCF on
>>> our test server, but it went away after I added the Solr core name in the
>>> core/collection name field. On our production server I *do* have the core
>>> name configured, so now I need help in order to figure out what's going on.
>>>
>>> The NonRepeatableRequestException is perhaps caused by a misconfiguration of
>>> HttpClient 4, but I'm not sure this is the root of the problem I'm facing
>>> here. It might be due to the basic auth restriction that is configured.
>>> Anyway, this was not a problem for previous versions of MCF.
>>>
>>> Erlend
>>>
>>> --
>>> Erlend Garåsen
>>> Center for Information Technology Services
>>> University of Oslo
>>> P.O. Box 1086 Blindern, N-0317 OSLO, Norway
>>> Ph: (+47) 22840193, Fax: (+47) 22852970, Mobile: (+47) 91380968, VIP: 31050


-- 
Erlend Garåsen
Center for Information Technology Services
University of Oslo
P.O. Box 1086 Blindern, N-0317 OSLO, Norway
Ph: (+47) 22840193, Fax: (+47) 22852970, Mobile: (+47) 91380968, VIP: 31050
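
Karl's points (2) and (3) in the quoted message can be illustrated with a
small simulation. This is only a sketch under stated assumptions: it uses
the Java standard library alone and does not reproduce HttpClient's actual
retry code. The names RetryMaskingDemo, RetryPolicy, OneShotStream, send and
post are hypothetical and exist only for this example. It shows how a retry
attempt on a body that cannot be rewound reports a "cannot retry" error in
place of the underlying failure, while a policy that never retries lets the
underlying failure through.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class RetryMaskingDemo {

    /** Hypothetical retry policy for this sketch; not HttpClient's interface. */
    public interface RetryPolicy {
        boolean shouldRetry(IOException failure, int attempt);
    }

    /** A stream that can be read once only: InputStream's default
     *  markSupported() returns false and its default reset() throws. */
    public static final class OneShotStream extends InputStream {
        private final InputStream in;

        public OneShotStream(byte[] data) {
            this.in = new ByteArrayInputStream(data);
        }

        @Override
        public int read() throws IOException {
            return in.read();
        }
    }

    /** Simulated transport: consumes the whole body, then always fails. */
    static void send(InputStream body) throws IOException {
        while (body.read() != -1) {
            // drain the body, as a real request would
        }
        throw new IOException("simulated socket timeout");
    }

    /** Returns the message of the error the caller ends up seeing. */
    public static String post(InputStream body, RetryPolicy policy) {
        IOException lastFailure = null;
        for (int attempt = 1; ; attempt++) {
            if (lastFailure != null && !body.markSupported()) {
                // The body was consumed by the failed attempt and cannot be
                // rewound, so this error replaces the original failure.
                return "cannot retry non-repeatable body";
            }
            try {
                if (lastFailure != null) {
                    body.reset(); // rewind a repeatable body before resending
                }
                send(body);
                return "ok";
            } catch (IOException e) {
                if (!policy.shouldRetry(e, attempt)) {
                    return e.getMessage(); // the real cause surfaces
                }
                lastFailure = e;
            }
        }
    }

    public static void main(String[] args) {
        byte[] doc = {1, 2, 3};
        RetryPolicy upToThreeAttempts = (e, n) -> n < 3;
        RetryPolicy noRetries = (e, n) -> false;

        // Retries enabled, one-shot body: the real cause is hidden.
        System.out.println(post(new OneShotStream(doc), upToThreeAttempts));
        // Retries disabled, one-shot body: the real cause is reported.
        System.out.println(post(new OneShotStream(doc), noRetries));
        // Retries enabled, rewindable body: retries run, then the real cause.
        System.out.println(post(new ByteArrayInputStream(doc), upToThreeAttempts));
    }
}
```

In this sketch the first call reports "cannot retry non-repeatable body",
while the other two report "simulated socket timeout". That is the reason
for disabling retries before looking for the underlying problem: once
retries are off, whatever exception is left in the log should be the actual
failure.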
