manifoldcf-dev mailing list archives

From Karl Wright <daddy...@gmail.com>
Subject Re: [VOTE] Release Apache ManifoldCF 1.7.1, RC1
Date Thu, 18 Sep 2014 11:43:59 GMT
Hi Erlend,

The "Interrupted: null" message with a -104 code means only that the fetch
was interrupted by something.  Unfortunately, the message does not say what
caused the interruption.  This is unrelated to ZooKeeper, but I agree that
it is suspicious that many such interruptions appear right after robots.txt
is parsed.

One cause of a -104 is the target server forcibly dropping the connection,
which causes an InterruptedIOException to be thrown.  The timestamps of the
fetch messages are all within a few milliseconds of each other, so it is
plausible that you exceeded some preconfigured limit on that machine.  When
a robots file needs to be read, ManifoldCF creates an event for that, and
the URLs blocked by that event all become 'fetchable' as soon as the event
is released.  Perhaps your throttling needs to be adjusted now that the
rate limit bug has been fixed?
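
One way to check whether the -104 results really cluster in time is to
tally them per second from manifoldcf.log.  The following is only a minimal
Python sketch and is not part of ManifoldCF.  The field layout
(url|timing|result code|...) is an assumption inferred from the single log
entry quoted in this thread, and the names parse_fetch and
interrupted_per_second are hypothetical.

```python
import re
from collections import Counter
from datetime import datetime

# Assumed layout, inferred from the one entry quoted in this thread:
#   LEVEL timestamp (thread) - WEB: FETCH URL|url|timing|code|...
FETCH_RE = re.compile(
    r"^\w+\s+(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+"
    r"\((?P<thread>[^)]*)\)\s+-\s+WEB: FETCH URL\|"
    r"(?P<url>[^|]*)\|(?P<timing>[^|]*)\|(?P<code>-?\d+)\|"
)

# The quoted entry, joined onto one line (the exact spacing is a guess).
SAMPLE = (
    "INFO 2014-09-18 11:13:42,824 (Worker thread '19') - WEB: FETCH URL|"
    "https://www.duo.uio.no/|1411030940209+682605|-104|4096|"
    "org.apache.manifoldcf.core.interfaces.ManifoldCFException|"
    "Interrupted: Interrupted: null"
)


def parse_fetch(line):
    """Return (timestamp, url, result_code), or None if the line does not match."""
    m = FETCH_RE.match(line)
    if m is None:
        return None
    ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S,%f")
    return ts, m.group("url"), int(m.group("code"))


def interrupted_per_second(lines, code=-104):
    """Count fetches that ended with the given result code, bucketed by second."""
    counts = Counter()
    for line in lines:
        parsed = parse_fetch(line)
        if parsed is not None and parsed[2] == code:
            counts[parsed[0].replace(microsecond=0)] += 1
    return counts


if __name__ == "__main__":
    for second, n in sorted(interrupted_per_second([SAMPLE]).items()):
        print(second, n)
```

A large count in the second right after the robots.txt fetch would be
consistent with many blocked URLs being released at once.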

I won't be able to investigate this further without at least your crawling
parameters for the server in question.  I can ping that server, so if you
would like, I can try crawling it from here.

For ZooKeeper, I would still try either to increase your tick count to
maybe 10000 or, better yet, to find out why you periodically lose the
ability to transmit pings from MCF to your ZooKeeper process.
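
As a rough illustration of why a larger tick value gives more headroom:
assuming "tick count" refers to the ZooKeeper server setting tickTime (in
milliseconds), the server by default clamps the session timeout a client
requests to between 2 and 20 ticks.  The Python sketch below only does that
arithmetic.  The function names are hypothetical, and the timeout MCF
actually requests is not stated in this thread, so the requested values
shown are placeholders.

```python
def session_timeout_bounds(tick_time_ms, min_ms=None, max_ms=None):
    """Return the (min, max) session timeout in ms the server will accept.

    ZooKeeper defaults: minSessionTimeout = 2 * tickTime and
    maxSessionTimeout = 20 * tickTime, unless set explicitly.
    """
    lo = 2 * tick_time_ms if min_ms is None else min_ms
    hi = 20 * tick_time_ms if max_ms is None else max_ms
    return lo, hi


def negotiated_timeout(requested_ms, tick_time_ms):
    """Clamp a client's requested session timeout to the server's bounds."""
    lo, hi = session_timeout_bounds(tick_time_ms)
    return max(lo, min(requested_ms, hi))


if __name__ == "__main__":
    # Placeholder request of 60 s against two tickTime settings.
    print(negotiated_timeout(60000, 2000))   # capped at 20 * 2000 = 40000
    print(negotiated_timeout(60000, 10000))  # within 20000..200000, so 60000
```

With tickTime at 10000, the permitted range becomes 20 to 200 seconds, so a
short network stall is less likely to expire the session.  That only hides
the symptom, which is why finding the cause of the lost pings is the better
fix.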

Thanks,
Karl




On Thu, Sep 18, 2014 at 7:15 AM, Erlend Garåsen <e.f.garasen@usit.uio.no>
wrote:

> On 18.09.14 13:00, Karl Wright wrote:
>
>> Hi Erlend,
>>
>> Can you please add the ManifoldCF log as well?
>>
>
> Yes, I will, but it includes entries from RC0 as well.
>
> MCF works perfectly using the other jobs for the other hosts. Take a look
> at the following once again. MCF is being interrupted:
> INFO 2014-09-18 11:13:42,824 (Worker thread '19') - WEB: FETCH URL|https://www.duo.uio.no/|1411030940209+682605|-104|4096|org.apache.manifoldcf.core.interfaces.ManifoldCFException|Interrupted: Interrupted: null
>
> You can find this entry near the other one regarding the robots.txt file:
> http://folk.uio.no/erlendfg/manifoldcf/manifoldcf.log
>
> Erlend
>
>
