manifoldcf-dev mailing list archives

From "Arcadius Ahouansou (JIRA)" <j...@apache.org>
Subject [jira] [Created] (CONNECTORS-1113) Web connection being dropped while still in use?
Date Mon, 24 Nov 2014 02:38:12 GMT
Arcadius Ahouansou created CONNECTORS-1113:
----------------------------------------------

             Summary: Web connection being dropped while still in use?
                 Key: CONNECTORS-1113
                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1113
             Project: ManifoldCF
          Issue Type: Bug
          Components: Web connector
    Affects Versions: ManifoldCF 1.7.2
            Reporter: Arcadius Ahouansou


Hello.
I am using the ManifoldCF web crawler to crawl a web site and index it into Solr.

I have noticed that for most websites everything is OK.
However, for some, ManifoldCF is unable to crawl, i.e. nothing is pushed to Solr, and the log shows
entries like
*Cancelling request execution*

Please see below for more detail.
At this point, I am not sure what is causing this. It may have to do with the Gzip encoding or
the Keep-Alive header sent by the server.

{code}

DEBUG org.apache.http.client.protocol.RequestAddCookies.process(RequestAddCookies.java:122)
2014-11-24 02:15:51,710 (Thread-5783) - CookieSpec selected: compatibility
DEBUG org.apache.http.client.protocol.RequestAuthCache.process(RequestAuthCache.java:75) 2014-11-24
02:15:51,712 (Thread-5783) - Auth cache not set in the context
DEBUG org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:217) 2014-11-24
02:15:51,714 (Thread-5783) - Opening connection {}->http://mysite.co.uk:80
DEBUG org.apache.http.impl.conn.HttpClientConnectionOperator.connect(HttpClientConnectionOperator.java:120)
2014-11-24 02:15:51,746 (Thread-5783) - Connecting to mysite.co.uk/11.11.11.11:80
DEBUG org.apache.http.impl.conn.HttpClientConnectionOperator.connect(HttpClientConnectionOperator.java:127)
2014-11-24 02:15:51,762 (Thread-5783) - Connection established 192.168.1.5:42919<->11.11.11.11:80
DEBUG org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:238) 2014-11-24
02:15:51,763 (Thread-5783) - Executing request GET /hot/search/ HTTP/1.1
DEBUG org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:243) 2014-11-24
02:15:51,763 (Thread-5783) - Target auth state: UNCHALLENGED
DEBUG org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:249) 2014-11-24
02:15:51,764 (Thread-5783) - Proxy auth state: UNCHALLENGED
DEBUG org.apache.http.impl.conn.LoggingManagedHttpClientConnection.onRequestSubmitted(LoggingManagedHttpClientConnection.java:124)
2014-11-24 02:15:51,764 (Thread-5783) - http-outgoing-1 >> GET /hot/search/ HTTP/1.1
DEBUG org.apache.http.impl.conn.LoggingManagedHttpClientConnection.onRequestSubmitted(LoggingManagedHttpClientConnection.java:127)
2014-11-24 02:15:51,765 (Thread-5783) - http-outgoing-1 >> User-Agent: Mozilla/5.0 (ApacheManifoldCFWebCrawler;
webbot@crawler.net)
DEBUG org.apache.http.impl.conn.LoggingManagedHttpClientConnection.onRequestSubmitted(LoggingManagedHttpClientConnection.java:127)
2014-11-24 02:15:51,765 (Thread-5783) - http-outgoing-1 >> From: webbot@crawler.net
DEBUG org.apache.http.impl.conn.LoggingManagedHttpClientConnection.onRequestSubmitted(LoggingManagedHttpClientConnection.java:127)
2014-11-24 02:15:51,765 (Thread-5783) - http-outgoing-1 >> Accept: */*
DEBUG org.apache.http.impl.conn.LoggingManagedHttpClientConnection.onRequestSubmitted(LoggingManagedHttpClientConnection.java:127)
2014-11-24 02:15:51,766 (Thread-5783) - http-outgoing-1 >> Accept-Encoding: gzip,deflate
DEBUG org.apache.http.impl.conn.LoggingManagedHttpClientConnection.onRequestSubmitted(LoggingManagedHttpClientConnection.java:127)
2014-11-24 02:15:51,766 (Thread-5783) - http-outgoing-1 >> Host: mysite.co.uk:80
DEBUG org.apache.http.impl.conn.LoggingManagedHttpClientConnection.onRequestSubmitted(LoggingManagedHttpClientConnection.java:127)
2014-11-24 02:15:51,766 (Thread-5783) - http-outgoing-1 >> Connection: Keep-Alive
DEBUG org.apache.http.impl.conn.Wire.wire(Wire.java:72) 2014-11-24 02:15:51,766 (Thread-5783)
- http-outgoing-1 >> "GET /hot/search/ HTTP/1.1[\r][\n]"
DEBUG org.apache.http.impl.conn.Wire.wire(Wire.java:72) 2014-11-24 02:15:51,767 (Thread-5783)
- http-outgoing-1 >> "User-Agent: Mozilla/5.0 (ApacheManifoldCFWebCrawler; webbot@crawler.net)[\r][\n]"
DEBUG org.apache.http.impl.conn.Wire.wire(Wire.java:72) 2014-11-24 02:15:51,768 (Thread-5783)
- http-outgoing-1 >> "From: webbot@crawler.net[\r][\n]"
DEBUG org.apache.http.impl.conn.Wire.wire(Wire.java:72) 2014-11-24 02:15:51,769 (Thread-5783)
- http-outgoing-1 >> "Accept: */*[\r][\n]"
DEBUG org.apache.http.impl.conn.Wire.wire(Wire.java:72) 2014-11-24 02:15:51,769 (Thread-5783)
- http-outgoing-1 >> "Accept-Encoding: gzip,deflate[\r][\n]"
DEBUG org.apache.http.impl.conn.Wire.wire(Wire.java:72) 2014-11-24 02:15:51,769 (Thread-5783)
- http-outgoing-1 >> "Host: mysite.co.uk:80[\r][\n]"
DEBUG org.apache.http.impl.conn.Wire.wire(Wire.java:72) 2014-11-24 02:15:51,769 (Thread-5783)
- http-outgoing-1 >> "Connection: Keep-Alive[\r][\n]"
DEBUG org.apache.http.impl.conn.Wire.wire(Wire.java:72) 2014-11-24 02:15:51,769 (Thread-5783)
- http-outgoing-1 >> "[\r][\n]"
DEBUG org.apache.http.impl.conn.Wire.wire(Wire.java:72) 2014-11-24 02:15:51,841 (Thread-5783)
- http-outgoing-1 << "HTTP/1.1 200 OK[\r][\n]"
DEBUG org.apache.http.impl.conn.Wire.wire(Wire.java:72) 2014-11-24 02:15:51,842 (Thread-5783)
- http-outgoing-1 << "Date: Mon, 24 Nov 2014 02:17:06 GMT[\r][\n]"
DEBUG org.apache.http.impl.conn.Wire.wire(Wire.java:72) 2014-11-24 02:15:51,842 (Thread-5783)
- http-outgoing-1 << "Server: Apache[\r][\n]"
DEBUG org.apache.http.impl.conn.Wire.wire(Wire.java:72) 2014-11-24 02:15:51,843 (Thread-5783)
- http-outgoing-1 << "Set-Cookie: ci_session=a%3A5%3A%7Bs%3A10%3A%22session_id%22%3Bs%3A32%3A%2248df265e57a5bc5b7ded4175ef109fe0%22%3Bs%3A10%3A%22ip_address%22%3Bs%3A12%3A%2210.190.254.5%22%3Bs%3A10%3A%22user_agent%22%3Bs%3A59%3A%22Mozilla%2F5.0+%28ApacheManifoldCFWebCrawler%3B+webbot%40crawler.net%29%22%3Bs%3A13%3A%22last_activity%22%3Bi%3A1416795426%3Bs%3A9%3A%22user_data%22%3Bs%3A0%3A%22%22%3B%7D1dec34150fe1ab15f341d355f6ebd0dc;
expires=Wed, 23-Nov-2016 02:17:06 GMT; path=/[\r][\n]"
DEBUG org.apache.http.impl.conn.Wire.wire(Wire.java:72) 2014-11-24 02:15:51,843 (Thread-5783)
- http-outgoing-1 << "Set-Cookie: ci_session=a%3A6%3A%7Bs%3A10%3A%22session_id%22%3Bs%3A32%3A%2248df265e57a5bc5b7ded4175ef109fe0%22%3Bs%3A10%3A%22ip_address%22%3Bs%3A12%3A%2210.190.254.5%22%3Bs%3A10%3A%22user_agent%22%3Bs%3A59%3A%22Mozilla%2F5.0+%28ApacheManifoldCFWebCrawler%3B+webbot%40crawler.net%29%22%3Bs%3A13%3A%22last_activity%22%3Bi%3A1416795426%3Bs%3A9%3A%22user_data%22%3Bs%3A0%3A%22%22%3Bs%3A4%3A%22lang%22%3BN%3B%7Df6625848d5ca7bf8d5db71617607bada;
expires=Wed, 23-Nov-2016 02:17:06 GMT; path=/[\r][\n]"
DEBUG org.apache.http.impl.conn.Wire.wire(Wire.java:72) 2014-11-24 02:15:51,843 (Thread-5783)
- http-outgoing-1 << "Vary: Accept-Encoding[\r][\n]"
DEBUG org.apache.http.impl.conn.Wire.wire(Wire.java:72) 2014-11-24 02:15:51,844 (Thread-5783)
- http-outgoing-1 << "Content-Encoding: gzip[\r][\n]"
DEBUG org.apache.http.impl.conn.Wire.wire(Wire.java:72) 2014-11-24 02:15:51,844 (Thread-5783)
- http-outgoing-1 << "Content-Length: 20[\r][\n]"
DEBUG org.apache.http.impl.conn.Wire.wire(Wire.java:72) 2014-11-24 02:15:51,844 (Thread-5783)
- http-outgoing-1 << "Keep-Alive: timeout=5, max=99[\r][\n]"
DEBUG org.apache.http.impl.conn.Wire.wire(Wire.java:72) 2014-11-24 02:15:51,844 (Thread-5783)
- http-outgoing-1 << "Connection: Keep-Alive[\r][\n]"
DEBUG org.apache.http.impl.conn.Wire.wire(Wire.java:72) 2014-11-24 02:15:51,847 (Thread-5783)
- http-outgoing-1 << "Content-Type: text/html[\r][\n]"
DEBUG org.apache.http.impl.conn.Wire.wire(Wire.java:72) 2014-11-24 02:15:51,847 (Thread-5783)
- http-outgoing-1 << "[\r][\n]"
DEBUG org.apache.http.impl.conn.Wire.wire(Wire.java:86) 2014-11-24 02:15:51,848 (Thread-5783)
- http-outgoing-1 << "[0x1f][0x8b][0x8][0x0][0x0][0x0][0x0][0x0][0x0][0x3][0x3][0x0][0x0][0x0][0x0][0x0][0x0][0x0][0x0][0x0]"
DEBUG org.apache.http.impl.conn.LoggingManagedHttpClientConnection.onResponseReceived(LoggingManagedHttpClientConnection.java:113)
2014-11-24 02:15:51,849 (Thread-5783) - http-outgoing-1 << HTTP/1.1 200 OK
DEBUG org.apache.http.impl.conn.LoggingManagedHttpClientConnection.onResponseReceived(LoggingManagedHttpClientConnection.java:116)
2014-11-24 02:15:51,849 (Thread-5783) - http-outgoing-1 << Date: Mon, 24 Nov 2014 02:17:06
GMT
DEBUG org.apache.http.impl.conn.LoggingManagedHttpClientConnection.onResponseReceived(LoggingManagedHttpClientConnection.java:116)
2014-11-24 02:15:51,849 (Thread-5783) - http-outgoing-1 << Server: Apache
DEBUG org.apache.http.impl.conn.LoggingManagedHttpClientConnection.onResponseReceived(LoggingManagedHttpClientConnection.java:116)
2014-11-24 02:15:51,850 (Thread-5783) - http-outgoing-1 << Set-Cookie: ci_session=a%3A5%3A%7Bs%3A10%3A%22session_id%22%3Bs%3A32%3A%2248df265e57a5bc5b7ded4175ef109fe0%22%3Bs%3A10%3A%22ip_address%22%3Bs%3A12%3A%2210.190.254.5%22%3Bs%3A10%3A%22user_agent%22%3Bs%3A59%3A%22Mozilla%2F5.0+%28ApacheManifoldCFWebCrawler%3B+webbot%40crawler.net%29%22%3Bs%3A13%3A%22last_activity%22%3Bi%3A1416795426%3Bs%3A9%3A%22user_data%22%3Bs%3A0%3A%22%22%3B%7D1dec34150fe1ab15f341d355f6ebd0dc;
expires=Wed, 23-Nov-2016 02:17:06 GMT; path=/
DEBUG org.apache.http.impl.conn.LoggingManagedHttpClientConnection.onResponseReceived(LoggingManagedHttpClientConnection.java:116)
2014-11-24 02:15:51,850 (Thread-5783) - http-outgoing-1 << Set-Cookie: ci_session=a%3A6%3A%7Bs%3A10%3A%22session_id%22%3Bs%3A32%3A%2248df265e57a5bc5b7ded4175ef109fe0%22%3Bs%3A10%3A%22ip_address%22%3Bs%3A12%3A%2210.190.254.5%22%3Bs%3A10%3A%22user_agent%22%3Bs%3A59%3A%22Mozilla%2F5.0+%28ApacheManifoldCFWebCrawler%3B+webbot%40crawler.net%29%22%3Bs%3A13%3A%22last_activity%22%3Bi%3A1416795426%3Bs%3A9%3A%22user_data%22%3Bs%3A0%3A%22%22%3Bs%3A4%3A%22lang%22%3BN%3B%7Df6625848d5ca7bf8d5db71617607bada;
expires=Wed, 23-Nov-2016 02:17:06 GMT; path=/
DEBUG org.apache.http.impl.conn.LoggingManagedHttpClientConnection.onResponseReceived(LoggingManagedHttpClientConnection.java:116)
2014-11-24 02:15:51,850 (Thread-5783) - http-outgoing-1 << Vary: Accept-Encoding
DEBUG org.apache.http.impl.conn.LoggingManagedHttpClientConnection.onResponseReceived(LoggingManagedHttpClientConnection.java:116)
2014-11-24 02:15:51,851 (Thread-5783) - http-outgoing-1 << Content-Encoding: gzip
DEBUG org.apache.http.impl.conn.LoggingManagedHttpClientConnection.onResponseReceived(LoggingManagedHttpClientConnection.java:116)
2014-11-24 02:15:51,851 (Thread-5783) - http-outgoing-1 << Content-Length: 20
DEBUG org.apache.http.impl.conn.LoggingManagedHttpClientConnection.onResponseReceived(LoggingManagedHttpClientConnection.java:116)
2014-11-24 02:15:51,851 (Thread-5783) - http-outgoing-1 << Keep-Alive: timeout=5, max=99
DEBUG org.apache.http.impl.conn.LoggingManagedHttpClientConnection.onResponseReceived(LoggingManagedHttpClientConnection.java:116)
2014-11-24 02:15:51,852 (Thread-5783) - http-outgoing-1 << Connection: Keep-Alive
DEBUG org.apache.http.impl.conn.LoggingManagedHttpClientConnection.onResponseReceived(LoggingManagedHttpClientConnection.java:116)
2014-11-24 02:15:51,852 (Thread-5783) - http-outgoing-1 << Content-Type: text/html
DEBUG org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:267) 2014-11-24
02:15:51,853 (Thread-5783) - Connection can be kept alive for 5000 MILLISECONDS
DEBUG org.apache.http.client.protocol.ResponseProcessCookies.processCookies(ResponseProcessCookies.java:117)
2014-11-24 02:15:51,856 (Thread-5783) - Cookie accepted [ci_session="a%3A5%3A%7Bs%3A10%3A%22session_id%22%3Bs%3A32%3A%2248df265e57a5bc5b7ded4175ef109fe0%22%3Bs%3A10%3A%2...",
version:0, domain:mysite.co.uk, path:/, expiry:Wed Nov 23 02:17:06 GMT 2016]
DEBUG org.apache.http.client.protocol.ResponseProcessCookies.processCookies(ResponseProcessCookies.java:117)
2014-11-24 02:15:51,860 (Thread-5783) - Cookie accepted [ci_session="a%3A6%3A%7Bs%3A10%3A%22session_id%22%3Bs%3A32%3A%2248df265e57a5bc5b7ded4175ef109fe0%22%3Bs%3A10%3A%2...",
version:0, domain:mysite.co.uk, path:/, expiry:Wed Nov 23 02:17:06 GMT 2016]
DEBUG org.apache.http.impl.execchain.ConnectionHolder.cancel(ConnectionHolder.java:140) 2014-11-24
02:15:51,866 (Thread-5783) - Cancelling request execution
DEBUG org.apache.http.impl.conn.CPoolEntry.isExpired(CPoolEntry.java:81) 2014-11-24 02:15:57,017
(Idle cleanup thread) - Connection [id:1][route:{}->http://mysite.co.uk:80][state:null]
expired @ Mon Nov 24 02:15:56 GMT 2014
DEBUG org.apache.http.impl.conn.LoggingManagedHttpClientConnection.close(LoggingManagedHttpClientConnection.java:79)
2014-11-24 02:15:57,019 (Idle cleanup thread) - http-outgoing-1: Close connection


{code}
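
As a rough sanity check on the two guesses above (Gzip and Keep-Alive), here is a small Python sketch that uses only values copied from the log. It is not part of ManifoldCF or HttpClient. The helper name keep_alive_timeout_seconds is made up for illustration, and treating the "Cancelling request execution" timestamp as the start of the idle period is an assumption.

```python
import gzip
from datetime import datetime, timedelta

# Body bytes copied from the "http-outgoing-1 <<" wire line in the log above.
BODY = bytes([
    0x1f, 0x8b, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x03,  # gzip header
    0x03, 0x00,                                                  # empty deflate stream
    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,              # CRC32 and ISIZE, both zero
])


def keep_alive_timeout_seconds(header_value):
    """Return the timeout=N value of a Keep-Alive header, or None if absent.

    Hypothetical helper for this sketch only.
    """
    for part in header_value.split(","):
        name, _, value = part.strip().partition("=")
        if name.strip().lower() == "timeout":
            return int(value)
    return None


# Gzip check: the body matches Content-Length: 20 and decompresses to nothing.
decoded = gzip.decompress(BODY)
print(len(BODY), len(decoded))  # prints: 20 0

# Keep-Alive check: compare the close time with the advertised timeout.
timeout = keep_alive_timeout_seconds("timeout=5, max=99")
released = datetime(2014, 11, 24, 2, 15, 51, 866000)  # "Cancelling request execution" (assumed idle start)
closed = datetime(2014, 11, 24, 2, 15, 57, 19000)     # "Close connection"
expires = released + timedelta(seconds=timeout)
print(expires.time(), closed >= expires)
```

If this reading is right, the 20 body bytes are a valid gzip stream with an empty payload, and the connection was closed by the idle cleanup thread only after the 5 second Keep-Alive window had passed. That would fit the server returning an empty page for this URL, but it does not by itself explain why the request was cancelled.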



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
