manifoldcf-dev mailing list archives

From "Tim Steenbeke (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CONNECTORS-1562) Documents unreachable due to hopcount are not considered unreachable on cleanup pass
Date Mon, 31 Dec 2018 13:14:00 GMT

    [ https://issues.apache.org/jira/browse/CONNECTORS-1562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16731315#comment-16731315 ]

Tim Steenbeke commented on CONNECTORS-1562:
-------------------------------------------

Is this the error?
{code:java}
 WARN 2018-12-31T08:24:46,453 (Worker thread '32') - Service interruption reported for job 1546241012417 connection 'repo_website-en': IO exception: Stream Closed
 WARN 2018-12-31T08:28:52,471 (Worker thread '6') - Service interruption reported for job 1546241012417 connection 'repo_website-en': IO exception: Stream Closed
 WARN 2018-12-31T08:32:10,699 (Worker thread '13') - Service interruption reported for job 1546241012417 connection 'repo_website-en': IO exception: Stream Closed
ERROR 2018-12-31T08:32:10,750 (Worker thread '13') - Exception tossed: Repeated service interruptions - failure processing document: Stream Closed
org.apache.manifoldcf.core.interfaces.ManifoldCFException: Repeated service interruptions - failure processing document: Stream Closed
        at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:489) [mcf-pull-agent.jar:?]
Caused by: java.io.IOException: Stream Closed
        at java.io.FileInputStream.readBytes(Native Method) ~[?:1.8.0_191]
        at java.io.FileInputStream.read(FileInputStream.java:255) ~[?:1.8.0_191]
        at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284) ~[?:1.8.0_191]
        at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326) ~[?:1.8.0_191]
        at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178) ~[?:1.8.0_191]
        at java.io.InputStreamReader.read(InputStreamReader.java:184) ~[?:1.8.0_191]
        at org.apache.manifoldcf.agents.output.elasticsearch.ElasticSearchIndex$IndexRequestEntity.writeTo(ElasticSearchIndex.java:221) ~[?:?]
        at org.apache.http.impl.execchain.RequestEntityProxy.writeTo(RequestEntityProxy.java:121) ~[httpclient-4.5.6.jar:4.5.6]
        at org.apache.http.impl.DefaultBHttpClientConnection.sendRequestEntity(DefaultBHttpClientConnection.java:156) ~[httpcore-4.4.10.jar:4.4.10]
        at org.apache.http.impl.conn.CPoolProxy.sendRequestEntity(CPoolProxy.java:160) ~[httpclient-4.5.6.jar:4.5.6]
        at org.apache.http.protocol.HttpRequestExecutor.doSendRequest(HttpRequestExecutor.java:238) ~[httpcore-4.4.10.jar:4.4.10]
        at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:123) ~[httpcore-4.4.10.jar:4.4.10]
        at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:272) ~[httpclient-4.5.6.jar:4.5.6]
        at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:185) ~[httpclient-4.5.6.jar:4.5.6]
        at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110) ~[httpclient-4.5.6.jar:4.5.6]
        at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185) ~[httpclient-4.5.6.jar:4.5.6]
        at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83) ~[httpclient-4.5.6.jar:4.5.6]
        at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:108) ~[httpclient-4.5.6.jar:4.5.6]
        at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56) ~[httpclient-4.5.6.jar:4.5.6]
        at org.apache.manifoldcf.agents.output.elasticsearch.ElasticSearchConnection$CallThread.run(ElasticSearchConnection.java:133) ~[?:?]
 WARN 2018-12-31T08:33:35,958 (Job notification thread) - ES: Commit failed: {"error":"Incorrect HTTP method for uri [/website-en/_optimize] and method [GET], allowed: [POST]","status":405}
 WARN 2018-12-31T08:34:46,024 (Job notification thread) - ES: Commit failed: {"error":"Incorrect HTTP method for uri [/pintra/_optimize] and method [GET], allowed: [POST]","status":405}
{code}
The timestamps are off by one hour because this is running in a Docker container that currently
uses a different timezone.
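
For anyone reading the trace: the "Caused by" part shows a Reader (InputStreamReader) trying to pull bytes from a FileInputStream that had already been closed. The following is only a minimal sketch of that failure mode in plain Java. The class and method names (StreamClosedDemo, messageAfterClose) are made up for illustration, and it does not claim to show where or why ManifoldCF closed the stream.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

// Hypothetical demo class; not part of ManifoldCF.
public class StreamClosedDemo {

    /**
     * Closes a FileInputStream and then reads through a Reader that wraps it.
     * Returns the IOException message, or null if the read did not fail.
     */
    static String messageAfterClose() throws IOException {
        File tmp = File.createTempFile("stream-closed-demo", ".txt");
        try {
            Files.write(tmp.toPath(), "some document content".getBytes(StandardCharsets.UTF_8));

            FileInputStream in = new FileInputStream(tmp);
            Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8);

            // Close the underlying stream while the Reader is still considered open.
            in.close();

            try {
                // The Reader delegates to the closed FileInputStream, which fails.
                reader.read(new char[16], 0, 16);
                return null;
            } catch (IOException e) {
                return e.getMessage();
            }
        } finally {
            tmp.delete();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(messageAfterClose());
    }
}
```

If the real cause is similar, the document's input stream would have been closed before the output connector finished sending the request body, but that is an assumption the trace alone cannot confirm.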

> Documents unreachable due to hopcount are not considered unreachable on cleanup pass
> ------------------------------------------------------------------------------------
>
>                 Key: CONNECTORS-1562
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1562
>             Project: ManifoldCF
>          Issue Type: Bug
>          Components: Elastic Search connector, Web connector
>    Affects Versions: ManifoldCF 2.11
>         Environment: ManifoldCF 2.11
> Elasticsearch 6.3.2
> Web input connector
> Elasticsearch output connector
> Job crawls website input and outputs content to Elasticsearch
>            Reporter: Tim Steenbeke
>            Assignee: Karl Wright
>            Priority: Critical
>              Labels: starter
>             Fix For: ManifoldCF 2.12
>
>         Attachments: Screenshot from 2018-12-31 11-17-29.png, manifoldcf.log.cleanup, manifoldcf.log.init, manifoldcf.log.reduced
>
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> My documents aren't removed from the Elasticsearch index after rerunning the job with changed seeds.
> I update my job to change the seedmap and rerun it, or use the scheduler to keep it running even after updating it.
> After the rerun the unreachable documents don't get deleted.
> It only adds documents when they can be reached.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
