manifoldcf-dev mailing list archives

From "Karl Wright (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (CONNECTORS-1562) Documents unreachable due to hopcount are not considered unreachable on cleanup pass
Date Mon, 31 Dec 2018 12:07:00 GMT

    [ https://issues.apache.org/jira/browse/CONNECTORS-1562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16731290#comment-16731290 ]

Karl Wright edited comment on CONNECTORS-1562 at 12/31/18 12:06 PM:
--------------------------------------------------------------------

{code}
Error: Repeated service interruptions - failure processing document: Stream Closed
{code}

This is not a crash; it just means that the job aborts.  It also comes with numerous stack
traces in the manifoldcf log, one for each time the document is retried.  One of those stack
traces would be very helpful to have.  Thanks!



was (Author: kwright@metacarta.com):
{code}
Error: Repeated service interruptions - failure processing document: Stream Closed
{code}

This is not a crash; this just means that the job aborts.  It also comes with numerous stack
traces, one for each time the document retries.  That stack trace would be very helpful to
have.  Thanks!


> Documents unreachable due to hopcount are not considered unreachable on cleanup pass
> ------------------------------------------------------------------------------------
>
>                 Key: CONNECTORS-1562
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1562
>             Project: ManifoldCF
>          Issue Type: Bug
>          Components: Elastic Search connector, Web connector
>    Affects Versions: ManifoldCF 2.11
>         Environment: ManifoldCF 2.11
> Elasticsearch 6.3.2
> Web input connector
> Elasticsearch output connector
> Job crawls website input and outputs content to Elasticsearch
>            Reporter: Tim Steenbeke
>            Assignee: Karl Wright
>            Priority: Critical
>              Labels: starter
>             Fix For: ManifoldCF 2.12
>
>         Attachments: Screenshot from 2018-12-31 11-17-29.png, manifoldcf.log.cleanup, manifoldcf.log.init, manifoldcf.log.reduced
>
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> My documents aren't removed from the Elasticsearch index after rerunning the job with changed seeds.
> I update my job to change the seedmap and rerun it, or use the scheduler to keep it running even after updating it.
> After the rerun, the unreachable documents don't get deleted.
> It only adds documents when they can be reached.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
