manifoldcf-dev mailing list archives

From "Tim Steenbeke (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (CONNECTORS-1562) Document removal Elastic
Date Mon, 10 Dec 2018 08:56:00 GMT

    [ https://issues.apache.org/jira/browse/CONNECTORS-1562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16714444#comment-16714444 ]

Tim Steenbeke edited comment on CONNECTORS-1562 at 12/10/18 8:55 AM:
---------------------------------------------------------------------

Hi [~kwright@metacarta.com], so I set up a job as you explained above.
The scheduler works fine now, even with multiple values.
I tested the same with the ES output connector, and it also started up at the scheduled time.
So it seems there was an issue in the import of the job schedule, which has now been resolved.

Next, I edited the seeds, deleted some links, and let the scheduled job run again.
There were 0 deletions, and the Simple History also showed 0 deletion messages.
The Document Status report for the job also registered no deletions.
(The same happened with the null output connector, but that is probably expected because it is a null output.)


was (Author: steenti):
Hi [~kwright@metacarta.com], so I set up a job as you explained above.
The scheduler works fine now, even with multiple values.
I tested the same with the ES output connector, and it also started up at the scheduled time.
So it seems there was an issue in the import of the job schedule, which has now been resolved.

Next, I edited the seeds, deleted some links, and let the scheduled job run again.
There were 0 deletions, and the Simple History also showed 0 deletion messages.
(The same happened with the null output connector, but that is probably expected because it is a null output.)

> Document removal Elastic
> ------------------------
>
>                 Key: CONNECTORS-1562
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1562
>             Project: ManifoldCF
>          Issue Type: Bug
>          Components: Elastic Search connector, Web connector
>    Affects Versions: ManifoldCF 2.11
>         Environment: ManifoldCF 2.11
> Elasticsearch 6.3.2
> Web input connector
> Elastic output connector
> Job crawls website input and outputs content to elastic
>            Reporter: Tim Steenbeke
>            Assignee: Karl Wright
>            Priority: Critical
>              Labels: starter
>         Attachments: Screenshot from 2018-12-05 09-01-46.png
>
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> My documents aren't removed from the Elasticsearch index after rerunning the job with changed seeds.
> I update my job to change the seed map and rerun it, or use the scheduler to keep it running even after updating it.
> After the rerun, the unreachable documents don't get deleted.
> It only adds documents when they can be reached.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
