manifoldcf-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Tim Steenbeke (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CONNECTORS-1562) Document removal Elastic
Date Thu, 06 Dec 2018 11:14:00 GMT

    [ https://issues.apache.org/jira/browse/CONNECTORS-1562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16711294#comment-16711294
] 

Tim Steenbeke commented on CONNECTORS-1562:
-------------------------------------------

There are no deletions logged, that's what my issue is all about.

I test with a seed map of 30 links and than with a seed map of 20 links.
first of the scheduler doesn't start the job even though the time passes for it to start (and
it is configured to start on the scheduled time) .
second if i manually start the job it doesn't delete any of the documents.


I run these short cycles because I'm testing why it doesn't work.
The full cycle will be done every 3 hours but i can't be developing and waiting 3houres every
time.

> Document removal Elastic
> ------------------------
>
>                 Key: CONNECTORS-1562
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1562
>             Project: ManifoldCF
>          Issue Type: Bug
>          Components: Elastic Search connector, Web connector
>    Affects Versions: ManifoldCF 2.11
>         Environment: Manifoldcf 2.11
> Elasticsearch 6.3.2
> Web inputconnector
> elastic outputconnecotr
> Job crawls website input and outputs content to elastic
>            Reporter: Tim Steenbeke
>            Assignee: Karl Wright
>            Priority: Critical
>              Labels: starter
>         Attachments: Screenshot from 2018-12-05 09-01-46.png
>
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> My documents aren't removed from ElasticSearch index after rerunning the changed seeds
> I update my job to change the seedmap and rerun it or use the schedualer to keep it runneng
even after updating it.
> After the rerun the unreachable documents don't get deleted.
> It only adds doucments when they can be reached.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Mime
View raw message