manifoldcf-dev mailing list archives

From "Karl Wright (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CONNECTORS-1497) Re-index seeded modified documents when the re-crawl interval is infinity and connector model is MODEL_ADD_CHANGE
Date Mon, 26 Feb 2018 18:29:00 GMT

    [ https://issues.apache.org/jira/browse/CONNECTORS-1497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16377324#comment-16377324 ]

Karl Wright commented on CONNECTORS-1497:
-----------------------------------------

I think it's actually OK to just update the desired execute time in every situation where reseeding
happens.

The reason we haven't done that before is that the same mechanism is used for error retries.
This change would mean that any wait in progress for an error retry would be cancelled as well.
That's probably unusual enough that it would be OK, though.
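
To make the trade-off concrete, here is a minimal sketch of the idea, not ManifoldCF's actual JobManager code. The class ReseedSketch, its methods, and the NEVER sentinel are all hypothetical names invented for illustration. The only assumption taken from the discussion is that a single "desired execute time" per document governs both re-crawl scheduling and error-retry waits.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical, simplified stand-in for a job queue; NOT the real JobManager. */
class ReseedSketch {
    /** Assumed sentinel for an infinite re-crawl interval. */
    static final long NEVER = Long.MAX_VALUE;

    /** docId -> earliest time (ms) at which the document may be processed again. */
    private final Map<String, Long> desiredExecuteTime = new HashMap<>();

    /** A completed document waits for the re-crawl interval (forever if NEVER). */
    void markCompleted(String docId, long now, long recrawlInterval) {
        long next = (recrawlInterval == NEVER) ? NEVER : now + recrawlInterval;
        desiredExecuteTime.put(docId, next);
    }

    /** A failed document waits for a retry delay, stored in the same field. */
    void markFailed(String docId, long now, long retryDelay) {
        desiredExecuteTime.put(docId, now + retryDelay);
    }

    /** Proposed behaviour: reseeding always resets the desired execute time to "now". */
    void reseed(String docId, long now) {
        desiredExecuteTime.put(docId, now);
    }

    /** A document is due once its desired execute time has been reached. */
    boolean isDue(String docId, long now) {
        Long t = desiredExecuteTime.get(docId);
        return t != null && t <= now;
    }
}
```

In this sketch, a document completed with an infinite interval becomes due as soon as it is reseeded, which is the behaviour the issue asks for. The side effect is also visible: a document waiting out a retry delay becomes due immediately if it is reseeded in the meantime, because both waits share one field.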


> Re-index seeded modified documents when the re-crawl interval is infinity and connector model is MODEL_ADD_CHANGE
> -------------------------------------------------------------------------------------------------------------------
>
>                 Key: CONNECTORS-1497
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1497
>             Project: ManifoldCF
>          Issue Type: Improvement
>          Components: Framework agents process
>    Affects Versions: ManifoldCF 2.9.1
>            Reporter: Ahmed Mahfouz
>            Assignee: Karl Wright
>            Priority: Major
>         Attachments: CONNECTORS-1497.patch
>
>
> I am trying to avoid a full scan of all documents, for better efficiency with a large number
> of documents. I tried many different job settings but could not accomplish that.
> In particular, when the repository connector model is MODEL_ADD_CHANGE, I expected seeded
> modified documents to be re-indexed immediately, like new seeds, but I found that
> it uses the re-crawl time as the scheduled time, so they wait for the full scan to be re-indexed.
> I avoided the full scan by setting the re-crawl interval to infinity, but my modified document
> seeds were still not getting indexed. After digging into the code for quite some time, I made a
> modification to the JobManager, and it worked for me. I would like to share the change with
> you for review, so I opened this ticket.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
