manifoldcf-dev mailing list archives

From "Ahmed Mahfouz (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CONNECTORS-1497) Re-index seeded modified documents when the re-crawl interval is infinity and connector model is MODEL_ADD_CHANGE
Date Mon, 26 Feb 2018 17:46:00 GMT

    [ https://issues.apache.org/jira/browse/CONNECTORS-1497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16377251#comment-16377251
] 

Ahmed Mahfouz commented on CONNECTORS-1497:
-------------------------------------------

[~kwright@metacarta.com] Thanks for your prompt response. I am currently using the CMIS repository
connector (I made some changes to it, which I might push in a separate ticket for you to review)
and the ElasticSearch output connector. I did that first, but the problem is that the status of the
documents after being indexed is PENDINGPURGATORY, and for this status you don't check or
update the execution time; you just break in the switch statement. So I had to change the status
to be able to modify the execution time.
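
To make the behavior described above concrete, here is a minimal sketch. It is not the actual
JobManager code: the class ReseedSketch, the Status enum, the Record type, and both methods are
hypothetical names, the handling of statuses other than PENDINGPURGATORY is an assumption, and
the choice of PENDING as the status to switch to is only illustrative.

```java
// Hypothetical model of the reseeding switch discussed in this thread.
// None of these names are taken from the ManifoldCF source.
class ReseedSketch {
    // Only the statuses needed for the illustration.
    enum Status { PENDING, PENDINGPURGATORY }

    // A queue entry: its status and the time at which it is due for processing.
    static final class Record {
        Status status;
        long executeTime;

        Record(Status status, long executeTime) {
            this.status = status;
            this.executeTime = executeTime;
        }
    }

    // Behavior as described: PENDINGPURGATORY just breaks, so the execution
    // time is never pulled forward. Other statuses (assumed) are rescheduled.
    static void reseedAsDescribed(Record r, long desiredTime) {
        switch (r.status) {
            case PENDINGPURGATORY:
                break;
            default:
                if (desiredTime < r.executeTime) {
                    r.executeTime = desiredTime;
                }
                break;
        }
    }

    // Workaround in the spirit of the comment: change the status first
    // (PENDING is an assumed target) so the execution time can be modified.
    static void reseedWithStatusChange(Record r, long desiredTime) {
        if (r.status == Status.PENDINGPURGATORY) {
            r.status = Status.PENDING;
        }
        reseedAsDescribed(r, desiredTime);
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();

        // Long.MAX_VALUE stands in for an infinite re-crawl interval.
        Record a = new Record(Status.PENDINGPURGATORY, Long.MAX_VALUE);
        reseedAsDescribed(a, now);
        System.out.println("as described, due now: " + (a.executeTime == now));

        Record b = new Record(Status.PENDINGPURGATORY, Long.MAX_VALUE);
        reseedWithStatusChange(b, now);
        System.out.println("with status change, due now: " + (b.executeTime == now));
    }
}
```

In this sketch a modified document that is seeded again while in PENDINGPURGATORY keeps its
far-future execution time under the first method, and only becomes due once its status is changed.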

> Re-index seeded modified documents when the re-crawl interval is infinity and connector
model is MODEL_ADD_CHANGE
> -------------------------------------------------------------------------------------------------------------------
>
>                 Key: CONNECTORS-1497
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1497
>             Project: ManifoldCF
>          Issue Type: Improvement
>          Components: Framework agents process
>    Affects Versions: ManifoldCF 2.9.1
>            Reporter: Ahmed Mahfouz
>            Assignee: Karl Wright
>            Priority: Major
>         Attachments: CONNECTORS-1497.patch
>
>
> I am trying to avoid a full scan of all documents, for better efficiency with a large number
of documents. I tried many different settings for the jobs but could not accomplish that.
In particular, when the repository connector model is MODEL_ADD_CHANGE, I expected modified
documents that are seeded to be re-indexed immediately, like new seeds, but I found that
the re-crawl time is used as the scheduled time, so they wait for the full scan to get re-indexed.
I avoided the full scan by setting the re-crawl interval to infinity, but my modified document
seeds were still not getting indexed. After digging into the code for quite some time, I made some
modifications to the JobManager and it worked for me. I would like to share the change with
you for review, so I opened this ticket.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
