manifoldcf-dev mailing list archives

From "Karl Wright (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (CONNECTORS-1122) Explore ways to make job start be faster in systems with lots of documents
Date Mon, 15 Dec 2014 12:57:14 GMT

     [ https://issues.apache.org/jira/browse/CONNECTORS-1122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Karl Wright updated CONNECTORS-1122:
------------------------------------
    Attachment: CONNECTORS-1122.patch

This patch turns off reprioritization on job starts and job resumes.  You may want to play
with it and see if it is acceptable for MCF to work this way.


> Explore ways to make job start be faster in systems with lots of documents
> --------------------------------------------------------------------------
>
>                 Key: CONNECTORS-1122
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1122
>             Project: ManifoldCF
>          Issue Type: Improvement
>          Components: Framework crawler agent
>    Affects Versions: ManifoldCF 1.8, ManifoldCF 2.0
>            Reporter: Karl Wright
>            Assignee: Karl Wright
>             Fix For: ManifoldCF 1.9, ManifoldCF 2.1
>
>         Attachments: CONNECTORS-1122.patch
>
>
> Job start currently requires all documents to be marked as needing reprioritization.  We should
consider ways to reduce the need for this as much as possible.  For example,
if there are NO documents at all for a job, reprioritization is by definition unneeded.  Alternatively,
if we could determine whether there are any bin-level overlaps between the documents
made active by a job start and documents elsewhere, we could be more targeted.
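
To make the two ideas in the description concrete, here is a rough sketch of the decision they imply.  It is an illustration only and is not taken from the attached patch or from the ManifoldCF code base: the class name, the method name, and the representation of bins as plain strings are all assumptions.

```java
import java.util.Collections;
import java.util.Set;

/** Hypothetical helper for illustration; not part of the ManifoldCF API. */
public class ReprioritizationCheck {

    /**
     * Decide whether starting a job could affect document priorities elsewhere.
     *
     * @param startingJobBins bins of the documents the starting job would make active (assumed input)
     * @param otherActiveBins bins of documents that are already active in other jobs (assumed input)
     * @return true if reprioritization may be needed, false if it can be skipped
     */
    public static boolean needsReprioritization(Set<String> startingJobBins,
                                                Set<String> otherActiveBins) {
        // A job with no documents contributes no bins, so nothing can change.
        if (startingJobBins.isEmpty()) {
            return false;
        }
        // Only bins shared by both sets can make the documents compete.
        return !Collections.disjoint(startingJobBins, otherActiveBins);
    }
}
```

Under these assumptions, a job whose documents fall only in bins that no other active document uses would skip reprioritization, as would a job with no documents at all.  How the bin sets would be computed cheaply for a large queue is left open here.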



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
