manifoldcf-dev mailing list archives

From "Florian Schmedding (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CONNECTORS-850) Maximum interval in dynamic crawling
Date Fri, 14 Feb 2014 18:18:22 GMT

    [ https://issues.apache.org/jira/browse/CONNECTORS-850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13901726#comment-13901726 ]

Florian Schmedding commented on CONNECTORS-850:
-----------------------------------------------

Would it be possible to add header include and exclude lists to the configuration options of
a web repository? Some web servers update the Last-Modified date on every access even though
nothing has changed. Which header fields should be considered when checking for changes
depends on the content and the server.
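
To illustrate the request, here is a minimal sketch in Python of how include and exclude lists could be applied before comparing response headers. It is not ManifoldCF code; the function name `header_fingerprint` and its `include`/`exclude` parameters are hypothetical and only show the idea.

```python
import hashlib


def header_fingerprint(headers, include=None, exclude=None):
    """Hash only the headers that should count when checking for changes.

    Hypothetical helper, not part of ManifoldCF.
    headers: dict mapping header name to value.
    include: if given, only these header names are considered.
    exclude: header names that are always ignored.
    Header names are compared case-insensitively.
    """
    include_set = {h.lower() for h in include} if include is not None else None
    exclude_set = {h.lower() for h in (exclude or ())}

    selected = []
    for name, value in headers.items():
        key = name.lower()
        if include_set is not None and key not in include_set:
            continue
        if key in exclude_set:
            continue
        selected.append((key, value.strip()))

    # Sort so that header order does not affect the result.
    selected.sort()

    digest = hashlib.sha256()
    for key, value in selected:
        digest.update(f"{key}:{value}\n".encode("utf-8"))
    return digest.hexdigest()
```

With "Last-Modified" on the exclude list, two responses that differ only in that header produce the same fingerprint, so a server that rewrites the date on every access would no longer make the document look changed.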

> Maximum interval in dynamic crawling
> ------------------------------------
>
>                 Key: CONNECTORS-850
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-850
>             Project: ManifoldCF
>          Issue Type: New Feature
>          Components: Framework crawler agent
>    Affects Versions: ManifoldCF 1.4.1
>            Reporter: Florian Schmedding
>            Assignee: Karl Wright
>            Priority: Minor
>              Labels: features
>             Fix For: ManifoldCF 1.5
>
>
> Currently, the dynamic crawling method used for a continuous job extends the reseed and
> recrawl intervals when no changes are found in a checked document. However, it should be possible
> to restrict this extension to a maximum value in order to make sure that new documents are
> discovered within a certain interval.
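
The requested behaviour can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the issue text does not say how ManifoldCF extends the interval, so the multiplicative `growth` factor, the reset to `base_ms` on a change, and the names `next_interval` and `max_ms` are all hypothetical.

```python
def next_interval(current_ms, changed, base_ms, growth=2.0, max_ms=None):
    """Return the interval to wait before the next check of a document.

    Hypothetical sketch, not the ManifoldCF implementation.
    current_ms: interval used for the last check, in milliseconds.
    changed:    True if the last check found a change.
    base_ms:    interval to fall back to after a change (assumed behaviour).
    growth:     factor applied when nothing changed (assumed behaviour).
    max_ms:     optional upper bound on the extended interval.
    """
    if changed:
        return base_ms
    extended = int(current_ms * growth)
    if max_ms is not None:
        # The cap is what guarantees a check within a known period.
        extended = min(extended, max_ms)
    return extended
```

Without `max_ms` the interval for an unchanged document keeps growing; with it, the document is rechecked at least once per `max_ms`, which is the guarantee the issue asks for.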



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
