manifoldcf-dev mailing list archives

From "Florian Schmedding (JIRA)" <>
Subject [jira] [Commented] (CONNECTORS-850) Maximum interval in dynamic crawling
Date Thu, 13 Feb 2014 10:28:19 GMT


Florian Schmedding commented on CONNECTORS-850:

What counts as a document change: anything besides the content, e.g., HTTP header fields?
The content was changed only at the time indicated by the "***" note. The document is served
by an Apache HTTP server on localhost. I used a modified Web crawler connector that recognizes
links in a custom XML format (it parses the XML and extracts the links and a document ID,
nothing else).

> Maximum interval in dynamic crawling
> ------------------------------------
>                 Key: CONNECTORS-850
>                 URL:
>             Project: ManifoldCF
>          Issue Type: New Feature
>          Components: Framework crawler agent
>    Affects Versions: ManifoldCF 1.4.1
>            Reporter: Florian Schmedding
>            Assignee: Karl Wright
>            Priority: Minor
>              Labels: features
>             Fix For: ManifoldCF 1.5
> Currently, the dynamic crawling method used for a continuous job extends the reseed and
> recrawl intervals when no changes are found in a checked document. However, it should be possible
> to restrict this extension to a maximum value in order to make sure that new documents are
> discovered within a certain interval.
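To illustrate the requested behavior, here is a minimal Python sketch of a capped back-off for the recrawl interval. It assumes, purely for illustration, that each unchanged check doubles the interval and that a detected change resets it to the base value; the function names `next_recrawl_interval` and `simulate` and the `growth` factor are hypothetical and are not ManifoldCF's actual scheduling code or API.

```python
from typing import List, Optional


def next_recrawl_interval(current_ms: int, base_ms: int, changed: bool,
                          max_ms: Optional[int] = None,
                          growth: float = 2.0) -> int:
    """Return the interval to wait before checking a document again.

    Hypothetical sketch: the doubling rule and parameter names are
    assumptions, not ManifoldCF's real implementation.
    """
    if changed:
        # A change was seen: fall back to the configured base interval.
        return base_ms
    # Nothing changed: extend the interval by the growth factor...
    extended = int(current_ms * growth)
    # ...but never beyond the configured maximum, if one is set.
    if max_ms is not None:
        extended = min(extended, max_ms)
    return extended


def simulate(changes: List[bool], base_ms: int,
             max_ms: Optional[int] = None) -> List[int]:
    """Apply next_recrawl_interval to a sequence of check outcomes."""
    interval = base_ms
    history = []
    for changed in changes:
        interval = next_recrawl_interval(interval, base_ms, changed, max_ms)
        history.append(interval)
    return history


if __name__ == "__main__":
    outcomes = [False, False, False, False, True]
    # Uncapped: the interval keeps doubling until a change resets it.
    print(simulate(outcomes, 60000))                 # [120000, 240000, 480000, 960000, 60000]
    # Capped at 5 minutes: the interval stops growing at 300000 ms.
    print(simulate(outcomes, 60000, max_ms=300000))  # [120000, 240000, 300000, 300000, 60000]
```

With the cap in place, an unchanged document is still rechecked at least every `max_ms` milliseconds, which is what bounds the delay before newly linked documents are discovered.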

This message was sent by Atlassian JIRA
