manifoldcf-user mailing list archives

From "Florian Schmedding" <schme...@informatik.uni-freiburg.de>
Subject Continuous crawling
Date Sat, 04 Jan 2014 12:56:41 GMT
Hello,

The reseed interval and recrawl interval parameters of a continuous
crawling job are not quite clear to me. The documentation says that the
reseed interval is the time after which the seeds are checked again, and
the recrawl interval is the time after which a document is checked for
changes.

However, we observed that the recrawl interval for a document increases
after each check. The reseed interval, on the other hand, seems to be
set correctly in the database metadata for the seed documents. Yet the
web server does not receive a request each time the interval elapses,
but only after several intervals have elapsed.

We are using a web connector. The web server does not tell the client to
cache the documents. Any help would be appreciated.

Best regards,
Florian



