manifoldcf-user mailing list archives

From Shigeki Kobayashi <shigeki.kobayas...@g.softbank.co.jp>
Subject how web crawler crawls contents after the first crawling
Date Fri, 06 Feb 2015 07:26:03 GMT
Hi Karl


I have a basic question about how the web crawler crawls content after
the first crawl.

Does it crawl and index all pages from the root every time, or does it
crawl only the pages that have been modified?

If it crawls only modified pages, how does it figure out that the pages
have been modified?
By checking the size of the pages? By a hash?

What about document files, such as PDFs, linked from web pages?
If those documents are modified, how does MCF figure out that they have been modified?

I am using an old version, MCF 1.4.1.

Best regards,


Shigeki
