manifoldcf-user mailing list archives

From Erlend Garåsen
Subject Re: Crawling just one particular page from a host
Date Tue, 14 May 2013 12:06:19 GMT
On 14.05.13 13.49, Karl Wright wrote:
> Hi Erlend,
> "Hosts matching seeds" means that if the domain (in this case
> <>) is mentioned in a seed, a
> page with the same domain will be included in the crawl if there is
> nothing else that excludes it.  So it sounds like it is working as designed.

Yes, you are right. I'm just trying to find a simple way to crawl just 
the starting page of a host and nothing else, i.e.:

I tried to place this in the "Include in crawl" box:

Still, it includes everything else from that host unless I write a
lot of exclude regexp rules.
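
To illustrate the difference I am after, here is a minimal sketch in plain
Python, not ManifoldCF itself. It assumes the include rule is applied as an
unanchored regular-expression search against the full URL (an assumption,
not something confirmed in this thread), and it uses a hypothetical host,
www.example.org, in place of the real one.

```python
import re

# Hypothetical host; the real URL is not part of this sketch.
# An unanchored rule matches every URL that merely contains the host prefix.
UNANCHORED = re.compile(r"http://www\.example\.org/")
# An anchored rule matches only the start page itself.
ANCHORED = re.compile(r"^http://www\.example\.org/?$")


def included(pattern, url):
    """Return True if the pattern is found anywhere in the URL (search semantics)."""
    return pattern.search(url) is not None


urls = [
    "http://www.example.org/",
    "http://www.example.org/about/index.html",
    "http://www.example.org/news?id=1",
]

unanchored_hits = [u for u in urls if included(UNANCHORED, u)]
anchored_hits = [u for u in urls if included(ANCHORED, u)]

print("unanchored:", unanchored_hits)
print("anchored:  ", anchored_hits)
```

Under that assumption, only the anchored form with ^ and $ would limit the
match to the starting page, without a separate list of exclude rules.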


Erlend Garåsen
Center for Information Technology Services
University of Oslo
P.O. Box 1086 Blindern, N-0317 OSLO, Norway
Ph: (+47) 22840193, Fax: (+47) 22852970, Mobile: (+47) 91380968, VIP: 31050
