manifoldcf-user mailing list archives

From Erlend Garåsen <e.f.gara...@usit.uio.no>
Subject Hop count problem
Date Mon, 12 Aug 2013 10:39:09 GMT

I have discovered an odd thing regarding hop counts. Our production
environment crawls far fewer documents than our test environment, even
though the configuration is exactly the same. I then found that several
documents that should be fetched are, according to MCF, outside the hop
count limit, although they are not.

This can be reproduced by using a small job for one particular host, 
www.ibsen.uio.no. The seed list is as follows:

http://www.ibsen.uio.no/

Hop filter settings are:
link: 6
redirect: 3

Only these two documents are fetched:
http://www.ibsen.uio.no/forside.xhtml
http://www.ibsen.uio.no/

Here's what MCF says about one omitted document, i.e., 
http://www.ibsen.uio.no/skuespill.xhtml:
State: out of scope
Status: Hopcount exceeded

This is odd. If you open up www.ibsen.uio.no, you can see that the link 
"http://www.ibsen.uio.no/skuespill.xhtml" (Skuespill) appears on the 
main page.
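
To make the expectation concrete, here is a minimal sketch of how I would
expect link hop counts to be computed: a breadth-first search from the seed,
counting one hop per link followed. This is not ManifoldCF's actual
implementation. The function name hop_counts and the one-entry link graph are
assumptions for illustration only; the graph contains just the one link
mentioned above.

```python
from collections import deque


def hop_counts(graph, seeds):
    """Return the minimum number of link hops from any seed to each URL."""
    hops = {seed: 0 for seed in seeds}  # seeds are at hop count 0
    queue = deque(seeds)
    while queue:
        url = queue.popleft()
        for target in graph.get(url, []):
            if target not in hops:  # first visit is the shortest path in BFS
                hops[target] = hops[url] + 1
                queue.append(target)
    return hops


SEED = "http://www.ibsen.uio.no/"
SKUESPILL = "http://www.ibsen.uio.no/skuespill.xhtml"

# Hypothetical link graph: only the link from the main page to Skuespill.
LINKS = {SEED: [SKUESPILL]}

LINK_LIMIT = 6  # the "link" hop filter setting of the job

hops = hop_counts(LINKS, [SEED])
in_scope = {url for url, count in hops.items() if count <= LINK_LIMIT}
```

Under these assumptions skuespill.xhtml is a single link hop from the seed,
well inside the limit of 6, so I would expect it to be in scope.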

Our test environment fetches this document without problems.

Erlend
