manifoldcf-user mailing list archives

From Erlend Garåsen <e.f.gara...@usit.uio.no>
Subject Re: Hop count problem
Date Tue, 13 Aug 2013 13:33:24 GMT

Thanks for looking at this issue, Karl.

Yes, the db tables may be corrupted as a result of a lot of debugging I 
did in May.

Anyway, I have used tcpdump in order to investigate the traffic further.

1. MCF tries to fetch http://www.ibsen.uio.no/
2. Server response is 302 (redirected to
http://www.ibsen.uio.no/forside.xhtml)
3. MCF tries to fetch http://www.ibsen.uio.no/forside.xhtml
4. Server response is 200

And that's all, so we know that no server error is involved.
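
For anyone who wants to reproduce the four steps above without tcpdump, here is a minimal sketch in Python that follows the redirect chain by hand and records each status code. It is not how MCF fetches documents; the names fetch_once and trace_redirects are made up for this illustration, and the only URLs assumed are the two from the capture.

```python
import http.client
from urllib.parse import urljoin, urlsplit


def fetch_once(url, timeout=10):
    """Issue one GET without following redirects.

    Returns (status, value of the Location header or None).
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("GET", path)
        resp = conn.getresponse()
        resp.read()  # drain the body so the connection closes cleanly
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def trace_redirects(url, fetch=fetch_once, max_hops=5):
    """Follow redirects manually and return the (url, status) pairs seen.

    `fetch` is injectable so the logic can be checked without network access.
    """
    chain = []
    for _ in range(max_hops + 1):
        status, location = fetch(url)
        chain.append((url, status))
        if status in (301, 302, 303, 307, 308) and location:
            # Location may be relative, so resolve it against the current URL.
            url = urljoin(url, location)
        else:
            break
    return chain


if __name__ == "__main__":
    for seen_url, seen_status in trace_redirects("http://www.ibsen.uio.no/"):
        print(seen_status, seen_url)
```

If the server still behaves as in the capture, the chain should contain a 302 for the root URL followed by a 200 for forside.xhtml; anything else would point to a change on the server side rather than in the crawler.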

Log from tcpdump:
http://folk.uio.no/erlendfg/manifoldcf/chatter.dmp

Erlend

On 8/13/13 3:16 PM, Karl Wright wrote:
> Hmm. This is not at all what I would have expected.
>
> If "skuespill" is directly referenced by a seed document, or (worse) is
> in the seed list, I cannot see *how* the document can possibly have this
> state.
>
> - the referencing document definitely has a parseable reference to the
> document in question, and in any case having it be a "seed" should make
> the hopcount be zero;
> - if the reference is being filtered, it would be filtered from
> everywhere, and the document should thus get removed from the queue at
> the end of the job, because it is unreachable.
> - even if the hopcount tables have gotten corrupted, the fact that the
> document is a first-level reference or a seed should overwrite the
> record for that document.
>
> So I am at a complete loss to explain this behavior.
>
> Let me look through the code and see if I can find any code path that
> could lead to this behavior.
> Karl
>
>
> On Tue, Aug 13, 2013 at 9:01 AM, Erlend Garåsen <e.f.garasen@usit.uio.no> wrote:
>
>     On 8/13/13 2:47 PM, Karl Wright wrote:
>
>         Looks like you need to re-enable connector debugging before we
>         can see anything.
>
>
>     Unfortunately, yes. A boring task which must be done.
>
>
>         Also, does the missing document (skuespill) appear in the Document
>         Status report after the crawl?  Can you include that here if it
>         does?
>         (I am betting it does not...)
>
>
>     I added 60 mins as a time offset value, but I'm not 100% sure
>     whether the given result from Document status was created by this
>     job run or is an old entry in the database:
>
>     Identifier: http://www.ibsen.uio.no/skuespill.xhtml
>
>     Job: Ibsen
>     State: Out of scope
>     Status: Hopcount exceeded
>
>     Scheduled: 01-01-1970 01:00:00.000
>     Scheduled action: Process
>     Retry count / limit: N/A
>
>     Erlend
>
>

