manifoldcf-user mailing list archives

From Erlend Garåsen <e.f.gara...@usit.uio.no>
Subject Re: Hop count problem
Date Tue, 13 Aug 2013 09:27:11 GMT

OK, I have now changed the log level from INFO to DEBUG for connectors 
as well. Here's the log:
http://folk.uio.no/erlendfg/manifoldcf/manifoldcf.log

The following entry indicates that one of the missing URLs is 
found/extracted from a link:
DEBUG 2013-08-13 10:58:48,630 (Worker thread '9') - WEB: In html 
document 'http://www.ibsen.uio.no/forside.xhtml', found link to 
'http://www.ibsen.uio.no/skuespill.xhtml'

Then the job just ends, and none of the extracted links are ever fetched.
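
To check this for every extracted link rather than one entry at a time, a small Python sketch along these lines might help. It assumes the log line format of the single DEBUG entry quoted above, and that the set of fetched URLs is supplied separately (for example, copied from the Simple History report). The names `LINK_RE`, `extracted_links`, and `never_fetched` are made up for illustration and are not part of ManifoldCF.

```python
import re

# Pattern modeled on the one DEBUG entry quoted above (assumption: other
# connector versions may word this line differently). \s+ tolerates the
# line wrapping introduced by mail clients.
LINK_RE = re.compile(
    r"WEB: In html\s+document '([^']+)',\s+found link to\s+'([^']+)'"
)


def extracted_links(log_text):
    """Return (source, target) pairs for every 'found link to' entry."""
    return LINK_RE.findall(log_text)


def never_fetched(log_text, fetched_urls):
    """Return extracted link targets missing from fetched_urls, in first-seen order."""
    seen = set(fetched_urls)
    missing = []
    for _source, target in extracted_links(log_text):
        if target not in seen:
            seen.add(target)  # report each missing target only once
            missing.append(target)
    return missing


if __name__ == "__main__":
    sample = (
        "DEBUG 2013-08-13 10:58:48,630 (Worker thread '9') - WEB: In html "
        "document 'http://www.ibsen.uio.no/forside.xhtml', found link to "
        "'http://www.ibsen.uio.no/skuespill.xhtml'\n"
    )
    fetched = {
        "http://www.ibsen.uio.no/",
        "http://www.ibsen.uio.no/forside.xhtml",
    }
    for url in never_fetched(sample, fetched):
        print(url)
```

Run against the full log, this would list every link the connector extracted but that never shows up among the fetched documents.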

Erlend

On 8/12/13 5:15 PM, Erlend Garåsen wrote:
>
> Thanks, I will do that tomorrow and report back. I hope we will find a
> simple explanation. :)
>
> E
>
> On 8/12/13 5:07 PM, Karl Wright wrote:
>> Hi Erlend,
>>
>> You have wire logging (httpclient) enabled, which is useful for
>> debugging fetch issues, but you do not have connector debugging on.  To
>> turn it on, add this to properties.xml:
>>
>> <property name="org.apache.manifoldcf.connectors" value="DEBUG"/>
>>
>> thanks,
>> Karl
>>
>>
>> On Mon, Aug 12, 2013 at 10:53 AM, Erlend Garåsen
>> <e.f.garasen@usit.uio.no> wrote:
>>
>>     On 8/12/13 4:29 PM, Karl Wright wrote:
>>
>>         Hi Erlend,
>>
>>         The Document Status report shows these documents because they are
>>         still in the queue.  The reasons for this could be several.
>>         Documents that exceed the hopcount by 1 level are allowed to remain
>>         in the queue for bookkeeping purposes.  The "scheduled date" as
>>         given is only meaningful if the document is in an active state; my
>>         guess is that these documents are not in fact in that state, but
>>         rather in the state HOPCOUNT_EXCEEDED.  Can you include one complete
>>         row from the Document Status report for one of the missing
>>         documents?
>>
>>
>>     For "http://www.ibsen.uio.no/sakprosa.xhtml":
>>     Job: Ibsen
>>
>>     State: Out of scope
>>     Status: Hopcount exceeded
>>     Scheduled: 01-01-1970 01:00:00.000
>>     Scheduled action: Process
>>     Retry count: N/A
>>     Retry limit: N/A
>>
>>
>>         When you added documents to the seed list, what did the Simple
>>         History say when they were fetched?  If they don't appear in the
>>         simple history, they SHOULD have nevertheless appeared in the log,
>>         with an explanation of why they were excluded, provided you have
>>         connector debugging enabled.
>>
>>
>>     OK, here is the seed list:
>>     http://www.ibsen.uio.no/
>>
>>     http://www.ibsen.uio.no/skuespill.xhtml
>>     http://www.ibsen.uio.no/dikt.xhtml
>>     http://www.ibsen.uio.no/brev.xhtml
>>     http://www.ibsen.uio.no/sakprosa.xhtml
>>     http://www.ibsen.uio.no/varia.xhtml
>>     http://www.ibsen.uio.no/undervisningsressurser.xhtml
>>
>>     Here are the results from simple history:
>>     08-12-2013 16:46:26.536  job end                 1368534065016(Ibsen)                         0      1
>>     08-12-2013 16:46:09.927  document ingest (Solr)  http://www.ibsen.uio.no/forside.xhtml  OK    11897  178
>>     08-12-2013 16:46:09.751  fetch                   http://www.ibsen.uio.no/forside.xhtml  200   11897  17
>>     08-12-2013 16:44:48.829  fetch                   http://www.ibsen.uio.no/               302   0      79484
>>     08-12-2013 16:44:48.727  robots parse            www.ibsen.uio.no:80                    HTML  0      2      Robots file contained HTML, skipped
>>     08-12-2013 16:44:46.574  job start               1368534065016(Ibsen)                         0      1
>>              1
>>
>>     HttpClient log:
>>     http://folk.uio.no/erlendfg/manifoldcf/manifoldcf.log
>>
>>     Erlend
>>
>>
>

