manifoldcf-dev mailing list archives

From "Karl Wright (JIRA)" <>
Subject [jira] [Commented] (CONNECTORS-880) Under the right conditions, job aborts do not update "last checked" time
Date Tue, 19 Mar 2019 06:00:00 GMT


Karl Wright commented on CONNECTORS-880:

[~SubasiniR], your issue has nothing whatsoever to do with this ticket.  It really belongs
first on the user list.

The issue is that your database is going offline for 2700 seconds (45 minutes) while your
crawl is taking place.  Queries that normally would be instantaneous are therefore just
not being completed at all for that period of time.  The query plans look fine, so that is not the cause.

If this is using HSQLDB (which is the default database for the single-process example), then
you probably have exceeded its capacity.  It stores all of its tables in memory.  You will
want to upgrade to a real database instead.  I would prefer postgresql over mysql because
mysql has been having transactional integrity issues for a couple of versions now, and that
will be fatal to use with ManifoldCF.

By the way, "Illegal seed URL" is a warning and does not impact behavior other than to notify
you that one of the seeds you are using in your crawl is not valid according to the W3C spec.
That seed will not be used.

> Under the right conditions, job aborts do not update "last checked" time
> ------------------------------------------------------------------------
>                 Key: CONNECTORS-880
>                 URL:
>             Project: ManifoldCF
>          Issue Type: Bug
>          Components: Framework crawler agent
>    Affects Versions: ManifoldCF 1.4.1
>            Reporter: Karl Wright
>            Assignee: Karl Wright
>            Priority: Major
>             Fix For: ManifoldCF 1.6
> When a scheduled job is being considered to be started, MCF updates the last-check field
> ONLY if the job didn't start.  It relies on the job's completion to set the last-check field
> in the case where the job does start.  But if the job aborts, in at least one case the last-check
> field is NOT updated.  This leads to the job being run over and over again within the schedule
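
The behavior described in the ticket can be illustrated with a small model. This is only a sketch in Python, not ManifoldCF code: the `Job` record, the `consider`, `finish` and `abort` functions, the `record_check` flag, and the integer schedule window are all hypothetical names invented for illustration. It models only what the description above states (the description is cut off in this archive).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Job:
    """Hypothetical stand-in for a scheduled job; not ManifoldCF's real schema."""
    window_start: int                   # schedule window opens (arbitrary time units)
    window_end: int                     # schedule window closes
    last_checked: Optional[int] = None  # the "last checked" time from the ticket
    running: bool = False


def should_start(job: Job, now: int) -> bool:
    """Start only if we are in the window and have not checked since it opened."""
    in_window = job.window_start <= now < job.window_end
    unchecked = job.last_checked is None or job.last_checked < job.window_start
    return in_window and unchecked


def consider(job: Job, now: int) -> bool:
    """Consider starting the job; returns True if it was started."""
    if job.running:
        return False
    if should_start(job, now):
        # Start path: last_checked is left alone, relying on completion to set it.
        job.running = True
        return True
    # Only the "did not start" path records the check.
    job.last_checked = now
    return False


def finish(job: Job, now: int) -> None:
    """Normal completion records the check time."""
    job.running = False
    job.last_checked = now


def abort(job: Job, now: int, record_check: bool) -> None:
    """Abort path; record_check=False models the abort case the ticket describes."""
    job.running = False
    if record_check:
        job.last_checked = now
```

In this model, a job that starts at time 10 and aborts with `record_check=False` is started again by the next `consider` call inside the same window, because nothing ever recorded a check. Aborting with `record_check=True` prevents the restart.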

This message was sent by Atlassian JIRA
