manifoldcf-dev mailing list archives

From Karl Wright <daddy...@gmail.com>
Subject Re: Potential Issue with pausing jobs
Date Thu, 17 Sep 2015 15:53:27 GMT
Hi Niall,

A continuous job reseeds on a schedule, which you set as part of the job
setup.  In a continuous job, once a document has been crawled, it is
scheduled to be recrawled at a specific time in the future.  If it hasn't
changed by then, the next check is scheduled even further out, up to a
certain limit (also settable within the job).
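
As a rough illustration of that rescheduling behaviour, here is a minimal
Python sketch.  It is not ManifoldCF code: the function name, the doubling
factor, and the 5 minute and 1440 minute values are all assumptions made up
for the example, and the real interval calculation may well differ.

```python
def next_check_interval(current_minutes, changed,
                        initial_minutes=5.0, limit_minutes=1440.0,
                        growth=2.0):
    """Return the wait (in minutes) before a document is checked again.

    Hypothetical model only: an unchanged document is pushed further out
    by a growth factor (assumed here to be 2x), but never past the limit.
    Dropping back to the initial interval when the document has changed is
    also an assumption of this sketch.
    """
    if changed:
        return initial_minutes
    return min(current_minutes * growth, limit_minutes)


# An unchanged document is checked less and less often until the limit is hit.
interval = 5.0
history = []
for _ in range(10):
    interval = next_check_interval(interval, changed=False)
    history.append(interval)
```

With these made-up numbers, the wait between checks grows on every unchanged
check until it reaches the limit and then stays there.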

You can look at the document's schedule, by the way, using the "Document
Status" report, and it should be pretty clear from that what should happen
and when.

When you abort the job and restart it, everything is reset, so the document
will be checked immediately at that point, and relatively frequently for a
while until the system figures out that the document isn't changing very
rapidly.
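
To make the effect of an abort and restart concrete, here is a toy Python
model of a single document's schedule.  The class, its method names, and the
numbers are hypothetical and only mirror the description above; they are not
taken from the ManifoldCF source.

```python
class DocumentSchedule:
    """Toy model of one document's recheck interval in a continuous job."""

    def __init__(self, initial_minutes=5.0, limit_minutes=1440.0):
        # Both values are illustrative assumptions, not ManifoldCF defaults.
        self.initial_minutes = initial_minutes
        self.limit_minutes = limit_minutes
        self.interval_minutes = initial_minutes

    def record_unchanged_check(self):
        """The document had not changed: wait longer next time, up to the limit."""
        self.interval_minutes = min(self.interval_minutes * 2,
                                    self.limit_minutes)
        return self.interval_minutes

    def restart_job(self):
        """Abort and restart: the schedule is reset and the check is immediate."""
        self.interval_minutes = self.initial_minutes
        return 0.0  # minutes to wait before the next check


schedule = DocumentSchedule()
for _ in range(4):
    schedule.record_unchanged_check()   # interval grows: 10, 20, 40, 80
delay_after_restart = schedule.restart_job()
```

In this model, a pause would simply leave `interval_minutes` untouched, which
is why a resumed job can appear idle until the document's next scheduled
check comes due.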

Thanks,
Karl






On Thu, Sep 17, 2015 at 11:38 AM, Colreavy, Niall <
Niall.Colreavy@fmr.com.invalid> wrote:

> Hi Karl,
>
> You'll have to forgive me if my answer is a bit uncertain but I am very
> new to MCF. Just to clarify, I have a very simple job. For the JDBC
> connector, I am literally just selecting 1 for the id, 'myurl' for the url
> and 'mydata' for the data. So there is only ever 1 document being processed.
>
> So to answer the questions:
>
> 1. There are 0 active documents on the queue.
> 2. Single process
> 3. Yes, this is a continuous crawl.
>
> Kind Regards,
>
> Niall
>
> -----Original Message-----
> From: Karl Wright [mailto:daddywri@gmail.com]
> Sent: 17 September 2015 4:27
> To: dev
> Subject: Re: Potential Issue with pausing jobs
>
> Hi Niall,
>
> Pausing and resuming a job should have no effects *other* than
> reprioritization of the active documents on the queue, which if there are a
> lot of them, may take some time.
>
> So let's ask some basic questions.  (1) How many active documents on your
> queue? (2) What kind of synchronization are you using?  Is this single
> process, or multiprocess?  (3) Is this a continuous crawl?
>
> >>>>>>
> And on a side note, what is the difference between pausing a job and
> aborting a job?
> <<<<<<
>
> I can't fully answer that unless I know the characteristics of your job,
> especially continuous crawl vs. crawl to completion.
>
> Karl
>
>
> On Thu, Sep 17, 2015 at 11:07 AM, Colreavy, Niall <
> Niall.Colreavy@fmr.com.invalid> wrote:
>
> > Hi,
> >
> > I am experimenting with pausing a job. The job has a simple JDBC
> > connection and a null output connection. I notice that when I resume
> > the job and monitor its progress in the simple history report, the job
> > never seems to run the data query any more. I can see that it runs the
> > seed query, but it doesn't progress to the data query. If I abort the
> > job and restart it, it does seem to start running the data query again.
> >
> > Can anyone explain this behaviour? And on a side note, what is the
> > difference between pausing a job and aborting a job?
> >
> > Thanks,
> >
> > Niall
> >
>
