manifoldcf-dev mailing list archives

From Karl Wright <daddy...@gmail.com>
Subject Re: Potential Issue with pausing jobs
Date Thu, 17 Sep 2015 16:28:23 GMT
When it is in the state after the job has resumed, can you do a Document
Status report and tell me what that says for your document?

Thanks,
Karl


On Thu, Sep 17, 2015 at 12:16 PM, Colreavy, Niall <
Niall.Colreavy@fmr.com.invalid> wrote:

> Hi Karl,
>
> Thanks for that. I think the problem might be more fundamental. When I
> start my job and monitor the simple job history I can see the job doing
> things like:
>
> Run the seed query
> Run the data query
> Run the seed query
> Run the data query
>
> Etc.
>
> It continues to do this indefinitely from what I have observed. As soon as
> I pause and resume the job, all I can see in the simple job history is:
>
> Run the seed query
> Run the seed query
> Run the seed query
>
> It's like it's never going to run the data query again?
>
> Kind Regards,
>
> Niall
>
> -----Original Message-----
> From: Karl Wright [mailto:daddywri@gmail.com]
> Sent: 17 September 2015 4:53
> To: dev
> Subject: Re: Potential Issue with pausing jobs
>
> Hi Niall,
>
> A continuous job reseeds on a schedule, which you set as part of the job
> setup.  For a continuous job, if the document has been crawled, it will be
> recrawled again at a specific time in the future, and if at that time it
> hasn't changed, it will be scheduled for checking again even further out,
> up to a certain limit (also settable within the job).
>
> You can look at the document's schedule, by the way, using the "Document
> Status" report, and it should be pretty clear from that what should happen
> and when.
>
> When you abort the job and restart it, everything is reset, so the document
> will be checked immediately at that point, and relatively frequently for a
> while until the system figures out that the document isn't changing very
> rapidly.
>
> Thanks,
> Karl
>
> On Thu, Sep 17, 2015 at 11:38 AM, Colreavy, Niall <
> Niall.Colreavy@fmr.com.invalid> wrote:
>
> > Hi Karl,
> >
> > You'll have to forgive me if my answer is a bit uncertain but I am very
> > new to MCF. Just to clarify, I have a very simple job. For the JDBC
> > connector, I am literally just selecting 1 for the id, 'myurl' for the
> > url and 'mydata' for the data. So there is only ever 1 document being
> > processed.
> >
> > So to answer the questions:
> >
> > 1. There are 0 active documents on the queue.
> > 2. Single process
> > 3. Yes, this is a continuous crawl.
> >
> > Kind Regards,
> >
> > Niall
> >
> > -----Original Message-----
> > From: Karl Wright [mailto:daddywri@gmail.com]
> > Sent: 17 September 2015 4:27
> > To: dev
> > Subject: Re: Potential Issue with pausing jobs
> >
> > Hi Niall,
> >
> > Pausing and resuming a job should have no effects *other* than
> > reprioritization of the active documents on the queue, which if there
> > are a lot of them, may take some time.
> >
> > So let's ask some basic questions.  (1) How many active documents on your
> > queue? (2) What kind of synchronization are you using?  Is this single
> > process, or multiprocess?  (3) Is this a continuous crawl?
> >
> > >>>>>>
> > And on a side note, what is the difference between pausing a job and
> > aborting a job?
> > <<<<<<
> >
> > I can't fully answer that unless I know the characteristics of your job,
> > especially continuous crawl vs. crawl to completion.
> >
> > Karl
> >
> >
> > On Thu, Sep 17, 2015 at 11:07 AM, Colreavy, Niall <
> > Niall.Colreavy@fmr.com.invalid> wrote:
> >
> > > Hi,
> > >
> > > I am experimenting with pausing a job. The job has a simple JDBC
> > > connection and a null output connection. I was experimenting with
> > > pausing the job and I notice that when I resume the job, and monitor its
> > > progress in the simple history report, the job never seems to run the
> > > data query any more. I can see that it runs the seed query but it
> > > doesn't progress to the data query. If I abort the job and restart it,
> > > it does seem to start running the data query again.
> > >
> > > Can anyone explain this behaviour? And on a side note, what is the
> > > difference between pausing a job and aborting a job?
> > >
> > > Thanks,
> > >
> > > Niall
> > >
> >
>
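
The recrawl behaviour Karl describes for a continuous job (recheck a document
at a scheduled time, push the next check further out each time the document
is unchanged, stop growing at a configured limit, and check immediately after
an abort and restart) can be illustrated with a small model. This is only a
sketch under stated assumptions, not ManifoldCF's actual scheduling code: the
class name, the doubling growth factor, the interval values, and the choice
to return to the initial interval when a change is seen are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RecrawlSchedule:
    """Toy model of one document's recheck schedule (hypothetical, not MCF code)."""

    initial_interval: float = 60.0   # assumed seconds between early rechecks
    max_interval: float = 3600.0     # assumed upper limit, settable per job
    growth: float = 2.0              # assumed back-off factor
    interval: float = 0.0            # current interval, set in __post_init__
    next_check: float = 0.0          # time at which the document is next due

    def __post_init__(self) -> None:
        self.interval = self.initial_interval

    def record_check(self, now: float, changed: bool) -> float:
        """Record the result of a check and return the next check time."""
        if changed:
            # Assumption: a changed document goes back to frequent checking.
            self.interval = self.initial_interval
        else:
            # Unchanged: schedule further out, but never beyond the limit.
            self.interval = min(self.interval * self.growth, self.max_interval)
        self.next_check = now + self.interval
        return self.next_check

    def reset(self, now: float) -> None:
        """Model an abort and restart: the document is due immediately."""
        self.interval = self.initial_interval
        self.next_check = now

    def is_due(self, now: float) -> bool:
        """True when the document should be fetched (data query) again."""
        return now >= self.next_check


if __name__ == "__main__":
    schedule = RecrawlSchedule(initial_interval=10, max_interval=50)
    now = 0.0
    for _ in range(4):
        now = schedule.record_check(now, changed=False)
        print(f"next check at t={now:.0f}s (interval {schedule.interval:.0f}s)")
```

In this model, a single unchanged document is fetched less and less often,
which is one way a job history could show repeated seeding with no
intervening fetch of the document. The "Document Status" report remains the
authoritative source for when a real document is actually scheduled.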
