manifoldcf-dev mailing list archives

From Karl Wright <daddy...@gmail.com>
Subject Re: Potential Issue with pausing jobs
Date Thu, 17 Sep 2015 16:45:37 GMT
I'm interested in the time it is supposed to be processed, actually.

I'm trying to recreate your example here to see if I can get more
information.

Karl



On Thu, Sep 17, 2015 at 12:36 PM, Colreavy, Niall <
Niall.Colreavy@fmr.com.invalid> wrote:

> The document is in a state of 'Processed' and the status is 'Ready for
> processing'
>
> -----Original Message-----
> From: Karl Wright [mailto:daddywri@gmail.com]
> Sent: 17 September 2015 5:28
> To: dev
> Subject: Re: Potential Issue with pausing jobs
>
> When it is in the state after the job has resumed, can you do a Document
> Status report and tell me what that says for your document?
>
> Thanks,
> Karl
>
>
> On Thu, Sep 17, 2015 at 12:16 PM, Colreavy, Niall <
> Niall.Colreavy@fmr.com.invalid> wrote:
>
> > Hi Karl,
> >
> > Thanks for that. I think the problem might be more fundamental. When I
> > start my job and monitor the simple job history I can see the job doing
> > things like:
> >
> > Run the seed query
> > Run the data query
> > Run the seed query
> > Run the data query
> >
> > Etc.
> >
> > It continues to do this indefinitely from what I have observed. As soon as
> > I pause and resume the job, all I can see in the simple job history is:
> >
> > Run the seed query
> > Run the seed query
> > Run the seed query
> >
> > It's like it's never going to run the data query again?
> >
> > Kind Regards,
> >
> > Niall
> >
> > -----Original Message-----
> > From: Karl Wright [mailto:daddywri@gmail.com]
> > Sent: 17 September 2015 4:53
> > To: dev
> > Subject: Re: Potential Issue with pausing jobs
> >
> > Hi Niall,
> >
> > A continuous job reseeds on a schedule, which you set as part of the job
> > setup.  For a continuous job, if the document has been crawled, it will be
> > recrawled again at a specific time in the future, and if at that time it
> > hasn't changed, it will be scheduled for checking again even further out,
> > up to a certain limit (also settable within the job).
> >
> > You can look at the document's schedule, by the way, using the "Document
> > Status" report, and it should be pretty clear from that what should happen
> > and when.
> >
> > When you abort the job and restart it, everything is reset, so the document
> > will be checked immediately at that point, and relatively frequently for a
> > while until the system figures out that the document isn't changing very
> > rapidly.
> >
> > Thanks,
> > Karl
> >
> >
> >
> >
> >
> >
> > On Thu, Sep 17, 2015 at 11:38 AM, Colreavy, Niall <
> > Niall.Colreavy@fmr.com.invalid> wrote:
> >
> > > Hi Karl,
> > >
> > > You'll have to forgive me if my answer is a bit uncertain but I am very
> > > new to MCF. Just to clarify, I have a very simple job. For the JDBC
> > > connector, I am literally just selecting 1 for the id, 'myurl' for the url
> > > and 'mydata' for the data. So there is only ever 1 document being processed.
> > >
> > > So to answer the questions:
> > >
> > > 1. There are 0 active documents on the queue.
> > > 2. Single process
> > > 3. Yes, this is a continuous crawl.
> > >
> > > Kind Regards,
> > >
> > > Niall
> > >
> > > -----Original Message-----
> > > From: Karl Wright [mailto:daddywri@gmail.com]
> > > Sent: 17 September 2015 4:27
> > > To: dev
> > > Subject: Re: Potential Issue with pausing jobs
> > >
> > > Hi Niall,
> > >
> > > Pausing and resuming a job should have no effects *other* than
> > > reprioritization of the active documents on the queue, which if there are a
> > > lot of them, may take some time.
> > >
> > > So let's ask some basic questions.  (1) How many active documents on your
> > > queue? (2) What kind of synchronization are you using?  Is this single
> > > process, or multiprocess?  (3) Is this a continuous crawl?
> > >
> > > >>>>>>
> > > And on a side note, what is the difference between pausing a job and
> > > aborting a job?
> > > <<<<<<
> > >
> > > I can't fully answer that unless I know the characteristics of your job,
> > > especially continuous crawl vs. crawl to completion.
> > >
> > > Karl
> > >
> > >
> > > On Thu, Sep 17, 2015 at 11:07 AM, Colreavy, Niall <
> > > Niall.Colreavy@fmr.com.invalid> wrote:
> > >
> > > > Hi,
> > > >
> > > > I am experimenting with pausing a job. The job has a simple JDBC
> > > > connection and a null output connection. I was experimenting with pausing
> > > > the job and I notice that when I resume the job, and monitor its progress
> > > > in the simple history report, the job never seems to run the data query any
> > > > more. I can see that it runs the seed query but it doesn't progress to the
> > > > data query. If I abort the job and restart it, it does seem to start
> > > > running the data query again.
> > > >
> > > > Can anyone explain this behaviour? And on a side note, what is the
> > > > difference between pausing a job and aborting a job?
> > > >
> > > > Thanks,
> > > >
> > > > Niall
> > > >
> > >
> >
>
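
For readers following the thread, the continuous-crawl rescheduling that Karl
describes (recheck at a later time, push the next check further out while the
document is unchanged, stop at a job-level limit, reset everything on
abort/restart) can be sketched roughly as below. This is only a toy model and
not ManifoldCF code: the class name, the doubling factor, and the interval
values are all assumptions made for illustration, and the thread itself does
not say how the real intervals are computed.

```python
class DocumentSchedule:
    """Toy model of one document's recrawl schedule in a continuous job.

    Hypothetical illustration only; none of these names or numbers come
    from ManifoldCF itself.
    """

    def __init__(self, min_interval=60.0, max_interval=3600.0, growth=2.0):
        self.min_interval = min_interval  # assumed shortest recheck gap (seconds)
        self.max_interval = max_interval  # assumed job-level upper limit (seconds)
        self.growth = growth              # assumed back-off factor
        self.interval = min_interval
        self.next_check = 0.0             # a new document is due immediately

    def is_due(self, now):
        """True when the document should be fetched and checked again."""
        return now >= self.next_check

    def record_check(self, now, changed):
        """Reschedule after a check: back off if unchanged, reset if changed."""
        if changed:
            self.interval = self.min_interval
        else:
            self.interval = min(self.interval * self.growth, self.max_interval)
        self.next_check = now + self.interval

    def restart(self, now):
        """Abort and restart: the schedule is reset and the document is due now."""
        self.interval = self.min_interval
        self.next_check = now


if __name__ == "__main__":
    schedule = DocumentSchedule(min_interval=60, max_interval=240)
    now = 0.0
    for _ in range(4):
        schedule.record_check(now, changed=False)
        print(f"checked at {now:.0f}s, next check at {schedule.next_check:.0f}s")
        now = schedule.next_check
```

In this model a pause and resume would leave `interval` and `next_check`
untouched, whereas `restart` makes the document due immediately, which mirrors
the distinction Karl draws between pausing and aborting.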
