manifoldcf-dev mailing list archives

From Karl Wright <daddy...@gmail.com>
Subject Re: Potential Issue with pausing jobs
Date Fri, 18 Sep 2015 11:07:56 GMT
Aborting a job, or restarting it, is perfectly safe and will lose no data.
As I said before, the difference lies in the fact that pausing does not
disrupt the document fetching and seeding schedules, while aborting will
disrupt these, and make everything start over schedule-wise.

Karl
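
The scheduling difference described in this thread can be illustrated with a minimal sketch. It is not ManifoldCF code: the class name, method names, interval values, and the doubling rule are all hypothetical, chosen only to show a recheck interval that grows "up to a certain limit" while a document is unchanged, a pause/resume that leaves the schedule alone, and an abort/restart that resets it.

```java
// Hypothetical model of one document's recrawl schedule in a continuous job.
// None of these names or constants come from ManifoldCF itself.
final class RecrawlSketch {
    // Assumed bounds; in a real job these would be job settings.
    static final long MIN_INTERVAL_MS = 60_000L;
    static final long MAX_INTERVAL_MS = 480_000L;

    long intervalMs = MIN_INTERVAL_MS; // current gap between checks
    long nextCheckMs;                  // when the document is next due

    RecrawlSketch(long nowMs) {
        this.nextCheckMs = nowMs;      // a fresh job checks immediately
    }

    // After a check: an unchanged document is pushed further out (doubling is
    // an assumption), capped at the limit; a changed one returns to the minimum.
    void recordCheck(long nowMs, boolean changed) {
        intervalMs = changed
            ? MIN_INTERVAL_MS
            : Math.min(intervalMs * 2, MAX_INTERVAL_MS);
        nextCheckMs = nowMs + intervalMs;
    }

    // Pausing and resuming leaves the schedule exactly as it was.
    void pauseAndResume() {
        // intentionally empty: nothing schedule-related changes
    }

    // Aborting and restarting starts over schedule-wise.
    void abortAndRestart(long nowMs) {
        intervalMs = MIN_INTERVAL_MS;
        nextCheckMs = nowMs;
    }

    public static void main(String[] args) {
        RecrawlSketch doc = new RecrawlSketch(0L);
        doc.recordCheck(0L, false);
        doc.recordCheck(doc.nextCheckMs, false);
        System.out.println("before pause:  next check at " + doc.nextCheckMs);
        doc.pauseAndResume();
        System.out.println("after resume:  next check at " + doc.nextCheckMs);
        doc.abortAndRestart(400_000L);
        System.out.println("after restart: next check at " + doc.nextCheckMs);
    }
}
```

In this model no data is lost either way; the only thing an abort costs is the accumulated schedule, so the document is rechecked immediately and at short intervals until the interval grows again.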


On Fri, Sep 18, 2015 at 5:31 AM, Colreavy, Niall <
Niall.Colreavy@fmr.com.invalid> wrote:

> Hi Karl,
>
> Thanks for looking into that. In the interim, we are going to abort,
> rather than pause the job to circumvent the issue. Just out of curiosity,
> what is the difference between aborting the job and pausing the job? We
> would just be a little bit concerned that there would be adverse effects
> from regularly aborting the job.
>
> Thanks,
>
> Niall
>
> -----Original Message-----
> From: Karl Wright [mailto:daddywri@gmail.com]
> Sent: 17 September 2015 5:53
> To: dev
> Subject: Re: Potential Issue with pausing jobs
>
> I was able to reproduce this; CONNECTORS-1242.
>
> Karl
>
>
> On Thu, Sep 17, 2015 at 12:45 PM, Karl Wright <daddywri@gmail.com> wrote:
>
> > I'm interested in the time it is supposed to be processed, actually.
> >
> > I'm trying to recreate your example here to see if I can get more
> > information.
> >
> > Karl
> >
> >
> >
> > On Thu, Sep 17, 2015 at 12:36 PM, Colreavy, Niall <
> > Niall.Colreavy@fmr.com.invalid> wrote:
> >
> >> The document is in a state of 'Processed' and the status is 'Ready for
> >> processing'
> >>
> >> -----Original Message-----
> >> From: Karl Wright [mailto:daddywri@gmail.com]
> >> Sent: 17 September 2015 5:28
> >> To: dev
> >> Subject: Re: Potential Issue with pausing jobs
> >>
> >> When it is in the state after the job has resumed, can you do a Document
> >> Status report and tell me what that says for your document?
> >>
> >> Thanks,
> >> Karl
> >>
> >>
> >> On Thu, Sep 17, 2015 at 12:16 PM, Colreavy, Niall <
> >> Niall.Colreavy@fmr.com.invalid> wrote:
> >>
> >> > Hi Karl,
> >> >
> >> > Thanks for that. I think the problem might be more fundamental. When I
> >> > start my job and monitor the simple job history I can see the job doing
> >> > things like:
> >> >
> >> > Run the seed query
> >> > Run the data query
> >> > Run the seed query
> >> > Run the data query
> >> >
> >> > Etc.
> >> >
> >> > It continues to do this indefinitely from what I have observed. As soon as
> >> > I pause and resume the job, all I can see in the simple job history is:
> >> >
> >> > Run the seed query
> >> > Run the seed query
> >> > Run the seed query
> >> >
> >> > It's like it's never going to run the data query again?
> >> >
> >> > Kind Regards,
> >> >
> >> > Niall
> >> >
> >> > -----Original Message-----
> >> > From: Karl Wright [mailto:daddywri@gmail.com]
> >> > Sent: 17 September 2015 4:53
> >> > To: dev
> >> > Subject: Re: Potential Issue with pausing jobs
> >> >
> >> > Hi Niall,
> >> >
> >> > A continuous job reseeds on a schedule, which you set as part of the job
> >> > setup.  For a continuous job, if the document has been crawled, it will be
> >> > recrawled again at a specific time in the future, and if at that time it
> >> > hasn't changed, it will be scheduled for checking again even further out,
> >> > up to a certain limit (also settable within the job).
> >> >
> >> > You can look at the document's schedule, by the way, using the "Document
> >> > Status" report, and it should be pretty clear from that what should happen
> >> > and when.
> >> >
> >> > When you abort the job and restart it, everything is reset, so the document
> >> > will be checked immediately at that point, and relatively frequently for a
> >> > while until the system figures out that the document isn't changing very
> >> > rapidly.
> >> >
> >> > Thanks,
> >> > Karl
> >> >
> >> >
> >> >
> >> >
> >> >
> >> >
> >> > On Thu, Sep 17, 2015 at 11:38 AM, Colreavy, Niall <
> >> > Niall.Colreavy@fmr.com.invalid> wrote:
> >> >
> >> > > Hi Karl,
> >> > >
> >> > > You'll have to forgive me if my answer is a bit uncertain but I am very
> >> > > new to MCF. Just to clarify, I have a very simple job. For the JDBC
> >> > > connector, I am literally just selecting 1 for the id, 'myurl' for the url
> >> > > and 'mydata' for the data. So there is only ever 1 document being processed.
> >> > >
> >> > > So to answer the questions:
> >> > >
> >> > > 1. There are 0 active documents on the queue.
> >> > > 2. Single process
> >> > > 3. Yes, this is a continuous crawl.
> >> > >
> >> > > Kind Regards,
> >> > >
> >> > > Niall
> >> > >
> >> > > -----Original Message-----
> >> > > From: Karl Wright [mailto:daddywri@gmail.com]
> >> > > Sent: 17 September 2015 4:27
> >> > > To: dev
> >> > > Subject: Re: Potential Issue with pausing jobs
> >> > >
> >> > > Hi Niall,
> >> > >
> >> > > Pausing and resuming a job should have no effects *other* than
> >> > > reprioritization of the active documents on the queue, which if there are a
> >> > > lot of them, may take some time.
> >> > >
> >> > > So let's ask some basic questions.  (1) How many active documents on your
> >> > > queue? (2) What kind of synchronization are you using?  Is this single
> >> > > process, or multiprocess?  (3) Is this a continuous crawl?
> >> > >
> >> > > >>>>>>
> >> > > And on a side note, what is the difference between pausing a job and
> >> > > aborting a job?
> >> > > <<<<<<
> >> > >
> >> > > I can't fully answer that unless I know the characteristics of your job,
> >> > > especially continuous crawl vs. crawl to completion.
> >> > >
> >> > > Karl
> >> > >
> >> > >
> >> > > On Thu, Sep 17, 2015 at 11:07 AM, Colreavy, Niall <
> >> > > Niall.Colreavy@fmr.com.invalid> wrote:
> >> > >
> >> > > > Hi,
> >> > > >
> >> > > > I am experimenting with pausing a job. The job has a simple JDBC
> >> > > > connection and a null output connection. I was experimenting with pausing
> >> > > > the job and I notice that when I resume the job, and monitor its progress
> >> > > > in the simple history report, the job never seems to run the data query any
> >> > > > more. I can see that it runs the seed query but it doesn't progress to the
> >> > > > data query. If I abort the job and restart it, it does seem to start
> >> > > > running the data query again.
> >> > > >
> >> > > > Can anyone explain this behaviour? And on a side note, what is the
> >> > > > difference between pausing a job and aborting a job?
> >> > > >
> >> > > > Thanks,
> >> > > >
> >> > > > Niall
> >> > > >
> >> > >
> >> >
> >>
> >
> >
>
