manifoldcf-dev mailing list archives

From "Colreavy, Niall" <Niall.Colre...@fmr.com.INVALID>
Subject RE: Potential Issue with pausing jobs
Date Fri, 18 Sep 2015 09:31:39 GMT
Hi Karl,

Thanks for looking into that. In the interim, we are going to abort, rather than pause, the
job to work around the issue. Just out of curiosity, what is the difference between aborting
the job and pausing the job? We are just a little bit concerned that there could be adverse
effects from regularly aborting the job.

Thanks,

Niall

-----Original Message-----
From: Karl Wright [mailto:daddywri@gmail.com] 
Sent: 17 September 2015 5:53
To: dev
Subject: Re: Potential Issue with pausing jobs

I was able to reproduce this; CONNECTORS-1242.

Karl


On Thu, Sep 17, 2015 at 12:45 PM, Karl Wright <daddywri@gmail.com> wrote:

> I'm interested in the time it is supposed to be processed, actually.
>
> I'm trying to recreate your example here to see if I can get more
> information.
>
> Karl
>
>
>
> On Thu, Sep 17, 2015 at 12:36 PM, Colreavy, Niall <
> Niall.Colreavy@fmr.com.invalid> wrote:
>
>> The document is in a state of 'Processed' and the status is 'Ready for
>> processing'
>>
>> -----Original Message-----
>> From: Karl Wright [mailto:daddywri@gmail.com]
>> Sent: 17 September 2015 5:28
>> To: dev
>> Subject: Re: Potential Issue with pausing jobs
>>
>> When it is in the state after the job has resumed, can you do a Document
>> Status report and tell me what that says for your document?
>>
>> Thanks,
>> Karl
>>
>>
>> On Thu, Sep 17, 2015 at 12:16 PM, Colreavy, Niall <
>> Niall.Colreavy@fmr.com.invalid> wrote:
>>
>> > Hi Karl,
>> >
>> > Thanks for that. I think the problem might be more fundamental. When I
>> > start my job and monitor the simple job history I can see the job doing
>> > things like:
>> >
>> > Run the seed query
>> > Run the data query
>> > Run the seed query
>> > Run the data query
>> >
>> > Etc.
>> >
>> > It continues to do this indefinitely from what I have observed. As soon
>> > as I pause and resume the job, all I can see in the simple job history
>> > is:
>> >
>> > Run the seed query
>> > Run the seed query
>> > Run the seed query
>> >
>> > It's like it's never going to run the data query again?
>> >
>> > Kind Regards,
>> >
>> > Niall
>> >
>> > -----Original Message-----
>> > From: Karl Wright [mailto:daddywri@gmail.com]
>> > Sent: 17 September 2015 4:53
>> > To: dev
>> > Subject: Re: Potential Issue with pausing jobs
>> >
>> > Hi Niall,
>> >
>> > A continuous job reseeds on a schedule, which you set as part of the job
>> > setup.  For a continuous job, if the document has been crawled, it will
>> > be recrawled again at a specific time in the future, and if at that time
>> > it hasn't changed, it will be scheduled for checking again even further
>> > out, up to a certain limit (also settable within the job).
>> >
>> > You can look at the document's schedule, by the way, using the "Document
>> > Status" report, and it should be pretty clear from that what should
>> > happen and when.
>> >
>> > When you abort the job and restart it, everything is reset, so the
>> > document will be checked immediately at that point, and relatively
>> > frequently for a while until the system figures out that the document
>> > isn't changing very rapidly.
>> >
>> > Thanks,
>> > Karl
>> >
>> >
>> >
>> >
>> >
>> >
>> > On Thu, Sep 17, 2015 at 11:38 AM, Colreavy, Niall <
>> > Niall.Colreavy@fmr.com.invalid> wrote:
>> >
>> > > Hi Karl,
>> > >
>> > > You'll have to forgive me if my answer is a bit uncertain but I am
>> > > very new to MCF. Just to clarify, I have a very simple job. For the
>> > > JDBC connector, I am literally just selecting 1 for the id, 'myurl'
>> > > for the url and 'mydata' for the data. So there is only ever 1
>> > > document being processed.
>> > >
>> > > So to answer the questions:
>> > >
>> > > 1. There are 0 active documents on the queue.
>> > > 2. Single process
>> > > 3. Yes, this is a continuous crawl.
>> > >
>> > > Kind Regards,
>> > >
>> > > Niall
>> > >
>> > > -----Original Message-----
>> > > From: Karl Wright [mailto:daddywri@gmail.com]
>> > > Sent: 17 September 2015 4:27
>> > > To: dev
>> > > Subject: Re: Potential Issue with pausing jobs
>> > >
>> > > Hi Niall,
>> > >
>> > > Pausing and resuming a job should have no effects *other* than
>> > > reprioritization of the active documents on the queue, which, if
>> > > there are a lot of them, may take some time.
>> > >
>> > > So let's ask some basic questions.  (1) How many active documents are
>> > > on your queue? (2) What kind of synchronization are you using?  Is
>> > > this single process, or multiprocess?  (3) Is this a continuous crawl?
>> > >
>> > > >>>>>>
>> > > And on a side note, what is the difference between pausing a job and
>> > > aborting a job?
>> > > <<<<<<
>> > >
>> > > I can't fully answer that unless I know the characteristics of your
>> > > job, especially continuous crawl vs. crawl to completion.
>> > >
>> > > Karl
>> > >
>> > >
>> > > On Thu, Sep 17, 2015 at 11:07 AM, Colreavy, Niall <
>> > > Niall.Colreavy@fmr.com.invalid> wrote:
>> > >
>> > > > Hi,
>> > > >
>> > > > I am experimenting with pausing a job. The job has a simple JDBC
>> > > > connection and a null output connection. I was experimenting with
>> > > > pausing the job and I noticed that when I resume the job and monitor
>> > > > its progress in the simple history report, the job never seems to
>> > > > run the data query any more. I can see that it runs the seed query
>> > > > but it doesn't progress to the data query. If I abort the job and
>> > > > restart it, it does seem to start running the data query again.
>> > > >
>> > > > Can anyone explain this behaviour? And on a side note, what is the
>> > > > difference between pausing a job and aborting a job?
>> > > >
>> > > > Thanks,
>> > > >
>> > > > Niall
>> > > >
>> > >
>> >
>>
>
>
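
Karl's description earlier in the thread of how a continuous job reschedules a document (recheck at a set time, push the next check further out while the document is unchanged, stop at a configurable limit, and start over after an abort and restart) can be sketched roughly as below. This is only an illustrative model, not ManifoldCF code: the class name, its fields, the doubling factor, and the choice to fall back to the initial interval when a document changes are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class RecrawlSchedule:
    """Hypothetical model of one document's recheck schedule (not ManifoldCF code)."""

    initial_interval: float  # minutes until the first recheck (assumed unit)
    max_interval: float      # upper limit on the gap between rechecks
    growth: float = 2.0      # assumed back-off factor for unchanged documents

    def __post_init__(self) -> None:
        # Current gap between checks; starts at the initial interval.
        self.interval = self.initial_interval

    def on_checked(self, changed: bool) -> float:
        """Return the delay before the next check after a recrawl."""
        if changed:
            # Assumption: a changed document goes back to frequent checks.
            self.interval = self.initial_interval
        else:
            # Unchanged: schedule the next check further out, up to the limit.
            self.interval = min(self.interval * self.growth, self.max_interval)
        return self.interval

    def reset(self) -> None:
        """Models abort + restart: the schedule starts over."""
        self.interval = self.initial_interval


# Example: an unchanged document is checked less and less often.
schedule = RecrawlSchedule(initial_interval=5, max_interval=60)
delays = [schedule.on_checked(changed=False) for _ in range(5)]
schedule.reset()  # after an abort and restart, checks become frequent again
```

In this model, pausing would leave `interval` untouched, while aborting and restarting calls `reset()`, which matches the observation in the thread that the document is checked immediately and relatively frequently after a restart.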