manifoldcf-dev mailing list archives

From Karl Wright <daddy...@gmail.com>
Subject Re: Potential Issue with pausing jobs
Date Thu, 17 Sep 2015 16:52:41 GMT
I was able to reproduce this; CONNECTORS-1242.

Karl


On Thu, Sep 17, 2015 at 12:45 PM, Karl Wright <daddywri@gmail.com> wrote:

> I'm interested in the time it is supposed to be processed, actually.
>
> I'm trying to recreate your example here to see if I can get more
> information.
>
> Karl
>
>
>
> On Thu, Sep 17, 2015 at 12:36 PM, Colreavy, Niall <
> Niall.Colreavy@fmr.com.invalid> wrote:
>
>> The document is in a state of 'Processed' and the status is 'Ready for
>> processing'
>>
>> -----Original Message-----
>> From: Karl Wright [mailto:daddywri@gmail.com]
>> Sent: 17 September 2015 5:28
>> To: dev
>> Subject: Re: Potential Issue with pausing jobs
>>
>> When it is in the state after the job has resumed, can you do a Document
>> Status report and tell me what that says for your document?
>>
>> Thanks,
>> Karl
>>
>>
>> On Thu, Sep 17, 2015 at 12:16 PM, Colreavy, Niall <
>> Niall.Colreavy@fmr.com.invalid> wrote:
>>
>> > Hi Karl,
>> >
>> > Thanks for that. I think the problem might be more fundamental. When I
>> > start my job and monitor the simple job history I can see the job doing
>> > things like:
>> >
>> > Run the seed query
>> > Run the data query
>> > Run the seed query
>> > Run the data query
>> >
>> > Etc.
>> >
>> > It continues to do this indefinitely from what I have observed. As soon as
>> > I pause and resume the job, all I can see in the simple job history is:
>> >
>> > Run the seed query
>> > Run the seed query
>> > Run the seed query
>> >
>> > It's like it's never going to run the data query again?
>> >
>> > Kind Regards,
>> >
>> > Niall
>> >
>> > -----Original Message-----
>> > From: Karl Wright [mailto:daddywri@gmail.com]
>> > Sent: 17 September 2015 4:53
>> > To: dev
>> > Subject: Re: Potential Issue with pausing jobs
>> >
>> > Hi Niall,
>> >
>> > A continuous job reseeds on a schedule, which you set as part of the job
>> > setup.  For a continuous job, if the document has been crawled, it will be
>> > recrawled again at a specific time in the future, and if at that time it
>> > hasn't changed, it will be scheduled for checking again even further out,
>> > up to a certain limit (also settable within the job).
>> >
>> > You can look at the document's schedule, by the way, using the "Document
>> > Status" report, and it should be pretty clear from that what should happen
>> > and when.
>> >
>> > When you abort the job and restart it, everything is reset, so the document
>> > will be checked immediately at that point, and relatively frequently for a
>> > while until the system figures out that the document isn't changing very
>> > rapidly.
>> >
>> > Thanks,
>> > Karl
>> >
>> >
>> >
>> >
>> >
>> >
>> > On Thu, Sep 17, 2015 at 11:38 AM, Colreavy, Niall <
>> > Niall.Colreavy@fmr.com.invalid> wrote:
>> >
>> > > Hi Karl,
>> > >
>> > > You'll have to forgive me if my answer is a bit uncertain but I am very
>> > > new to MCF. Just to clarify, I have a very simple job. For the JDBC
>> > > connector, I am literally just selecting 1 for the id, 'myurl' for the url
>> > > and 'mydata' for the data. So there is only ever 1 document being processed.
>> > >
>> > > So to answer the questions:
>> > >
>> > > 1. There are 0 active documents on the queue.
>> > > 2. Single process
>> > > 3. Yes, this is a continuous crawl.
>> > >
>> > > Kind Regards,
>> > >
>> > > Niall
>> > >
>> > > -----Original Message-----
>> > > From: Karl Wright [mailto:daddywri@gmail.com]
>> > > Sent: 17 September 2015 4:27
>> > > To: dev
>> > > Subject: Re: Potential Issue with pausing jobs
>> > >
>> > > Hi Niall,
>> > >
>> > > Pausing and resuming a job should have no effects *other* than
>> > > reprioritization of the active documents on the queue, which if there are a
>> > > lot of them, may take some time.
>> > >
>> > > So let's ask some basic questions.  (1) How many active documents on your
>> > > queue? (2) What kind of synchronization are you using?  Is this single
>> > > process, or multiprocess?  (3) Is this a continuous crawl?
>> > >
>> > > >>>>>>
>> > > And on a side note, what is the difference between pausing a job and
>> > > aborting a job?
>> > > <<<<<<
>> > >
>> > > I can't fully answer that unless I know the characteristics of your job,
>> > > especially continuous crawl vs. crawl to completion.
>> > >
>> > > Karl
>> > >
>> > >
>> > > On Thu, Sep 17, 2015 at 11:07 AM, Colreavy, Niall <
>> > > Niall.Colreavy@fmr.com.invalid> wrote:
>> > >
>> > > > Hi,
>> > > >
>> > > > I am experimenting with pausing a job. The job has a simple JDBC
>> > > > connection and a null output connection. I was experimenting with pausing
>> > > > the job and I notice that when I resume the job, and monitor its progress
>> > > > in the simple history report, the job never seems to run the data query any
>> > > > more. I can see that it runs the seed query but it doesn't progress to the
>> > > > data query. If I abort the job and restart it, it does seem to start
>> > > > running the data query again.
>> > > >
>> > > > Can anyone explain this behaviour? And on a side note, what is the
>> > > > difference between pausing a job and aborting a job?
>> > > >
>> > > > Thanks,
>> > > >
>> > > > Niall
>> > > >
>> > >
>> >
>>
>
>
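
For readers of this thread, the continuous-crawl rescheduling described above (an unchanged document is rechecked further and further out, up to a limit, and an abort/restart resets everything) can be pictured with the toy model below. This is only a sketch under stated assumptions, not ManifoldCF's actual code: the class name, the parameter names, the doubling factor, and the choice to fall back to the initial interval when a document has changed are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RecrawlSchedule:
    """Toy model of an adaptive recrawl interval (hypothetical, not ManifoldCF code)."""

    initial_interval_s: float = 60.0   # assumed first recheck interval
    growth_factor: float = 2.0         # assumed back-off multiplier
    max_interval_s: float = 3600.0     # assumed upper limit on the interval
    interval_s: float = 0.0            # 0 means "no interval chosen yet"

    def next_check(self, now_s: float, changed: bool) -> float:
        """Return the time of the next check, given a crawl finished at now_s."""
        if changed or self.interval_s == 0.0:
            # Assumption: a changed (or never-scheduled) document starts over
            # at the shortest interval.
            self.interval_s = self.initial_interval_s
        else:
            # Unchanged document: push the next check further out, up to the limit.
            self.interval_s = min(self.interval_s * self.growth_factor,
                                  self.max_interval_s)
        return now_s + self.interval_s

    def reset(self) -> None:
        """Model an abort/restart: forget the learned interval."""
        self.interval_s = 0.0


schedule = RecrawlSchedule(initial_interval_s=60, growth_factor=2, max_interval_s=200)
t1 = schedule.next_check(0, changed=False)    # first interval
t2 = schedule.next_check(t1, changed=False)   # interval doubles
t3 = schedule.next_check(t2, changed=False)   # interval capped at the limit
```

In this model, pausing and resuming would leave `interval_s` untouched, which matches the expectation stated in the thread that a pause should not change when a document is next due.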
