manifoldcf-dev mailing list archives

From Karl Wright <daddy...@gmail.com>
Subject Re: Potential Issue with pausing jobs
Date Thu, 17 Sep 2015 20:34:05 GMT
I've attached a patch to the CONNECTORS-1242 ticket.

Karl


On Thu, Sep 17, 2015 at 12:52 PM, Karl Wright <daddywri@gmail.com> wrote:

> I was able to reproduce this; CONNECTORS-1242.
>
> Karl
>
>
> On Thu, Sep 17, 2015 at 12:45 PM, Karl Wright <daddywri@gmail.com> wrote:
>
>> I'm interested in the time it is supposed to be processed, actually.
>>
>> I'm trying to recreate your example here to see if I can get more
>> information.
>>
>> Karl
>>
>>
>>
>> On Thu, Sep 17, 2015 at 12:36 PM, Colreavy, Niall <
>> Niall.Colreavy@fmr.com.invalid> wrote:
>>
>>> The document is in a state of 'Processed' and the status is 'Ready for
>>> processing'
>>>
>>> -----Original Message-----
>>> From: Karl Wright [mailto:daddywri@gmail.com]
>>> Sent: 17 September 2015 5:28
>>> To: dev
>>> Subject: Re: Potential Issue with pausing jobs
>>>
>>> When it is in the state after the job has resumed, can you do a Document
>>> Status report and tell me what that says for your document?
>>>
>>> Thanks,
>>> Karl
>>>
>>>
>>> On Thu, Sep 17, 2015 at 12:16 PM, Colreavy, Niall <
>>> Niall.Colreavy@fmr.com.invalid> wrote:
>>>
>>> > Hi Karl,
>>> >
>>> > Thanks for that. I think the problem might be more fundamental. When I
>>> > start my job and monitor the simple job history I can see the job doing
>>> > things like:
>>> >
>>> > Run the seed query
>>> > Run the data query
>>> > Run the seed query
>>> > Run the data query
>>> >
>>> > Etc.
>>> >
>>> > It continues to do this indefinitely from what I have observed. As soon as
>>> > I pause and resume the job, all I can see in the simple job history is:
>>> >
>>> > Run the seed query
>>> > Run the seed query
>>> > Run the seed query
>>> >
>>> > It's like it's never going to run the data query again?
>>> >
>>> > Kind Regards,
>>> >
>>> > Niall
>>> >
>>> > -----Original Message-----
>>> > From: Karl Wright [mailto:daddywri@gmail.com]
>>> > Sent: 17 September 2015 4:53
>>> > To: dev
>>> > Subject: Re: Potential Issue with pausing jobs
>>> >
>>> > Hi Niall,
>>> >
>>> > A continuous job reseeds on a schedule, which you set as part of the job
>>> > setup.  For a continuous job, if the document has been crawled, it will be
>>> > recrawled again at a specific time in the future, and if at that time it
>>> > hasn't changed, it will be scheduled for checking again even further out,
>>> > up to a certain limit (also settable within the job).
>>> >
>>> > You can look at the document's schedule, by the way, using the "Document
>>> > Status" report, and it should be pretty clear from that what should happen
>>> > and when.
>>> >
>>> > When you abort the job and restart it, everything is reset, so the document
>>> > will be checked immediately at that point, and relatively frequently for a
>>> > while until the system figures out that the document isn't changing very
>>> > rapidly.
>>> >
>>> > Thanks,
>>> > Karl
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >
>>> > On Thu, Sep 17, 2015 at 11:38 AM, Colreavy, Niall <
>>> > Niall.Colreavy@fmr.com.invalid> wrote:
>>> >
>>> > > Hi Karl,
>>> > >
>>> > > You'll have to forgive me if my answer is a bit uncertain but I am very
>>> > > new to MCF. Just to clarify, I have a very simple job. For the JDBC
>>> > > connector, I am literally just selecting 1 for the id, 'myurl' for the url
>>> > > and 'mydata' for the data. So there is only ever 1 document being
>>> > > processed.
>>> > >
>>> > > So to answer the questions:
>>> > >
>>> > > 1. There are 0 active documents on the queue.
>>> > > 2. Single process
>>> > > 3. Yes, this is a continuous crawl.
>>> > >
>>> > > Kind Regards,
>>> > >
>>> > > Niall
>>> > >
>>> > > -----Original Message-----
>>> > > From: Karl Wright [mailto:daddywri@gmail.com]
>>> > > Sent: 17 September 2015 4:27
>>> > > To: dev
>>> > > Subject: Re: Potential Issue with pausing jobs
>>> > >
>>> > > Hi Niall,
>>> > >
>>> > > Pausing and resuming a job should have no effects *other* than
>>> > > reprioritization of the active documents on the queue, which if there are a
>>> > > lot of them, may take some time.
>>> > >
>>> > > So let's ask some basic questions.  (1) How many active documents on your
>>> > > queue? (2) What kind of synchronization are you using?  Is this single
>>> > > process, or multiprocess?  (3) Is this a continuous crawl?
>>> > >
>>> > > >>>>>>
>>> > > And on a side note, what is the difference between pausing a job and
>>> > > aborting a job?
>>> > > <<<<<<
>>> > >
>>> > > I can't fully answer that unless I know the characteristics of your job,
>>> > > especially continuous crawl vs. crawl to completion.
>>> > >
>>> > > Karl
>>> > >
>>> > >
>>> > > On Thu, Sep 17, 2015 at 11:07 AM, Colreavy, Niall <
>>> > > Niall.Colreavy@fmr.com.invalid> wrote:
>>> > >
>>> > > > Hi,
>>> > > >
>>> > > > I am experimenting with pausing a job. The job has a simple JDBC
>>> > > > connection and a null output connection. I was experimenting with pausing
>>> > > > the job and I notice that when I resume the job, and monitor its progress
>>> > > > in the simple history report, the job never seems to run the data query any
>>> > > > more. I can see that it runs the seed query but it doesn't progress to the
>>> > > > data query. If I abort the job and restart it, it does seem to start
>>> > > > running the data query again.
>>> > > >
>>> > > > Can anyone explain this behaviour? And on a side note, what is the
>>> > > > difference between pausing a job and aborting a job?
>>> > > >
>>> > > > Thanks,
>>> > > >
>>> > > > Niall
>>> > > >
>>> > >
>>> >
>>>
>>
>>
>
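
For readers following the thread: the rescheduling behaviour Karl describes for continuous jobs (recheck at a future time, push the next check further out if the document is unchanged, up to a job-level limit) can be illustrated with a small sketch. This is not ManifoldCF code. The class name, the parameter names, the doubling factor, and the rule that a changed document drops back to the shortest interval are all assumptions made purely for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class RecrawlSchedule:
    """Hypothetical model of an adaptive recrawl schedule (not ManifoldCF's API)."""

    min_interval_s: int = 60      # assumed shortest gap between checks
    max_interval_s: int = 3600    # assumed upper limit, "settable within the job"
    backoff_factor: int = 2       # assumed growth factor for unchanged documents

    def next_interval(self, current_s: int, changed: bool) -> int:
        """Return the gap before the next check of a document."""
        if changed:
            # Assumption: a changed document is checked again soon.
            return self.min_interval_s
        # Unchanged: schedule the next check further out, capped at the limit.
        return min(current_s * self.backoff_factor, self.max_interval_s)


def simulate(schedule: RecrawlSchedule, checks: Iterable[bool]) -> List[int]:
    """Replay a series of 'did the document change?' results.

    Starts from the shortest interval, which mirrors the "everything is
    reset" state after an abort and restart.
    """
    interval = schedule.min_interval_s
    intervals = []
    for changed in checks:
        interval = schedule.next_interval(interval, changed)
        intervals.append(interval)
    return intervals


if __name__ == "__main__":
    schedule = RecrawlSchedule(min_interval_s=60, max_interval_s=600)
    print(simulate(schedule, [False, False, False, False, True]))
```

In this model a document that never changes is checked less and less often until it reaches the cap, which is why the "Document Status" report is the place to look for when a given document is next due.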
