MIME-Version: 1.0
References: <0fb21a6657204aa3a9b50f5c2cfa7889@MSGOMA2DAG01X.DMN1.FMR.com> <86f9f792bc884b53835edc404b7cc408@MSGOMA2DAG01X.DMN1.FMR.com> <6d1fe7392f824668a587e348558ab95f@MSGOMA2DAG01X.DMN1.FMR.com>
Date: Thu, 17 Sep 2015 16:34:05 -0400
Subject: Re: Potential Issue with pausing jobs
From: Karl Wright
To: dev
Content-Type: multipart/alternative; boundary=001a113ff2261911d3051ff75695

--001a113ff2261911d3051ff75695
Content-Type: text/plain; charset=UTF-8

I've attached a patch to the CONNECTORS-1242 ticket.

Karl

On Thu, Sep 17, 2015 at 12:52 PM, Karl Wright wrote:

> I was able to reproduce this; CONNECTORS-1242.
>
> Karl
>
>
> On Thu, Sep 17, 2015 at 12:45 PM, Karl Wright wrote:
>
>> I'm interested in the time it is supposed to be processed, actually.
>>
>> I'm trying to recreate your example here to see if I can get more
>> information.
>>
>> Karl
>>
>>
>>
>> On Thu, Sep 17, 2015 at 12:36 PM, Colreavy, Niall <
>> Niall.Colreavy@fmr.com.invalid> wrote:
>>
>>> The document is in a state of 'Processed' and the status is 'Ready for
>>> processing'
>>>
>>> -----Original Message-----
>>> From: Karl Wright [mailto:daddywri@gmail.com]
>>> Sent: 17 September 2015 5:28
>>> To: dev
>>> Subject: Re: Potential Issue with pausing jobs
>>>
>>> When it is in the state after the job has resumed, can you do a Document
>>> Status report and tell me what that says for your document?
>>>
>>> Thanks,
>>> Karl
>>>
>>>
>>> On Thu, Sep 17, 2015 at 12:16 PM, Colreavy, Niall <
>>> Niall.Colreavy@fmr.com.invalid> wrote:
>>>
>>> > Hi Karl,
>>> >
>>> > Thanks for that. I think the problem might be more fundamental. When I
>>> > start my job and monitor the simple job history I can see the job doing
>>> > things like:
>>> >
>>> > Run the seed query
>>> > Run the data query
>>> > Run the seed query
>>> > Run the data query
>>> >
>>> > Etc.
>>> >
>>> > It continues to do this indefinitely from what I have observed. As soon as
>>> > I pause and resume the job, all I can see in the simple job history is:
>>> >
>>> > Run the seed query
>>> > Run the seed query
>>> > Run the seed query
>>> >
>>> > It's like it's never going to run the data query again?
>>> >
>>> > Kind Regards,
>>> >
>>> > Niall
>>> >
>>> > -----Original Message-----
>>> > From: Karl Wright [mailto:daddywri@gmail.com]
>>> > Sent: 17 September 2015 4:53
>>> > To: dev
>>> > Subject: Re: Potential Issue with pausing jobs
>>> >
>>> > Hi Niall,
>>> >
>>> > A continuous job reseeds on a schedule, which you set as part of the job
>>> > setup. For a continuous job, if the document has been crawled, it will be
>>> > recrawled again at a specific time in the future, and if at that time it
>>> > hasn't changed, it will be scheduled for checking again even further out,
>>> > up to a certain limit (also settable within the job).
>>> >
>>> > You can look at the document's schedule, by the way, using the "Document
>>> > Status" report, and it should be pretty clear from that what should happen
>>> > and when.
>>> >
>>> > When you abort the job and restart it, everything is reset, so the document
>>> > will be checked immediately at that point, and relatively frequently for a
>>> > while until the system figures out that the document isn't changing very
>>> > rapidly.
>>> >
>>> > Thanks,
>>> > Karl
>>> >
>>> >
>>> > On Thu, Sep 17, 2015 at 11:38 AM, Colreavy, Niall <
>>> > Niall.Colreavy@fmr.com.invalid> wrote:
>>> >
>>> > > Hi Karl,
>>> > >
>>> > > You'll have to forgive me if my answer is a bit uncertain but I am very
>>> > > new to MCF. Just to clarify, I have a very simple job. For the JDBC
>>> > > connector, I am literally just selecting 1 for the id, 'myurl' for the
>>> > > url and 'mydata' for the data. So there is only ever 1 document being
>>> > > processed.
>>> > >
>>> > > So to answer the questions:
>>> > >
>>> > > 1. There are 0 active documents on the queue.
>>> > > 2. Single process
>>> > > 3. Yes, this is a continuous crawl.
>>> > >
>>> > > Kind Regards,
>>> > >
>>> > > Niall
>>> > >
>>> > > -----Original Message-----
>>> > > From: Karl Wright [mailto:daddywri@gmail.com]
>>> > > Sent: 17 September 2015 4:27
>>> > > To: dev
>>> > > Subject: Re: Potential Issue with pausing jobs
>>> > >
>>> > > Hi Niall,
>>> > >
>>> > > Pausing and resuming a job should have no effects *other* than
>>> > > reprioritization of the active documents on the queue, which if there
>>> > > are a lot of them, may take some time.
>>> > >
>>> > > So let's ask some basic questions. (1) How many active documents on your
>>> > > queue? (2) What kind of synchronization are you using? Is this single
>>> > > process, or multiprocess? (3) Is this a continuous crawl?
>>> > >
>>> > > >>>>>>
>>> > > And on a side note, what is the difference between pausing a job and
>>> > > aborting a job?
>>> > > <<<<<<
>>> > >
>>> > > I can't fully answer that unless I know the characteristics of your job,
>>> > > especially continuous crawl vs. crawl to completion.
>>> > >
>>> > > Karl
>>> > >
>>> > >
>>> > > On Thu, Sep 17, 2015 at 11:07 AM, Colreavy, Niall <
>>> > > Niall.Colreavy@fmr.com.invalid> wrote:
>>> > >
>>> > > > Hi,
>>> > > >
>>> > > > I am experimenting with pausing a job. The job has a simple JDBC
>>> > > > connection and a null output connection. I was experimenting with
>>> > > > pausing the job and I notice that when I resume the job, and monitor
>>> > > > its progress in the simple history report, the job never seems to run
>>> > > > the data query any more. I can see that it runs the seed query but it
>>> > > > doesn't progress to the data query. If I abort the job and restart it,
>>> > > > it does seem to start running the data query again.
>>> > > >
>>> > > > Can anyone explain this behaviour? And on a side note, what is the
>>> > > > difference between pausing a job and aborting a job?
>>> > > >
>>> > > > Thanks,
>>> > > >
>>> > > > Niall
>>> > > >
>>> > >
>>> >
>>>
>>
>
--001a113ff2261911d3051ff75695--