manifoldcf-user mailing list archives

From Karl Wright <daddy...@gmail.com>
Subject Re: Scheduled ManifoldCF jobs
Date Fri, 01 Apr 2016 12:30:38 GMT
Sorry, that response was *almost* incoherent. :-)

Trying again:

When MCF computes incremental changes, it does not matter whether a job is
run on a schedule or manually.  But if you change certain aspects of the
job, namely the document specification information, MCF "starts over" at
the beginning of time.  It needs to do that because the changes you made to
the document specification could change the way documents are indexed.
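
The behaviour described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not MCF's actual code or data model: the `JobState` class, the `spec_fingerprint` helper, and the idea of hashing the specification are all hypothetical names and choices made for illustration. The point it shows is that the start method (manual or scheduled) is not an input to the decision, while a changed document specification discards the incremental history.

```python
import hashlib
import json


def spec_fingerprint(document_spec: dict) -> str:
    """Stable hash of a (hypothetical) document specification."""
    canonical = json.dumps(document_spec, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class JobState:
    """Toy model of per-job incremental state; not MCF's real data model."""

    def __init__(self, document_spec: dict):
        self.fingerprint = spec_fingerprint(document_spec)
        self.seen_versions = {}  # document id -> last indexed version string

    def plan_run(self, document_spec: dict, current_versions: dict) -> list:
        """Return the ids to (re)index. How the run was started is not an input."""
        new_fingerprint = spec_fingerprint(document_spec)
        if new_fingerprint != self.fingerprint:
            # Specification changed: forget history and "start over".
            self.seen_versions = {}
            self.fingerprint = new_fingerprint
        to_index = [
            doc_id
            for doc_id, version in current_versions.items()
            if self.seen_versions.get(doc_id) != version
        ]
        self.seen_versions.update(current_versions)
        return sorted(to_index)
```

In this sketch, a second run with an unchanged specification and unchanged documents returns an empty list, whereas any edit to the specification makes the next run return every document, regardless of how that run was triggered.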

Thanks,
Karl


On Fri, Apr 1, 2016 at 6:36 AM, Karl Wright <daddywri@gmail.com> wrote:

> Hi Radko,
>
> When MCF works out what to crawl for a job, it does not care whether the
> job is run manually or by schedule.
>
> The issue is likely to be that you changed some other detail of the job
> definition that might have affected how documents are indexed.  In that
> case, MCF would recrawl all documents.  Changes to a job's document
> specification information will cause this.
>
> Thanks,
> Karl
>
>
> On Fri, Apr 1, 2016 at 3:40 AM, Najman, Radko <radko.najman@merck.com>
> wrote:
>
>> Hello,
>>
>> I have a few jobs crawling documents from Documentum. Some of these jobs
>> are quite big and the first run of the job takes a few hours or a day to
>> finish. Then, when I do a “minimal run” for updates, the job is usually
>> done in a few minutes.
>>
>> I want to schedule these jobs for daily runs. I’m finding that the
>> first scheduled run takes as long as the first manual run of the job
>> did. It seems to be recrawling all documents. Subsequent scheduled runs
>> are fast, a few minutes. Is this expected behaviour? I would expect the
>> first scheduled run to be fast too, because the job had already been
>> completed by a manual start. Is there a way to avoid recrawling all
>> documents in this case? It’s a really time-consuming operation.
>>
>> My settings:
>> Schedule type: Scan every document once
>> Job invocation: Minimal
>> Scheduled time: once a day
>> Start method: Start when schedule window starts
>>
>> Thank you,
>> Radko
>>
>
>
