manifoldcf-user mailing list archives

From Karl Wright <>
Subject Re: Scheduled ManifoldCF jobs
Date Fri, 08 Apr 2016 14:54:54 GMT
There's one slightly funky thing about the Documentum connector that tries
to compensate for clock skew as follows:

      // There seems to be some unexplained slop in the latest DCTM version.
      // It misses documents depending on how close to the r_modify_date you
      // happen to be.
      // So, I've decreased the start time by a full five minutes, to insure
      // overlap.
      if (startTime > 300000L)
        startTime = startTime - 300000L;
      else
        startTime = 0L;
      StringBuilder strDQLend = new StringBuilder(" where r_modify_date >= " + buildDateString(startTime) +
        " and r_modify_date<=" + buildDateString(seedTime) +
        " AND (i_is_deleted=TRUE Or (i_is_deleted=FALSE AND a_full_text=TRUE AND r_content_size>0");


The 300000 ms adjustment is five minutes, which doesn't seem like a lot but
maybe it is affecting your testing?
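
To make the effect of that adjustment concrete, here is a minimal sketch. It assumes the seeding window is simply [previous start time minus five minutes, current seed time], as the quoted snippet suggests. The class and method names (SeedWindow, adjustedStart, isSeeded) are hypothetical and are not taken from the connector.

```java
// Illustrative sketch only: SeedWindow and its methods are hypothetical names,
// not part of the ManifoldCF Documentum connector.
final class SeedWindow {
    // Five minutes in milliseconds, matching the 300000L constant quoted above.
    static final long OVERLAP_MS = 5L * 60L * 1000L;

    // Move the window start back by the overlap, clamping at zero.
    static long adjustedStart(long startTime) {
        return startTime > OVERLAP_MS ? startTime - OVERLAP_MS : 0L;
    }

    // True if a document with this modify time falls inside
    // [adjustedStart(startTime), seedTime], i.e. it would be seeded again.
    static boolean isSeeded(long modifyTime, long startTime, long seedTime) {
        return modifyTime >= adjustedStart(startTime) && modifyTime <= seedTime;
    }

    public static void main(String[] args) {
        long startTime = 1_000_000L; // time of the previous seeding pass
        long seedTime = 2_000_000L;  // time of the current seeding pass

        // Modified two minutes before the previous pass: inside the overlap.
        System.out.println(isSeeded(startTime - 120_000L, startTime, seedTime)); // true

        // Modified ten minutes before the previous pass: outside the window.
        System.out.println(isSeeded(startTime - 600_000L, startTime, seedTime)); // false
    }
}
```

If this model is right, a document changed within five minutes before the previous run is picked up a second time, but nothing older than that should be.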


On Fri, Apr 8, 2016 at 10:50 AM, Karl Wright <> wrote:

> Hi Radko,
> There's no magic here; the seedingversion from the database is passed to
> the connector method which seeds documents.  The only way this version gets
> cleared is if you save the job and the document specification changes.
> The only other possibility I can think of is that the documentum connector
> is ignoring the seedingversion information.  I will look into this further
> over the weekend.
> Karl
> On Fri, Apr 8, 2016 at 10:33 AM, Najman, Radko <>
> wrote:
>> Hi Karl,
>> thanks for your clarification.
>> I’m not changing any document specification information. I just set
>> “Scheduled time” and “Job invocation” on “Scheduling” tab, “Start method”
>> on “Connection” tab and click “Save” button. That’s all.
>> I tried to set all the scheduling information directly in Postgres
>> database to be sure I didn’t change any document specification
>> information and the result was the same, all documents were recrawled.
>> One more thing I tried was to update “seedingversion” in “jobs” table
>> but again all documents were recrawled.
>> Thanks,
>> Radko
>> From: Karl Wright <>
>> Reply-To: "" <>
>> Date: Friday 1 April 2016 at 14:30
>> To: "" <>
>> Subject: Re: Scheduled ManifoldCF jobs
>> Sorry, that response was *almost* incoherent. :-)
>> Trying again:
>> As far as how MCF computes incremental changes, it does not matter
>> whether a job is run on schedule, or manually.  But if you change certain
>> aspects of the job, namely the document specification information, MCF
>> "starts over" at the beginning of time.  It needs to do that because you
>> might well have made changes to the document specification that could
>> change the way documents are indexed.
>> Thanks,
>> Karl
>> On Fri, Apr 1, 2016 at 6:36 AM, Karl Wright <> wrote:
>>> Hi Radko,
>>> For computing how MCF does job crawling, it does not care whether the
>>> job is run manually or by schedule.
>>> The issue is likely to be that you changed some other detail about the
>>> job definition that might have affected how documents are indexed.  In that
>>> case, MCF would cause all documents to be recrawled because of that.
>>> Changes to a job's document specification information will cause that to be
>>> the case.
>>> Thanks,
>>> Karl
>>> On Fri, Apr 1, 2016 at 3:40 AM, Najman, Radko wrote:
>>>> Hello,
>>>> I have a few jobs crawling documents from Documentum. Some of these
>>>> jobs are quite big and the first run of the job takes a few hours or a day
>>>> to finish. Then, when I do a “minimal run” for updates, the job is usually
>>>> done in a few minutes.
>>>> I want to schedule these jobs for daily runs. I’m finding that the
>>>> first scheduled run takes as long as when I first ran the job
>>>> manually. It seems to be recrawling all documents. Subsequent scheduled
>>>> runs are fast, taking a few minutes. Is this expected behaviour? I would
>>>> expect the first scheduled run to be fast too, because the job had
>>>> already finished after the manual start. Is there a way to avoid
>>>> recrawling all documents in this case? It’s a really time-consuming operation.
>>>> My settings:
>>>> Schedule type: Scan every document once
>>>> Job invocation: Minimal
>>>> Scheduled time: once a day
>>>> Start method: Start when schedule window starts
>>>> Thank you,
>>>> Radko
