manifoldcf-user mailing list archives

From Karl Wright <daddy...@gmail.com>
Subject Re: The Schedulars are not starting automatically
Date Tue, 06 Nov 2012 18:16:37 GMT
Hi Anupam,

I'm having difficulty understanding what you posted here, but I will
try to explain the difference between "rescan dynamically" and "scan
every document once".  You may also find more help in ManifoldCF in
Action, at http://www.manning.com/wright .

The first option ("rescan dynamically") causes your job to run forever,
although it is active only in the schedule windows allotted for it.  It
periodically "discovers" new documents, and (depending on the crawling
model of the connector) may check for existence or modification of an
already-crawled document.  Each document has its own schedule for doing
this.

The second option ("scan every document once") is more likely to be what
you want.  Each job starts, runs, and completes, being sure to run only
in the scheduling windows you provide.  You then run it again, and again
(or your job schedule makes that happen).  Each run does only the minimal
work needed to keep your index up to date.
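
As a rough illustration of what "minimal work" can mean for a job that is
run repeatedly, here is a toy model in Python.  It assumes the crawler
records one version string per document and compares it on the next run.
The function run_once and the dictionaries are hypothetical names for this
sketch, not ManifoldCF code or its API.

```python
from typing import Dict, List, Tuple

# Toy model only: these names are illustrative and do not come from ManifoldCF.
def run_once(repository: Dict[str, str],
             index_state: Dict[str, str]) -> List[Tuple[str, str]]:
    """Simulate one start-to-finish job run.

    repository maps document id -> current version string.
    index_state is what earlier runs recorded; it is updated in place.
    Returns the (action, document id) pairs this run had to perform.
    """
    actions: List[Tuple[str, str]] = []

    # Add documents never seen before; update those whose version changed.
    for doc_id, version in sorted(repository.items()):
        if doc_id not in index_state:
            actions.append(("add", doc_id))
        elif index_state[doc_id] != version:
            actions.append(("update", doc_id))
        # Unchanged documents cost a comparison but no re-indexing.
        index_state[doc_id] = version

    # Documents recorded earlier but gone from the repository are removed.
    for doc_id in sorted(set(index_state) - set(repository)):
        actions.append(("delete", doc_id))
        del index_state[doc_id]

    return actions


state: Dict[str, str] = {}
first = run_once({"a": "v1", "b": "v1"}, state)   # everything is new
second = run_once({"a": "v2", "c": "v1"}, state)  # a changed, c is new, b is gone
third = run_once({"a": "v2", "c": "v1"}, state)   # nothing changed
```

In this toy model, emptying the recorded state makes the next run treat
every document as new again, because there is nothing left to compare
against.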

There are significant differences between how you would set up a job
using one model vs. the other.  I strongly suggest you read at least
the first few chapters of the book.
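
Both models confine a job to its schedule windows.  A minimal sketch of
such a check, assuming a daily window defined by a start hour and a
duration in minutes of at most 24 hours (in_window and its parameters are
hypothetical names for illustration, not ManifoldCF's API):

```python
from datetime import datetime, timedelta

def in_window(now: datetime, start_hour: int, duration_minutes: int) -> bool:
    """Return True if `now` falls inside a daily window.

    The window opens at start_hour:00 every day and stays open for
    duration_minutes (assumed to be at most 24 hours).
    """
    # Check today's window and yesterday's, since a window that opened
    # late yesterday may still be open after midnight.
    for day_offset in (0, -1):
        opened = (now + timedelta(days=day_offset)).replace(
            hour=start_hour, minute=0, second=0, microsecond=0)
        if opened <= now < opened + timedelta(minutes=duration_minutes):
            return True
    return False


# A one-hour window that opens at 12 am.
midnight_run = in_window(datetime(2012, 11, 6, 0, 30), 0, 60)
after_close = in_window(datetime(2012, 11, 6, 1, 0), 0, 60)
```

A job that may only start when the window opens would act on the moment
this check turns true, while a job already running would pause once it
turns false.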

Karl

On Tue, Nov 6, 2012 at 12:35 PM, Anupam Bhattacharya
<anupamb82@gmail.com> wrote:
> My incremental indexing was working previously, but I have messed up a few
> settings, and now the documents indexed on the previous day get deleted and
> only the new ones show up. I suspect this is due to the setting under
> List all Jobs > Edit selected job > Scheduling > Schedule type: "Rescan
> documents dynamically" or "Scan every document once". Please let me know
> the appropriate settings to index only the new documents in the repository.
>
> After deleting the Solr index data folder and clearing the records in the
> jobqueue, repohistory, and ingeststatus tables, I found that ManifoldCF
> still scans only the remaining new documents, until I go to List Output
> Connections, click View for the Solr connection, and click "Re-ingest all
> associated documents" and then OK. How does it keep track of which documents
> were ingested previously, so that it fetches only the new ones?
>
> Regards
> Anupam
>
>
> On Tue, Aug 14, 2012 at 10:01 AM, Anupam Bhattacharya <anupamb82@gmail.com>
> wrote:
>>
>> Thanks..
>>
>> There is an option to set the Start Method on the Connection tab in the Job
>> settings. I changed it to "Start when the Schedule window starts" and
>> the problem was resolved.
>>
>> Regards
>> Anupam
>>
>>
>> On Thu, Aug 2, 2012 at 10:59 PM, Karl Wright <daddywri@gmail.com> wrote:
>>>
>>> The incremental will work the same whether the job is run manually or
>>> started automatically.
>>>
>>> If you have added the appropriate schedule record to your job, you
>>> also have to select the "run job automatically" radio button on one of
>>> the other job tabs for automatic runs to take place.  I suspect that
>>> is what you are missing.
>>>
>>> Karl
>>>
>>> On Thu, Aug 2, 2012 at 1:12 PM, Anupam Bhattacharya <anupamb82@gmail.com>
>>> wrote:
>>> > I have a job that indexes properly, including incremental indexing, when
>>> > it is run manually. However, even after I added a specific time for the
>>> > scheduler to run it, the job does not start on its own.
>>> >
>>> > What is the ideal configuration for a job that runs automatically every
>>> > day at 12 am and does an incremental re-index (looking only for documents
>>> > that are new or modified since the last crawl) of the repository?
>>> >
>>> > Is it necessary to give the total run time when adding a specific
>>> > schedule time?
>>> >
>>> > Regards
>>> > Anupam
>>
>>
>
>
>
> --
> Thanks & Regards
> Anupam Bhattacharya
>
>
