manifoldcf-user mailing list archives

From Karl Wright <daddy...@gmail.com>
Subject Re: The Schedulars are not starting automatically
Date Tue, 06 Nov 2012 19:22:03 GMT
On Tue, Nov 6, 2012 at 1:41 PM, Anupam Bhattacharya <anupamb82@gmail.com> wrote:
> Hi Karl,
>
> Unfortunately I don't currently have a copy of your book, which is why I
> am asking all these architectural and configuration questions.
> Can you please confirm that by the first option you mean "rescan
> dynamically" and by the second option "scan every document once"?
>

Yes.

> Regarding my second question: from List Output Connections, if I click
> View for an existing SOLR connection and click "Re-ingest all associated
> documents", what changes occur within ManifoldCF? Does this action delete
> any records from the existing tables?
>

Yes, of course it does; otherwise it wouldn't do anything.  It removes
entries from the ingeststatus table.
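
To picture why removing ingeststatus entries makes everything get sent
again, here is a minimal Python sketch. It is a simplified in-memory model,
not ManifoldCF's actual schema or code: each document is assumed to carry a
version string reported by the repository connector, and the hypothetical
`ingest_status` dict stands in for the ingeststatus table, mapping a
document ID to the version last sent to the output connection.

```python
# Hypothetical, simplified model of incremental ingestion. The names
# (crawl, reingest_all, ingest_status) are illustrative only and are not
# ManifoldCF's real tables or API.

def crawl(repository, ingest_status, output):
    """Send only documents that are new or whose version has changed.

    repository: dict of doc_id -> current version string (assumed)
    ingest_status: dict of doc_id -> version last sent (updated in place)
    output: list collecting doc_ids sent to the index (stands in for Solr)
    Returns the doc_ids sent during this crawl.
    """
    sent = []
    for doc_id, version in repository.items():
        if ingest_status.get(doc_id) == version:
            continue  # already indexed at this version: skip it
        output.append(doc_id)
        ingest_status[doc_id] = version
        sent.append(doc_id)
    return sent


def reingest_all(ingest_status):
    """Forget what was ingested, so the next crawl resends every document."""
    ingest_status.clear()


if __name__ == "__main__":
    repo = {"a": "v1", "b": "v1"}
    status, index = {}, []
    print(crawl(repo, status, index))  # ['a', 'b']  first crawl sends all
    repo["a"] = "v2"                   # "a" is modified
    repo["c"] = "v1"                   # "c" is new
    print(crawl(repo, status, index))  # ['a', 'c']  only changed/new docs
    reingest_all(status)
    print(crawl(repo, status, index))  # ['a', 'b', 'c']
```

Clearing the dict plays the role of "Re-ingest all associated documents":
nothing is recorded as indexed any more, so the next crawl sends every
document to the output again.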

Listen, you sound like you are working too hard to understand the
internals of the project.  It might really help you to understand how
to *use* the project instead.  That is why I pointed you at the book -
it describes how ManifoldCF works.  The first chapter is free.  I
suggest you try just reading it.

Karl

> Regards
> Anupam
>
>
> On Tue, Nov 6, 2012 at 11:46 PM, Karl Wright <daddywri@gmail.com> wrote:
>>
>> Hi Anupam,
>>
>> I'm having difficulty understanding what you posted here, but I will
>> try to explain the difference between "rescan dynamically" and "scan
>> every document once".  You may also find more help in ManifoldCF in
>> Action, at http://www.manning.com/wright .
>>
>> The first option causes your job to run forever.  The job runs only in
>> the schedule windows allotted for it.  It periodically "discovers" new
>> documents, and (depending on the crawling model of the connector) may
>> check for existence or modification of an already-crawled document.
>> Each document has its own schedule for doing this.
>>
>> The second option is more likely to be what you want.  Each job
>> starts, runs, and completes, being sure to run only in the scheduling
>> windows you provide.  You then run it again, and again (or your job
>> schedule makes that happen).  It will do the minimal work to keep your
>> index up to date.
>>
>> There are significant differences between how you would set up a job
>> using one model vs. the other.  I strongly suggest you read at least
>> the first few chapters of the book.
>>
>> Karl
>>
>> On Tue, Nov 6, 2012 at 12:35 PM, Anupam Bhattacharya
>> <anupamb82@gmail.com> wrote:
>> > My incremental indexing was working previously, but I have messed up a
>> > few settings, and now the documents indexed on the previous day get
>> > deleted and only the new ones show up. I suspect it is due to the
>> > setting in List all Jobs > Edit selected job > Scheduling > Schedule
>> > type: "Rescan documents dynamically" or "Scan every document once".
>> > Please let me know the appropriate settings to index only the new
>> > documents in the repository.
>> >
>> > After deleting the SOLR index data folder and clearing the records in
>> > the jobqueue, repohistory, and ingeststatus tables, I found that
>> > ManifoldCF scans only the remaining new documents, until I go to List
>> > Output Connections, click View for a SOLR connection, and click
>> > "Re-ingest all associated documents" and confirm with OK. How does it
>> > keep track of which documents were ingested previously, so that it
>> > fetches only the new ones?
>> >
>> > Regards
>> > Anupam
>> >
>> >
>> > On Tue, Aug 14, 2012 at 10:01 AM, Anupam Bhattacharya
>> > <anupamb82@gmail.com>
>> > wrote:
>> >>
>> >> Thanks.
>> >>
>> >> There is an option to set the Start method in the Connection tab of the
>> >> job settings. I changed it to "Start when the Schedule window starts"
>> >> and the problem was resolved.
>> >>
>> >> Regards
>> >> Anupam
>> >>
>> >>
>> >> On Thu, Aug 2, 2012 at 10:59 PM, Karl Wright <daddywri@gmail.com>
>> >> wrote:
>> >>>
>> >>> Incremental crawling will work the same whether the job is run
>> >>> manually or started automatically.
>> >>>
>> >>> If you have added the appropriate schedule record to your job, you
>> >>> also have to select the "run job automatically" radio button on one of
>> >>> the other job tabs for automatic runs to take place.  I suspect that
>> >>> is what you are missing.
>> >>>
>> >>> Karl
>> >>>
>> >>> On Thu, Aug 2, 2012 at 1:12 PM, Anupam Bhattacharya
>> >>> <anupamb82@gmail.com>
>> >>> wrote:
>> >>> > I have a job that indexes properly, including incremental indexing,
>> >>> > when it is run manually. However, even after adding a specific time
>> >>> > to the schedule, the job does not start on its own.
>> >>> >
>> >>> > What is the ideal way to configure a job that runs automatically
>> >>> > every day at 12 am and does an incremental re-index of the
>> >>> > repository (looking only for documents that are new or modified
>> >>> > since the last crawl)?
>> >>> >
>> >>> > Is it necessary to give the total run time when adding a specific
>> >>> > schedule time?
>> >>> >
>> >>> > Regards
>> >>> > Anupam
>> >>
>> >>
>> >
>> >
>> >
>> > --
>> > Thanks & Regards
>> > Anupam Bhattacharya
>> >
>> >
>
>
>
>
> --
> Thanks & Regards
> Anupam Bhattacharya
>
>
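
The August part of this thread turned on a job that had a schedule record
but never started until its Start method was set to start when the schedule
window starts (Karl's "run job automatically" option). The Python sketch
below models that decision for a daily window beginning at 12 am. It is a
hypothetical simplification, not ManifoldCF code: the scheduler is assumed
to check periodically, the job is assumed idle, and only the "Start when
schedule window starts" label comes from the thread; the other two labels
are assumed wording.

```python
from datetime import datetime, timedelta
from enum import Enum


class StartMethod(Enum):
    # Only WINDOW_BEGINS is named in this thread; the other labels are assumed.
    WINDOW_BEGINS = "Start when schedule window starts"
    INSIDE_WINDOW = "Start even inside a schedule window"
    MANUAL = "Don't automatically start"


def last_window_start(now, hour=0, minute=0):
    """Return the most recent daily window start at or before `now`."""
    start = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    return start if start <= now else start - timedelta(days=1)


def should_start(method, last_poll, now, window_length, hour=0, minute=0):
    """Decide at scheduler check `now` whether an idle job should start.

    last_poll: time of the previous scheduler check (assumed polling model)
    window_length: how long each daily window stays open (a timedelta)
    """
    if method is StartMethod.MANUAL:
        return False  # never started by the scheduler; must be run by hand
    start = last_window_start(now, hour, minute)
    if method is StartMethod.WINDOW_BEGINS:
        # Fire only if a window opened since the previous check.
        return last_poll < start
    # INSIDE_WINDOW: fire whenever the check lands inside an open window.
    return now < start + window_length


if __name__ == "__main__":
    six_hours = timedelta(hours=6)
    before, after = datetime(2012, 8, 2, 23, 59), datetime(2012, 8, 3, 0, 1)
    print(should_start(StartMethod.MANUAL, before, after, six_hours))         # False
    print(should_start(StartMethod.WINDOW_BEGINS, before, after, six_hours))  # True
```

With a "Scan every document once" job, each window start triggers one
complete run, so a daily 12 am window gives one incremental pass per day.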
