manifoldcf-user mailing list archives

From Anupam Bhattacharya <anupam...@gmail.com>
Subject Re: The Schedulars are not starting automatically
Date Tue, 06 Nov 2012 18:41:42 GMT
Hi Karl,

Unfortunately, I don't currently have a copy of your book, so I am asking
all of my architectural and configuration questions here.
Can you please confirm that by the first option you mean "rescan
dynamically" and by the second option "scan every document once"?

Regarding my second question: from List Output Connections, if I click
View for an existing SOLR connection and click "Re-ingest all associated
documents", what changes occur within ManifoldCF? Does this action delete
any records from the existing tables?

Regards
Anupam

On Tue, Nov 6, 2012 at 11:46 PM, Karl Wright <daddywri@gmail.com> wrote:

> Hi Anupam,
>
> I'm having difficulty understanding what you posted here, but I will
> try to explain the difference between "rescan dynamically" and "scan
> every document once".  You may also find more help in ManifoldCF in
> Action, at http://www.manning.com/wright .
>
> The first option causes your job to run forever.  The job runs only in
> the schedule windows allotted for it.  It periodically "discovers" new
> documents, and (depending on the crawling model of the connector) may
> check for existence or modification of an already-crawled document.
> Each document has its own schedule for doing this.
>
> The second option is more likely to be what you want.  Each job
> starts, runs, and completes, being sure to run only in the scheduling
> windows you provide.  You then run it again, and again (or your job
> schedule makes that happen).  It will do the minimal work to keep your
> index up to date.
>
> There are significant differences between how you would set up a job
> using one model vs. the other.  I strongly suggest you read at least
> the first few chapters of the book.
>
> Karl
>
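To make the "scan every document once" model more concrete, here is a toy Python sketch of how an incremental run can decide what to send to the index. It is not ManifoldCF code: the `IncrementalIndexModel` class, its per-document version map, and the `reingest_all()` method are hypothetical. As a simplification, it assumes that each run records a version string per document, re-sends only new or changed documents, and removes documents it no longer encounters at the end of the run. It also assumes that "Re-ingest all associated documents" amounts to forgetting the recorded versions for that output connection.

```python
from dataclasses import dataclass, field


@dataclass
class IncrementalIndexModel:
    """Toy model of a "scan every document once" job (hypothetical, not ManifoldCF code)."""

    # doc_id -> version string recorded when the document was last sent to the index
    ingested: dict[str, str] = field(default_factory=dict)

    def run_job(self, repository: dict[str, str]) -> dict[str, list[str]]:
        """Crawl `repository` (doc_id -> current version) once and report the work done."""
        to_index = []
        for doc_id, version in sorted(repository.items()):
            # Only new documents, or documents whose version changed, are (re)sent.
            if self.ingested.get(doc_id) != version:
                to_index.append(doc_id)
                self.ingested[doc_id] = version
        # Assumption: documents recorded earlier but not seen in this run are removed.
        to_delete = sorted(set(self.ingested) - set(repository))
        for doc_id in to_delete:
            del self.ingested[doc_id]
        return {"index": to_index, "delete": to_delete}

    def reingest_all(self) -> None:
        """Forget every recorded version so that the next run re-sends all documents."""
        self.ingested.clear()


model = IncrementalIndexModel()
print(model.run_job({"a": "v1", "b": "v1"}))             # {'index': ['a', 'b'], 'delete': []}
print(model.run_job({"a": "v1", "b": "v2", "c": "v1"}))  # {'index': ['b', 'c'], 'delete': []}
print(model.run_job({"c": "v1"}))                        # {'index': [], 'delete': ['a', 'b']}
model.reingest_all()
print(model.run_job({"c": "v1"}))                        # {'index': ['c'], 'delete': []}
```

Under these assumptions, a run whose document criteria match only the newest documents would also remove the older ones from the index. That is one possible reading of the symptom described in the quoted message below.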
> On Tue, Nov 6, 2012 at 12:35 PM, Anupam Bhattacharya
> <anupamb82@gmail.com> wrote:
> > My incremental indexing was working previously, but I have changed a few
> > settings, and now the documents indexed on the previous day get deleted
> > and only the new ones show up. I suspect this is due to the setting under
> > List all Jobs > Edit selected job > Scheduling > Schedule type: "Rescan
> > documents dynamically" or "Scan every document once". Please let me know
> > the appropriate settings to index only the new documents in the
> > repository.
> >
> > After deleting the SOLR index data folder and clearing the records in the
> > jobqueue, repohistory, and ingeststatus tables, I found that ManifoldCF
> > scans only the new documents, until I go to List Output Connections,
> > click View for the SOLR connection, and click (and confirm with OK)
> > "Re-ingest all associated documents". How does it keep track of which
> > documents were ingested previously, so that it fetches only the new ones?
> >
> > Regards
> > Anupam
> >
> >
> > On Tue, Aug 14, 2012 at 10:01 AM, Anupam Bhattacharya <anupamb82@gmail.com>
> > wrote:
> >>
> >> Thanks.
> >>
> >> There is an option to set the Start Method on the Connection tab of the
> >> job settings. I changed it to "Start when the Schedule window starts", and
> >> the problem was resolved.
> >>
> >> Regards
> >> Anupam
> >>
> >>
> >> On Thu, Aug 2, 2012 at 10:59 PM, Karl Wright <daddywri@gmail.com> wrote:
> >>>
> >>> Incremental indexing will work the same whether the job is run manually
> >>> or started automatically.
> >>>
> >>> If you have added the appropriate schedule record to your job, you
> >>> also have to select the "run job automatically" radio button on one of
> >>> the other job tabs for automatic runs to take place.  I suspect that
> >>> is what you are missing.
> >>>
> >>> Karl
> >>>
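Karl's point that a schedule record alone does not start a job can be illustrated with another toy Python sketch. The `StartMethod` values only loosely mirror the job's start-method choices; the function names, the once-a-minute polling, and the fixed window duration are assumptions for illustration, not ManifoldCF's actual scheduler. Windows longer than one day are not handled.

```python
from datetime import datetime, time, timedelta
from enum import Enum


class StartMethod(Enum):
    # Hypothetical names loosely mirroring the job's "Start method" choices
    WHEN_WINDOW_STARTS = "start when schedule window starts"
    INSIDE_WINDOW = "start even inside a schedule window"
    MANUAL = "don't start automatically"


def in_daily_window(now: datetime, start: time, duration: timedelta) -> bool:
    """True if `now` lies in a daily window beginning at `start` (it may cross midnight)."""
    window_start = datetime.combine(now.date(), start)
    if now < window_start:
        # The window that might still be open is the one that began yesterday.
        window_start -= timedelta(days=1)
    return now < window_start + duration


def should_start(now: datetime, method: StartMethod, start: time,
                 duration: timedelta, poll: timedelta = timedelta(minutes=1)) -> bool:
    """Decide whether an idle job would be started automatically at this poll."""
    if method is StartMethod.MANUAL:
        return False  # schedule records alone never start a manually started job
    inside_now = in_daily_window(now, start, duration)
    if method is StartMethod.INSIDE_WINDOW:
        return inside_now
    # WHEN_WINDOW_STARTS: only on the poll at which the window has just opened.
    return inside_now and not in_daily_window(now - poll, start, duration)


window = (time(0, 0), timedelta(hours=6))  # daily window opening at 12 am, assumed 6 h long
print(should_start(datetime(2012, 8, 3, 0, 0), StartMethod.WHEN_WINDOW_STARTS, *window))   # True
print(should_start(datetime(2012, 8, 3, 0, 30), StartMethod.WHEN_WINDOW_STARTS, *window))  # False
print(should_start(datetime(2012, 8, 3, 0, 30), StartMethod.INSIDE_WINDOW, *window))       # True
print(should_start(datetime(2012, 8, 3, 0, 0), StartMethod.MANUAL, *window))               # False
```

In this sketch, the "window starts" choice fires only at the poll where the window opens, the inside-window choice fires at any poll within the window, and the manual choice never fires, regardless of which schedule records exist.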
> >>> On Thu, Aug 2, 2012 at 1:12 PM, Anupam Bhattacharya <anupamb82@gmail.com>
> >>> wrote:
> >>> > I have a job that indexes properly, including incremental indexing,
> >>> > when it is initiated/run manually. However, even after adding a
> >>> > specific time to the schedule, the job does not start on its own.
> >>> >
> >>> > What is the ideal way to configure a job that runs automatically
> >>> > every day at 12 am and does incremental re-indexing of the repository
> >>> > (looking only for documents that are new or modified since the last
> >>> > crawl)?
> >>> >
> >>> > Is it necessary to give the total run time details when adding a
> >>> > specific schedule time?
> >>> >
> >>> > Regards
> >>> > Anupam
> >>
> >>
> >
> >
> >
> > --
> > Thanks & Regards
> > Anupam Bhattacharya
> >
> >
>



-- 
Thanks & Regards
Anupam Bhattacharya
