Hi Timo,

Could you get a thread dump of the agents process? If a thread is stuck, we need to figure out where it is stuck.
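
For a running JVM, a dump is commonly taken with the JDK's `jstack` tool against the process ID, or on Unix-like systems by sending the process SIGQUIT (`kill -QUIT <pid>`), which makes the JVM print its thread stacks to standard output. As a rough illustration of what such a dump contains, here is a minimal sketch using the standard `java.lang.management` API. The class name `ThreadDumpSketch` is hypothetical, and the code dumps its own JVM rather than the agents process, so it only shows the shape of the information to look for (thread name, state, and stack frames).

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper: prints a thread dump of the JVM it runs in.
public class ThreadDumpSketch {

    /** Returns a text dump of every live thread in the current JVM. */
    static String dump() {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        StringBuilder sb = new StringBuilder();
        // false, false: skip locked-monitor and locked-synchronizer details
        for (ThreadInfo info : bean.dumpAllThreads(false, false)) {
            sb.append('"').append(info.getThreadName()).append("\" ")
              .append(info.getThreadState()).append('\n');
            // Each frame shows where the thread currently is in the code
            for (StackTraceElement frame : info.getStackTrace()) {
                sb.append("    at ").append(frame).append('\n');
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(dump());
    }
}
```

In a real dump of the agents process, the interesting entries would be threads that show the same stack across several dumps taken a few seconds apart.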



From: Timo Selvaraj
Sent: 5/8/2015 8:42 AM
To: user@manifoldcf.apache.org
Subject: Re: File system continuous crawl settings

Thank you Karl.

I am using the SearchBlox output connector. 

For a continuous crawl of the file system, what is the ideal setting to index new documents, remove deleted documents, and update modified documents within a reasonable timeframe?

On May 8, 2015, at 6:18 AM, Karl Wright <daddywri@gmail.com> wrote:

I just tried your configuration here.  A deleted document in the file system was indeed picked up as expected.

I did notice that your "expiration" setting is, essentially, cleaning out documents at a rapid clip.  With this setting, documents will be expired before they are recrawled.  You probably want one strategy or the other but not both.
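
The interaction between the two intervals can be sketched as a simple comparison. This is a deliberately simplified model and an assumption, not ManifoldCF's actual scheduling logic: it treats a document fetched at minute 0 as due for recrawl somewhere between the minimum and maximum recrawl interval, and as expired once the expiration interval has passed. A tie is counted as expiring first, which is also an assumption. The class, enum, and method names are hypothetical.

```java
// Hypothetical, simplified model of recrawl vs. expiration timing.
// It is NOT ManifoldCF's scheduler; it only compares the configured intervals.
public class RecrawlVsExpiration {

    enum Outcome { EXPIRES_FIRST, DEPENDS_ON_TIMING, RECRAWLED_FIRST }

    /**
     * All arguments are minutes since the document was last fetched.
     * Assumption: a tie between expiration and recrawl counts as expiring first.
     */
    static Outcome classify(int minRecrawl, int maxRecrawl, int expiration) {
        if (expiration <= minRecrawl) {
            // Expiration is reached no later than the earliest possible recrawl
            return Outcome.EXPIRES_FIRST;
        }
        if (expiration <= maxRecrawl) {
            // Expiration falls inside the recrawl window
            return Outcome.DEPENDS_ON_TIMING;
        }
        // Even the latest possible recrawl happens before expiration
        return Outcome.RECRAWLED_FIRST;
    }

    public static void main(String[] args) {
        // Settings quoted in this thread: min 5, max 10, expiration 5
        System.out.println(classify(5, 10, 5));
    }
}
```

Under this model, the quoted settings (minimum recrawl 5 minutes, expiration 5 minutes) leave no window in which a document could be recrawled before it expires.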

As for why a deleted document is "stuck" in Processing: the only thing I can think of is that the output connection you've chosen is having trouble deleting the document from the index.  What output connector are you using?


On Fri, May 8, 2015 at 4:36 AM, Timo Selvaraj <timo.selvaraj@gmail.com> wrote:

We are testing the continuous crawl feature of the file system connector on a small folder, to check whether the continuous crawl job picks up new documents added to the folder, removes missing documents, and updates modified documents.

Here are the settings we use:

Schedule type: Rescan documents dynamically
Minimum recrawl interval: 5 minutes
Maximum recrawl interval: 10 minutes
Expiration interval: 5 minutes
Reseed interval: 10 minutes
No scheduled run times
Maximum hop count for link type 'child': Unlimited
Hop count mode: Delete unreachable documents

New documents seem to be picked up by the job; however, the removal of a document or an update to a document is not being picked up.

Am I missing any settings for the deletions or updates? I do see that the removed document is showing as Processing under Queue Status, while the others are showing as Waiting for Processing.

Any idea what setting is missing for the deletes/updates to be recognized and re-indexed?