manifoldcf-user mailing list archives

From Timo Selvaraj <timo.selva...@gmail.com>
Subject Re: File system continuous crawl settings
Date Fri, 08 May 2015 12:40:10 GMT
Thank you Karl.

I am using the SearchBlox output connector. 

For a continuous crawl of the file system, what are the ideal settings to index new documents, remove deleted documents and update modified documents within a reasonable timeframe?



> On May 8, 2015, at 6:18 AM, Karl Wright <daddywri@gmail.com> wrote:
> 
> I just tried your configuration here. A deleted document in the file system was indeed picked up as expected.
> 
> I did notice that your "expiration" setting is, essentially, cleaning out documents at a rapid clip. With this setting, documents will be expired before they are recrawled. You probably want one strategy or the other but not both.
> 
> As for why a deleted document is "stuck" in Processing: the only thing I can think of is that the output connection you've chosen is having trouble deleting the document from the index. What output connector are you using?
> 
> Karl
> 
> 
> On Fri, May 8, 2015 at 4:36 AM, Timo Selvaraj <timo.selvaraj@gmail.com> wrote:
> Hi,
> 
> We are testing the continuous crawl feature of the file system connector on a small folder, to check whether the continuous crawl job handles new documents added to the folder, removes missing documents, and updates modified documents:
> 
> Here are the settings we use:
> 
> Schedule type:	Rescan documents dynamically
> Minimum recrawl interval:	5 minutes
> Maximum recrawl interval:	10 minutes
> Expiration interval:	5 minutes
> Reseed interval:	10 minutes
> No scheduled run times
> Maximum hop count for link type 'child':	Unlimited
> Hop count mode:	Delete unreachable documents
> 
> 
> New documents seem to be getting picked up by the job; however, removals of documents and updates to documents are not being picked up.
> 
> Am I missing any settings for the deletions or updates? I do see that the removed document is showing as Processing under Queue Status, while the others are showing as Waiting for Processing.
> 
> Any idea what setting is missing for the deletes/updates to be recognized and re-indexed?
> 
> Thanks,
> Timo 
> 
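
Karl's point that the 5-minute expiration interval removes documents before they are recrawled can be checked with simple arithmetic. The sketch below is a simplified model, not ManifoldCF's actual scheduler: it assumes a document expires a fixed number of minutes after it is fetched, and that its next rescan happens after an interval clamped to the minimum/maximum recrawl range. The class and function names are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ContinuousCrawlSettings:
    """Hypothetical container for the job settings quoted above, in minutes."""
    min_recrawl: int
    max_recrawl: int
    expiration: Optional[int]  # None models "documents never expire"
    reseed: int


def expires_before_recrawl(s: ContinuousCrawlSettings, recrawl_after: int) -> bool:
    """Return True if, in this simplified model, a document expires no later
    than its next rescan.

    Assumption (not ManifoldCF's real scheduler): a document fetched at t=0
    is rescanned after `recrawl_after` minutes, clamped to the
    [min_recrawl, max_recrawl] range, and expires `expiration` minutes after t=0.
    """
    if s.expiration is None:
        return False
    rescan_at = min(max(recrawl_after, s.min_recrawl), s.max_recrawl)
    return s.expiration <= rescan_at


if __name__ == "__main__":
    thread_settings = ContinuousCrawlSettings(
        min_recrawl=5, max_recrawl=10, expiration=5, reseed=10
    )
    # Every possible rescan time in [5, 10] is >= the 5-minute expiration.
    for minutes in (5, 7, 10):
        print(minutes, expires_before_recrawl(thread_settings, minutes))
```

Under this model, every rescan time allowed by the thread's settings falls at or after the expiration time, so expiry always wins. Disabling expiration (`expiration=None`) makes the check return False, which is one way to follow Karl's advice to use one strategy or the other but not both.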

