manifoldcf-dev mailing list archives

From Karl Wright <>
Subject Re: How documents are deleted
Date Wed, 24 Oct 2018 14:06:51 GMT
Hi Julien,

This is a complex question and the framework behaves differently depending
on the connector model.  Please read:


On Wed, Oct 24, 2018 at 5:26 AM Julien Massiera <> wrote:

> Hi Karl,
> I am trying to understand the behavior of ManifoldCF during a re-crawl,
> and especially how missing documents are deleted and by which process.
> I am focusing on two repository connectors, the JCIFS one and the JDBC
> one. Here is what I understand so far:
> In the JCIFS connector, the addSeedDocuments method lists all the files
> found for each configured path. So it seems clear that any previously
> crawled files that have not been listed during a re-crawl by this method
> should be deleted.
> In the JDBC connector, the addSeedDocuments method only lists the new or
> modified documents during a re-crawl (if, of course, the id query is
> correctly using the starttime and endtime variables). So here, there is
> a difference between the two connectors. It means that to delete missing
> documents, the previously crawled ones need to be 'checked' with the
> version query to detect the documents that must be removed.
> I am currently unable to tell what ManifoldCF actually does to handle
> documents that must be deleted, or whether any of the assumptions I laid
> out above are correct and/or used. I would also like to know which part
> of the code performs the deletion.
> Thanks for your help.
> --
> Product Development Director
> France Labs – The Search Experts
> Join us at the Enterprise Search & Discovery Summit in Washington DC
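
The two assumptions in the quoted message can be pictured with a small standalone sketch. It is not ManifoldCF code and does not claim to show what the framework actually does; the class DeletionSketch, both method names, and the sample ids are hypothetical and exist only for illustration. It contrasts (a) a connector whose seeding lists every document, where anything previously crawled but not listed is a deletion candidate, with (b) a connector whose seeding reports only new or changed ids, where each previously crawled id has to be re-checked (here, a missing version entry stands for "no longer in the repository").

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; none of these names come from ManifoldCF.
public class DeletionSketch {

    // Assumption (a): seeding lists every document on each crawl, so any
    // previously crawled id that was not listed this run is a deletion candidate.
    static Set<String> missingAfterFullListing(Set<String> previouslyCrawled,
                                               Set<String> listedThisRun) {
        Set<String> toDelete = new HashSet<>(previouslyCrawled);
        toDelete.removeAll(listedThisRun);
        return toDelete;
    }

    // Assumption (b): seeding reports only new or changed ids, so every
    // previously crawled id is re-checked against the repository. A missing
    // entry in currentVersions stands for "the version check found nothing".
    static Set<String> missingAfterVersionCheck(Set<String> previouslyCrawled,
                                                Map<String, String> currentVersions) {
        Set<String> toDelete = new HashSet<>();
        for (String id : previouslyCrawled) {
            if (!currentVersions.containsKey(id)) {
                toDelete.add(id);
            }
        }
        return toDelete;
    }

    public static void main(String[] args) {
        // File-share style crawl: b.txt was crawled before but is no longer listed.
        Set<String> known = Set.of("a.txt", "b.txt");
        System.out.println(missingAfterFullListing(known, Set.of("a.txt")));

        // Database style crawl: row 2 no longer returns a version.
        Set<String> rows = Set.of("1", "2");
        System.out.println(missingAfterVersionCheck(rows, Map.of("1", "v7")));
    }
}
```

Whether ManifoldCF follows either path for a given connector depends on the connector model mentioned in the reply above.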
