manifoldcf-dev mailing list archives

From Julien Massiera <julien.massi...@francelabs.com>
Subject Re: How documents are deleted
Date Thu, 25 Oct 2018 09:16:16 GMT
Hi Karl,

Thanks for your response. I found what I needed in the documentation.

Julien

On 24/10/2018 16:06, Karl Wright wrote:
> Hi Julien,
>
> This is a complex question and the framework behaves differently depending
> on the connector model.  Please read:
>
> https://github.com/DaddyWri/manifoldcfinaction/tree/master/pdfs
>
> Karl
>
>
> On Wed, Oct 24, 2018 at 5:26 AM Julien Massiera <
> julien.massiera@francelabs.com> wrote:
>
>> Hi Karl,
>>
>> I am trying to understand the behavior of ManifoldCF during a re-crawl,
>> especially how missing documents are deleted and by which process.
>>
>> I am focusing on two repository connectors, the JCIFS one and the JDBC
>> one. Here is what I understand so far:
>>
>> In the JCIFS connector, the addSeedDocuments method lists all the files
>> found for each configured path. So it seems clear that any previously
>> crawled file that this method does not list during a re-crawl should
>> be deleted.
>>
>> In the JDBC connector, the addSeedDocuments method only lists the new or
>> modified documents during a re-crawl (provided, of course, that the id
>> query correctly uses the starttime and endtime variables). So here, there
>> is a difference between the two connectors. It means that, to delete
>> missing documents, the previously crawled ones need to be 'checked' with
>> the version query to detect the documents that must be removed.
>>
>> I am currently unable to tell what ManifoldCF actually does to handle
>> documents that must be deleted, or whether any of the assumptions I laid
>> out above are correct and/or used. I would also really like to know
>> which part of the code performs the delete process.
>>
>> Thanks for your help.
>>
>> --
>> Julien MASSIERA
>> Head of Product Development
>> France Labs – The Search Experts
>> Meet us at the Enterprise Search & Discovery Summit in Washington DC
>> www.francelabs.com
>>
>>
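
The two cases described in the quoted message can be illustrated with a toy
model. This is only a sketch of the reasoning, not ManifoldCF code: the class
and method names below are hypothetical, and the thread does not confirm that
the framework works this way. It assumes a seeding step that either lists
every document (the JCIFS case as described) or lists only new and changed
ones (the JDBC case as described), in which case each previously crawled
document has to be re-checked individually.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Predicate;

// Hypothetical toy model; none of these names come from ManifoldCF.
public class DeletionSketch {

    // Full-listing case: seeding reports every existing document, so any
    // previously crawled ID that is absent from the listing is a deletion
    // candidate.
    static Set<String> missingAfterFullListing(Set<String> previouslyCrawled,
                                               Set<String> listedNow) {
        Set<String> missing = new TreeSet<>(previouslyCrawled);
        missing.removeAll(listedNow);
        return missing;
    }

    // Delta case: seeding reports only new or changed IDs, so absence from
    // the seed list proves nothing. Each known ID is re-checked instead
    // (the role the quoted message attributes to the version query).
    static Set<String> missingAfterRecheck(Set<String> previouslyCrawled,
                                           Predicate<String> stillExists) {
        Set<String> missing = new TreeSet<>();
        for (String id : previouslyCrawled) {
            if (!stillExists.test(id)) {
                missing.add(id);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        Set<String> previous = new TreeSet<>(Arrays.asList("a", "b", "c"));
        Set<String> repository = new TreeSet<>(Arrays.asList("a", "c", "d"));

        System.out.println(missingAfterFullListing(previous, repository));
        System.out.println(missingAfterRecheck(previous, repository::contains));
    }
}
```

In this sketch both strategies identify the same missing document; they differ
only in how much work the seeding step does versus the per-document check.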
-- 
Julien MASSIERA
Head of Product Development
France Labs – The Search Experts
Meet us at the Enterprise Search & Discovery Summit in Washington DC
www.francelabs.com

