manifoldcf-user mailing list archives

From Karl Wright <daddy...@gmail.com>
Subject Re: Web Repository Deletion Policy
Date Thu, 29 Aug 2013 16:02:25 GMT
Hi Jim,

A minimal job run deletes any document that it discovers during the run
but cannot load.  It does not delete unreachable documents, but I don't
think that is your case.
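
As a rough illustration of that rule, here is a toy model in Python. It is only a sketch, not ManifoldCF's actual code: the names `minimal_run`, `indexed`, `discovered`, and `load` are hypothetical, and it assumes that only documents already in the index can be deleted.

```python
def minimal_run(indexed, discovered, load):
    """Toy model of the deletion rule; NOT ManifoldCF's implementation.

    indexed    -- set of URLs indexed by earlier runs (assumption)
    discovered -- URLs the crawler encounters during this run
    load       -- hypothetical callable, True if the URL could be loaded
    """
    deleted = set()
    for url in discovered:
        # A document seen during the run that cannot be loaded is deleted.
        if url in indexed and not load(url):
            deleted.add(url)
    # Documents never discovered in this run (unreachable) are left alone.
    return indexed - deleted, deleted


indexed = {"/Pages/Home.aspx", "/hr/Pages/a.aspx", "/old/orphan.aspx"}
discovered = ["/Pages/Home.aspx", "/hr/Pages/a.aspx"]
remaining, deleted = minimal_run(indexed, discovered, lambda url: False)
print(sorted(deleted))    # the two discovered documents
print(sorted(remaining))  # only the undiscovered one survives
```

In this model, a site that fails every load during the run loses every document the crawler rediscovers, while documents it never reaches stay in the index.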

Karl



On Thu, Aug 29, 2013 at 11:41 AM, jim switzer <mojoswitz@gmail.com> wrote:

> How does the web repository connector decide when to delete documents?
>
> I ran a job yesterday, and crawled/processed ~250 documents.  This
> morning, I did a 'start minimal' on the job, and it proceeded to
> delete all the documents it crawled yesterday.  The site appears to
> have been experiencing issues when I restarted the job, but I was
> surprised to see so much content deleted after one failed job run.
>
> Here is the 'Simple History' from the job:
>
> <lots more document deletion messages>
> 08-29-2013 08:15:18.407 document deletion (LocalFiles) http://beta.blah.com:42541/hr/Pages/Reward-Recog...nition.aspx OK 0 1
> 08-29-2013 08:15:18.406 document deletion (LocalFiles) http://beta.blah.com:42541/IT/Documents/it_polic...yprocedure_internet_intranet_policy_100112.pdf OK 0 1
> 08-29-2013 08:15:16.268 fetch http://beta.blah.com:42541/_layouts/ 403 0 44
> 08-29-2013 08:15:11.267 fetch http://beta.blah.com:42541/ 302 178 149
> 08-29-2013 08:15:11.001 document deletion (LocalFiles) http://beta.blah.com:42541/Pages/Home.aspx OK 0 1
> 08-29-2013 08:12:28.911 fetch http://beta.blah.com:42541/Pages/Home.aspx 200 3376 162081
> 08-29-2013 08:12:27.365 job start 1377726689367(BetaCrawl) 0 1
>
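
A history excerpt like the one quoted above can be tallied with a short script to see how fetch results line up with deletions. The following is a minimal sketch, assuming each entry sits on a single line and that the first field after the identifier is the result code for fetch and deletion entries; the pattern `ENTRY` and the function `summarize` are hypothetical names, not part of ManifoldCF.

```python
import re
from collections import Counter

# Timestamp, activity, identifier, then any trailing fields.
# The activity list covers only the entries seen in this excerpt.
ENTRY = re.compile(
    r"^(?P<time>\d{2}-\d{2}-\d{4} \d{2}:\d{2}:\d{2}\.\d{3}) "
    r"(?P<activity>document deletion \(\w+\)|fetch|job start) "
    r"(?P<identifier>\S+)"
    r"(?P<rest>(?: \S+)*)$"
)


def summarize(lines):
    """Count entries by (activity, result code); skip unparseable lines."""
    counts = Counter()
    for line in lines:
        match = ENTRY.match(line.strip())
        if match is None:
            continue
        fields = match.group("rest").split()
        activity = match.group("activity")
        # Assumption: "job start" entries carry no result code.
        result = fields[0] if activity != "job start" and fields else ""
        counts[(activity, result)] += 1
    return counts


sample = [
    "08-29-2013 08:15:16.268 fetch http://beta.blah.com:42541/_layouts/ 403 0 44",
    "08-29-2013 08:15:11.267 fetch http://beta.blah.com:42541/ 302 178 149",
    "08-29-2013 08:15:11.001 document deletion (LocalFiles) http://beta.blah.com:42541/Pages/Home.aspx OK 0 1",
    "08-29-2013 08:12:28.911 fetch http://beta.blah.com:42541/Pages/Home.aspx 200 3376 162081",
    "08-29-2013 08:12:27.365 job start 1377726689367(BetaCrawl) 0 1",
]

for (activity, result), n in sorted(summarize(sample).items()):
    print(activity, result, n)
```

Grouping by result code makes it easier to spot whether deletions coincide with non-200 fetches such as the 403 and 302 in the sample.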
