manifoldcf-user mailing list archives

From Karl Wright <daddy...@gmail.com>
Subject Re: Problem with continuous jobs deleting their documents on restart of Agent
Date Mon, 08 Oct 2012 16:22:52 GMT
There is no logic whatsoever in agents-shutdown that should delete
documents from the queue and from the index, and I have never seen
this behavior before, but this is really easy to verify.  It should be
simple to take an unaltered 1.0 distribution, create a filesystem job
on the multiprocess example, start it crawling continuously, then stop
and restart the agents process, and then look at the simple history to
see whether any documents get deleted or not.  I may have time to try
this later in the evening, we'll see.

Karl

On Mon, Oct 8, 2012 at 12:06 PM, Martin Gielow <martin.gielow@gmail.com> wrote:
> Hi Karl,
>
> thanks for the lightning-speed reply! :)
>
> On Mon, Oct 8, 2012 at 5:23 PM, Karl Wright <daddywri@gmail.com> wrote:
>>
>> Hi Martin,
>>
>> The behavior you describe is expected only if you are either deleting
>> the job, or the job is set to expire old documents after a certain
>> time interval (and that interval has transpired).
>>
>> Can you tell me what your expiration interval is?
>>
>
> The expiration interval is set to 1440 (minutes, according to the
> interface). I also just tried to leave the box empty, so that there should
> be no expiration, but the behaviour remained the same.
>
>>
>> Also, when you say "shutting down agents process", can you clarify
>> what deployment model you are using?  How are you shutting down this
>> process?
>
>
> I am using a slightly modified version of the multiprocess-example with
> postgres as the DBMS. To run and shutdown the agents I use the batch files
> that are provided with the example (start-agents.bat and stop-agents.bat).
> I have also tried to run the agents process from Eclipse to be able to debug
> into it and was getting the same results.
>
>>
>> Thanks,
>> Karl
>
>
> Regards,
> Martin
>
>
>>
>>
>> On Mon, Oct 8, 2012 at 11:18 AM, Martin Gielow <martin.gielow@gmail.com> wrote:
>> > Hello,
>> >
>> > I'm using Manifold to crawl several data sources using the Wiki and the
>> > JDBC connectors. I have set the associated jobs to run continuously so
>> > that new documents will be added in a timely manner. The problem I am
>> > having is that whenever the Agent is stopped and then restarted, the jobs
>> > delete all of their documents (also propagating the deletes to the
>> > associated output connection) before turning themselves inactive (which
>> > they shouldn't, as they are set to run continuously).
>> >
>> > If I then restart the job, in the case of the JDBC connection, it does
>> > not find any previously added documents and sets itself inactive again.
>> > In the case of the Wiki connection, the documents are also deleted, but
>> > are successfully reindexed when the job is restarted manually.
>> >
>> > The only way I found to prevent the jobs from deleting their items in
>> > this case was to manually stop the affected jobs before the Agent is
>> > stopped (using the abort option) and to restart them after the Agent has
>> > been restarted.
>> >
>> > I am using the 1.0 release of Manifold and couldn't find anything
>> > regarding this behaviour in either the documentation or the wiki.
>> >
>> > Is there an obvious flaw in my setup, or something I may have missed in
>> > the configuration?
>> >
>> > Thanks in advance for any tips!
>> >
>> > Regards,
>> > Martin
>
>
