manifoldcf-user mailing list archives

From Karl Wright <daddy...@gmail.com>
Subject Re: Problem with continuous jobs deleting their documents on restart of Agent
Date Mon, 08 Oct 2012 16:58:34 GMT
I just tried this; the experiment yields no document deletions
recorded in the simple history (as expected).

So clearly there is a complicating factor somewhere that you will need to find.

I would suggest going about the basic process of eliminating
variables.  For example, try a continuous crawl in your environment
using the file system connector on a moderately-sized set of sample
documents, and see if it seems to do the same thing as the other
connectors you are using.  If it does, then that would suggest that
one of your modifications was in fact causing the problem.  If not,
then I should look at trying to repeat the experiment here with one of
the connectors you are working with.

Thanks,
Karl

On Mon, Oct 8, 2012 at 12:22 PM, Karl Wright <daddywri@gmail.com> wrote:
> There is no logic whatsoever in agents-shutdown that should delete
> documents from the queue and from the index, and I have never seen
> this behavior before, but this is really easy to verify.  It should be
> simple to take an unaltered 1.0 distribution, create a filesystem job
> on the multiprocess example, start it crawling continuously, then stop
> and restart the agents process, and then look at the simple history to
> see whether any documents get deleted or not.  I may have time to try
> this later in the evening, we'll see.
>
> Karl
>
> On Mon, Oct 8, 2012 at 12:06 PM, Martin Gielow <martin.gielow@gmail.com> wrote:
>> Hi Karl,
>>
>> thanks for the lightning-speed reply! :)
>>
>> On Mon, Oct 8, 2012 at 5:23 PM, Karl Wright <daddywri@gmail.com> wrote:
>>>
>>> Hi Martin,
>>>
>>> The behavior you describe is expected only if you are either deleting
>>> the job, or the job is set to expire old documents after a certain
>>> time interval (and that interval has transpired).
>>>
>>> Can you tell me what your expiration interval is?
>>>
>>
>> The expiration interval is set to 1440 (minutes, according to the
>> interface). I also just tried to leave the box empty, so that there should
>> be no expiration, but the behaviour remained the same.
>>
>>>
>>> Also, when you say "shutting down agents process", can you clarify
>>> what deployment model you are using?  How are you shutting down this
>>> process?
>>
>>
>> I am using a slightly modified version of the multiprocess-example with
>> postgres as the DBMS. To run and shut down the agents I use the batch files
>> that are provided with the example (start-agents.bat and stop-agents.bat).
>> I have also tried to run the agents process from Eclipse to be able to debug
>> into it and was getting the same results.
>>
>>>
>>> Thanks,
>>> Karl
>>
>>
>> Regards,
>> Martin
>>
>>
>>>
>>>
>>> On Mon, Oct 8, 2012 at 11:18 AM, Martin Gielow <martin.gielow@gmail.com>
>>> wrote:
>>> > Hello,
>>> >
>>> > I'm using Manifold to crawl several data sources using the Wiki and the
>>> > JDBC
>>> > connectors. I have set the associated jobs to run continuously so that
>>> > new
>>> > documents will be added in a timely manner. The problem I am having with
>>> > this is that whenever the Agent is stopped and then restarted, the jobs
>>> > will delete all of their documents (also propagating the deletes to the
>>> > associated output connection) before turning themselves inactive (which
>>> > they
>>> > shouldn't as they are set to run continuously).
>>> >
>>> > If I then restart the job, in case of the JDBC connection, it is not
>>> > finding
>>> > any previously added documents and will set itself inactive again. In
>>> > case
>>> > of the Wiki connection, the documents are also deleted, but are
>>> > successfully
>>> > reindexed when the job is restarted manually.
>>> >
>>> > The only way I found to prevent the jobs from deleting their items in
>>> > this
>>> > case was to manually stop the affected jobs before the Agent is stopped
>>> > (using the abort option) and to restart them after the Agent has been
>>> > restarted.
>>> >
>>> >
>>> > I am using the 1.0 release of Manifold and couldn't find anything
>>> > regarding
>>> > this behaviour in either the documentation or the wiki.
>>> >
>>> > Is there an obvious flaw with my setup or something I may have missed in
>>> > the
>>> > configuration?
>>> >
>>> > Thanks in advance for any tips!
>>> >
>>> > Regards,
>>> > Martin
>>
>>
