manifoldcf-user mailing list archives

From: Karl Wright <daddy...@gmail.com>
Subject: Re: Manifold RSS connector gets "stuck" after a few docs are processed
Date: Thu, 08 Mar 2018 14:54:15 GMT
I've reviewed all changes to the RSS connector and to the framework over
the last year, and none of them could reasonably be expected to have an
effect like this. The only changes were to the redirect strategy and an
update to the latest PostgreSQL JDBC driver.

If the problem doesn't occur in the single-process example, the next
question is: do you have a multiprocess setup? If so, try the multiprocess
example and see whether that succeeds. If it does, the problem lies in how
we work with PostgreSQL.

Karl


On Thu, Mar 8, 2018 at 9:41 AM, Karl Wright <daddywri@gmail.com> wrote:

> Hi Mike,
>
> You are the third person this morning who has reported this in
> conjunction with PostgreSQL. It is possible that some behavior we count
> on broke in the latest PostgreSQL release. Can you tell me which version
> you are using? Do you see the same behavior when you run the built-in
> HSQLDB example?
>
> Karl
>
>
> On Thu, Mar 8, 2018 at 9:32 AM, Mike Hugo <mike@piragua.com> wrote:
>
>> Hello,
>>
>> I set up a new ManifoldCF instance based on the simple example. I
>> modified properties.xml to point to a PostgreSQL database and then set it
>> up to read an RSS feed. It uses a custom output connector to send the
>> data to a custom API.
>>
>> I've noticed that it starts properly, but it only pulls in 3 or 4 records
>> before it "hangs" and doesn't pull in any more docs after that. If I
>> bounce the server, it will pull in 3 or 4 more docs, but then it seems to
>> hang again.
>>
>> I can add a new RSS feed and start it, but it won't pull in any documents
>> until the server is bounced.
>>
>> I increased the value of org.apache.manifoldcf.crawler.threads and that
>> seems to help, but it only delays the same behavior. For example, it
>> might pull in 10 or 15 docs, but then it stops again. There are no
>> messages in the logs.
>>
>> It does appear that it's spawning many, many of these threads:
>> ExecuteQueryThread
>>
>> Any ideas where to start looking or how to debug why it hangs after only
>> a few documents?
>>
>> Thanks!!
>>
>> Mike
>>
>
>
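
A natural first step on Mike's "where to start looking" question, given the growing number of ExecuteQueryThread threads, is a thread census: count those threads and see what state they are in and where they are stuck. Below is a minimal in-process Java sketch. It assumes the threads in question are instances of a Thread subclass whose class name contains "ExecuteQueryThread". The nested ExecuteQueryThread class and the ThreadCensus helper are hypothetical stand-ins written for this illustration, not ManifoldCF code. Against a running ManifoldCF JVM, the JDK's jstack tool (jstack <pid>) gives the equivalent view from outside the process.

```java
import java.util.Map;
import java.util.concurrent.CountDownLatch;

// Hypothetical diagnostic helper (not part of ManifoldCF): takes a census of
// live threads whose runtime class name contains a given fragment.
public class ThreadCensus {

    // Counts live threads whose class name contains the fragment. Matching on
    // the class rather than the thread name is deliberate: a Thread subclass
    // that never calls setName() shows up as "Thread-N".
    public static int countByClassName(String fragment) {
        int count = 0;
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            if (t.getClass().getName().contains(fragment)) {
                count++;
            }
        }
        return count;
    }

    // Prints each matching thread's name, state, and top stack frame, roughly
    // what you would look for by hand in a jstack dump.
    public static void printByClassName(String fragment) {
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            Thread t = e.getKey();
            if (!t.getClass().getName().contains(fragment)) {
                continue;
            }
            StackTraceElement[] frames = e.getValue();
            String top = frames.length > 0 ? frames[0].toString() : "<no frames>";
            System.out.println(t.getName() + " [" + t.getState() + "] at " + top);
        }
    }

    // Hypothetical stand-in used only for this demo: each instance waits on a
    // latch, the way a query thread might wait on a stuck database call.
    public static class ExecuteQueryThread extends Thread {
        private final CountDownLatch release;

        public ExecuteQueryThread(CountDownLatch release) {
            this.release = release;
            setDaemon(true); // never keep the JVM alive on our account
        }

        @Override
        public void run() {
            try {
                release.await();
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        CountDownLatch release = new CountDownLatch(1);
        ExecuteQueryThread[] stuck = new ExecuteQueryThread[5];
        for (int i = 0; i < stuck.length; i++) {
            stuck[i] = new ExecuteQueryThread(release);
            stuck[i].start();
        }
        System.out.println("ExecuteQueryThread count: " + countByClassName("ExecuteQueryThread"));
        printByClassName("ExecuteQueryThread");

        release.countDown(); // let the stand-in threads finish
        for (Thread t : stuck) {
            t.join();
        }
        System.out.println("after release: " + countByClassName("ExecuteQueryThread"));
    }
}
```

If the count keeps climbing while no documents are processed, and most of those threads sit in WAITING or BLOCKED with database-driver frames near the top of their stacks, that would be consistent with the PostgreSQL interaction Karl suspects. It would not prove that interaction is the cause.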
