manifoldcf-user mailing list archives

From Karl Wright <daddy...@gmail.com>
Subject Re: Zookeeper configured MCF not working in production mode
Date Tue, 16 Sep 2014 13:15:53 GMT
After some research, I found that increasing the tickTime setting in
zookeeper.cfg from 2000 to 5000 makes this problem go away for me.

Clearly we still have an issue with resetting ZooKeeper connections after
tick timeout failures.  The connections are reset, but their state is
somehow incorrect.  I'll need to do more research to figure out how this
can be addressed.

In the interim, increasing the tick time seems to be a reasonable
workaround.
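
One possible reason the larger tick time helps (not confirmed in this
thread) is that ZooKeeper derives its session timeout bounds from tickTime:
by default minSessionTimeout is 2 * tickTime and maxSessionTimeout is
20 * tickTime, and the server clamps whatever timeout the client requests
into that range. The sketch below only illustrates that arithmetic. It
assumes both bounds are left at their defaults, and the requested timeout
is a hypothetical value, not one taken from ManifoldCF.

```python
# Sketch: how ZooKeeper's tickTime bounds the session timeout a client gets.
# Assumption: minSessionTimeout and maxSessionTimeout are left at their
# defaults (2 * tickTime and 20 * tickTime, both in milliseconds).

def session_timeout_bounds(tick_time_ms):
    """Return the (min, max) session timeout in ms for a given tickTime."""
    return 2 * tick_time_ms, 20 * tick_time_ms


def negotiated_timeout(requested_ms, tick_time_ms):
    """Clamp a client's requested session timeout into the server's bounds."""
    lo, hi = session_timeout_bounds(tick_time_ms)
    return max(lo, min(requested_ms, hi))


if __name__ == "__main__":
    requested = 2000  # hypothetical client request, not taken from MCF
    for tick in (2000, 5000):
        lo, hi = session_timeout_bounds(tick)
        print(tick, lo, hi, negotiated_timeout(requested, tick))
```

Under these assumptions, going from tickTime=2000 to tickTime=5000 raises
the smallest session timeout the server will grant from 4000 ms to
10000 ms, so a busy client gets more slack before its session expires.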

Thanks,
Karl


On Tue, Sep 16, 2014 at 8:14 AM, Karl Wright <daddywri@gmail.com> wrote:

> Believe it or not, I was able to reproduce this here with a crawl of
> 100000 documents.  I get this in the Zookeeper server-side log, hundreds of
> times:
>
> >>>>>>
> [SyncThread:0] ERROR org.apache.zookeeper.server.NIOServerCnxn - Unexpected Exception:
> java.nio.channels.CancelledKeyException
>         at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:73)
>         at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:77)
>         at org.apache.zookeeper.server.NIOServerCnxn.sendBuffer(NIOServerCnxn.java:153)
>         at org.apache.zookeeper.server.NIOServerCnxn.sendResponse(NIOServerCnxn.java:1076)
>         at org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:170)
>         at org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:167)
>         at org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:101)
> [SyncThread:0] ERROR org.apache.zookeeper.server.NIOServerCnxn - Unexpected Exception:
> java.nio.channels.CancelledKeyException
>         at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:73)
>         at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:77)
>         at org.apache.zookeeper.server.NIOServerCnxn.sendBuffer(NIOServerCnxn.java:153)
>         at org.apache.zookeeper.server.NIOServerCnxn.sendResponse(NIOServerCnxn.java:1076)
>         at org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:170)
>         at org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:167)
>         at org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:101)
> <<<<<<
>
> ... and then everything locks up.  I have no idea what is happening; seems
> to be an NIO exception ZooKeeper is not expecting.
>
> Karl
>
>
> On Tue, Sep 16, 2014 at 7:52 AM, Erlend Garåsen <e.f.garasen@usit.uio.no>
> wrote:
>
>>
>> Ouch, I forgot to put the Zookeeper logs on the web. Since they do not
>> include timestamps and I have restarted MCF after a few changes, I guess it
>> will be difficult to find the relevant lines. I'll do that next time it
>> hangs, probably at the end of the day.
>>
>> I will add the new Zookeeper configuration settings as Lalit suggested
>> next time I'm restarting MCF.
>>
>>> How many worker threads are you using?  How many documents (about) do
>>> you crawl before things hang?
>>>
>>
>> Throttling -> max connections: 30
>> Throttling -> Max fetches/min: 100
>> Bandwidth -> max connections: 25
>> Bandwidth -> max kbytes/sec: 8000
>> Bandwidth -> max fetches/min: 20
>>
>> I have four jobs configured. The one I'm running now has 100,000
>> documents configured. In total, there are around 110,000 documents
>> across all four jobs.
>>
>> I guess there are more documents involved since the largest job excludes
>> a lot of documents based on sophisticated and complex filtering rules.
>> Maybe 50% more even though they are not added to Solr (but they are of
>> course fetched).
>>
>> Erlend
>>
>>
>>> You may also want to try to increase the parameter: maxClientCnxns in
>>> zookeeper.cfg to something bigger, if you have a lot of worker threads.
>>> I'm thinking 1000 or some such.  See if it makes a difference for you.
>>>
>>
>> I'll try that at next restart.
>>
>> Erlend
>>
>
>
