manifoldcf-user mailing list archives

From lalit jangra <lalit.j.jan...@gmail.com>
Subject Re: Getting errors in zookeeper logs
Date Mon, 15 Sep 2014 16:17:12 GMT
Hi Karl,

Out of 12G, I have assigned 5G to Solr, as I was seeing a lot of
out-of-memory (Java heap space) errors while crawling large jobs; since then
it seems to be OK. I have also assigned 3G to MCF, which is quite
comfortable. I am assuming the remaining 4G is enough for the OS and the
ZooKeeper nodes. I am currently running a job for 35K documents and I can see
more than 500MB of memory free.

Any thoughts?

Regards.

On Mon, Sep 15, 2014 at 8:45 PM, Karl Wright <daddywri@gmail.com> wrote:

> Hi Lalit,
>
> The best way in Java to assess memory usage is to turn on JVM garbage
> collection verbose output.  Then you can see how often the system garbage
> collects etc, and whether post-GC usage grows over time.
>
> 12G should be more than enough, so if you find you are running into memory
> limits with that configuration, it would be worth trying to figure out why
> that is happening.
>
> Karl
>
>
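As a rough complement to the verbose GC output suggested above (the standard HotSpot flag `-verbose:gc` is one way to get it), post-GC usage can also be sampled from inside a JVM through the JDK's `java.lang.management` beans. The sketch below is only an illustration: the class name `PostGcHeapProbe` and the three-sample loop are assumptions, not part of ManifoldCF, Solr, or ZooKeeper, and it reports only on the JVM it runs in.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

// Hypothetical helper: samples heap usage as measured after the last GC.
public class PostGcHeapProbe {

    /** Sum, in bytes, of each heap pool's usage right after its most recent collection. */
    public static long postGcHeapBytes() {
        long total = 0;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP) {
                continue; // skip non-heap pools such as the code cache
            }
            MemoryUsage afterGc = pool.getCollectionUsage(); // null if the pool does not support it
            if (afterGc != null) {
                total += afterGc.getUsed();
            }
        }
        return total;
    }

    /** Total number of collections reported by all collectors so far. */
    public static long totalGcCount() {
        long count = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            count += Math.max(0, gc.getCollectionCount()); // -1 means "undefined"
        }
        return count;
    }

    /** One-line summary suitable for periodic logging. */
    public static String summary() {
        return "postGcHeapMB=" + (postGcHeapBytes() / (1024 * 1024))
                + " gcCount=" + totalGcCount();
    }

    public static void main(String[] args) throws InterruptedException {
        // Take a few samples one second apart; a real check would run far longer.
        for (int i = 0; i < 3; i++) {
            System.out.println(summary());
            Thread.sleep(1000);
        }
    }
}
```

If the post-GC figure keeps climbing across many samples while a job runs, that points to growing retained memory rather than ordinary garbage churn.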
> On Mon, Sep 15, 2014 at 11:04 AM, lalit jangra <lalit.j.jangra@gmail.com>
> wrote:
>
>> Hi Karl,
>>
>> Could I be seeing the ZooKeeper connection reset messages because the
>> system is running close to its memory limit? I have 12G of RAM and can see
>> it using 11.5G while the job is running.
>>
>>
>> Is there a way to determine how much memory to give the ZooKeeper nodes,
>> and if so, is there any yardstick?
>>
>> Regards.
>>
>> On Mon, Sep 15, 2014 at 7:16 PM, Karl Wright <daddywri@gmail.com> wrote:
>>
>>> Hi Lalit,
>>>
>>> Looks like this is the result of a tomcat shutdown, and is a probable
>>> race condition bug in Zookeeper:
>>>
>>>
>>> http://mail-archives.apache.org/mod_mbox/tomcat-users/201306.mbox/%3CBAY174-W32B2284BEDAE503E9D22D3A8850@phx.gbl%3E
>>>
>>> Karl
>>>
>>>
>>> On Mon, Sep 15, 2014 at 9:41 AM, lalit jangra <lalit.j.jangra@gmail.com>
>>> wrote:
>>>
>>>> Hi Karl,
>>>>
>>>> Along with this, I can see the errors below in Tomcat's catalina.out.
>>>>
>>>> Sep 15, 2014 1:06:14 PM org.apache.catalina.loader.WebappClassLoader
>>>> loadClass
>>>>
>>>> INFO: Illegal access: this web application instance has been stopped
>>>> already.  Could not load org.apache.zookeeper.server.ZooTrace.  The
>>>> eventual following stack trace is caused by an error thrown for debugging
>>>> purposes as well as to attempt to terminate the thread which caused the
>>>> illegal access, and has no functional impact.
>>>>
>>>> java.lang.IllegalStateException
>>>>
>>>>         at
>>>> org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1612)
>>>>
>>>>         at
>>>> org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1571)
>>>>
>>>>         at
>>>> org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1115)
>>>>
>>>>
>>>>
>>>> [http-bio-80-exec-1-SendThread(iwdc2preecma04.iwater.ie:2183)] ERROR
>>>> org.apache.zookeeper.ClientCnxn - from http-bio-80-exec-1-SendThread(
>>>> iwdc2preecma04.iwater.ie:2183)
>>>>
>>>> java.lang.NoClassDefFoundError: org/apache/zookeeper/server/ZooTrace
>>>>
>>>>         at
>>>> org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1115)
>>>>
>>>> Caused by: java.lang.ClassNotFoundException:
>>>> org.apache.zookeeper.server.ZooTrace
>>>>
>>>>         at
>>>> org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1720)
>>>>
>>>>         at
>>>> org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1571)
>>>>
>>>>         ... 1 more
>>>>
>>>> [http-bio-80-exec-1-SendThread(iwdc2preecma04.iwater.ie:2182)] ERROR
>>>> org.apache.zookeeper.ClientCnxn - from http-bio-80-exec-1-SendThread(
>>>> iwdc2preecma04.iwater.ie:2182)
>>>>
>>>> Sep 15, 2014 1:06:14 PM org.apache.coyote.AbstractProtocol destroy
>>>>
>>>> INFO: Destroying ProtocolHandler ["http-bio-80"]
>>>>
>>>> java.lang.NoClassDefFoundError: org/apache/zookeeper/server/ZooTrace
>>>>
>>>> Regards.
>>>>
>>>> On Mon, Sep 15, 2014 at 7:05 PM, lalit jangra <lalit.j.jangra@gmail.com
>>>> > wrote:
>>>>
>>>>> Thanks Karl,
>>>>>
>>>>> The crawl is very slow and takes a long time, which is a bit frustrating,
>>>>> and as I have multiple high-volume jobs running in parallel, this does not
>>>>> seem to be a good thing.
>>>>>
>>>>> I have also raised it on the ZooKeeper forum at
>>>>> http://zookeeper-user.578899.n2.nabble.com/Getting-errors-in-zookeeper-logs-td7580260.html
>>>>> and am waiting for a reply.
>>>>>
>>>>> Regards.
>>>>>
>>>>> On Mon, Sep 15, 2014 at 6:51 PM, Karl Wright <daddywri@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi Lalit,
>>>>>>
>>>>>> When MCF cannot reach zookeeper, MCF crawls will pause until the
>>>>>> zookeeper connections are reestablished.  Then the crawls should resume.
>>>>>> This should *not* abort your crawls, but it will make them very slow.
>>>>>>
>>>>>> I am not a zookeeper expert, so I would post on their message boards
>>>>>> to see if there is any adjustment that can be made to zookeeper parameters
>>>>>> that would improve zookeeper behavior when you have a flaky network.
>>>>>> However, since the obvious solution is to fix your network, they may not
>>>>>> have a code solution for you.
>>>>>>
>>>>>> Thanks,
>>>>>> Karl
>>>>>>
>>>>>>
>>>>>> On Mon, Sep 15, 2014 at 9:15 AM, lalit jangra <
>>>>>> lalit.j.jangra@gmail.com> wrote:
>>>>>>
>>>>>>> Thanks Karl,
>>>>>>>
>>>>>>> Ideally, connection resets should be handled by ZooKeeper itself,
>>>>>>> and I can see the connections being re-established later in the logs.
>>>>>>>
>>>>>>> Apart from resolving the network issue, can you suggest any way to
>>>>>>> work around this, as my crawls keep getting stuck? Anything in the
>>>>>>> config files, etc.?
>>>>>>>
>>>>>>> Regards.
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Sep 15, 2014 at 6:39 PM, Karl Wright <daddywri@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi Lalit,
>>>>>>>>
>>>>>>>> Zookeeper will keep working, but you should understand that you are
>>>>>>>> dropping connections to your zookeeper members for unknown reasons, which
>>>>>>>> is causing your crawl to stall when it happens.  This argues that perhaps
>>>>>>>> you have some network flakiness of some kind.
>>>>>>>>
>>>>>>>> Karl
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Sep 15, 2014 at 8:59 AM, lalit jangra <
>>>>>>>> lalit.j.jangra@gmail.com> wrote:
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I am running a cluster of two Apache ManifoldCF nodes on two
>>>>>>>>> separate machines, each of which hosts 3 ZooKeeper instances (6
>>>>>>>>> instances in the cluster in total). When I start the ManifoldCF
>>>>>>>>> agents, I see the warnings below during startup.
>>>>>>>>>
>>>>>>>>> [http-bio-80-exec-2-SendThread(iwdc1preecma03.iwater.ie:2181)]
>>>>>>>>> INFO org.apache.zookeeper.ClientCnxn - Unable to read additional data from
>>>>>>>>> server sessionid 0x0, likely server has closed socket, closing socket
>>>>>>>>> connection and attempting reconnect
>>>>>>>>>
>>>>>>>>> [http-bio-80-exec-2-SendThread(iwdc2preecma04.iwater.ie:2182)]
>>>>>>>>> INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server
>>>>>>>>> iwdc2preecma04.iwater.ie/10.231.72.25:2182. Will not attempt to
>>>>>>>>> authenticate using SASL (unknown error)
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I can also see the errors below in the logs while the agents are running.
>>>>>>>>>
>>>>>>>>> [http-bio-80-exec-2] INFO org.apache.zookeeper.ZooKeeper -
>>>>>>>>> Initiating client connection,
>>>>>>>>> connectString=iwdc1preecma03:2181,iwdc1preecma03:2182,iwdc1preecma03:2183,iwdc2preecma04:2181,iwdc2preecma04:2182,iwdc2preecma04:2183
>>>>>>>>> sessionTimeout=4000
>>>>>>>>> watcher=org.apache.manifoldcf.core.lockmanager.ZooKeeperConnection$ZooKeeperWatcher@51d83fd7
>>>>>>>>>
>>>>>>>>> [http-bio-80-exec-2-SendThread(iwdc2preecma04.iwater.ie:2182)]
>>>>>>>>> INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server
>>>>>>>>> iwdc2preecma04.iwater.ie/10.231.72.25:2182. Will not attempt to
>>>>>>>>> authenticate using SASL (unknown error)
>>>>>>>>>
>>>>>>>>> [http-bio-80-exec-2-SendThread(iwdc2preecma04.iwater.ie:2182)]
>>>>>>>>> INFO org.apache.zookeeper.ClientCnxn - Socket connection established to
>>>>>>>>> iwdc2preecma04.iwater.ie/10.231.72.25:2182, initiating session
>>>>>>>>>
>>>>>>>>> [http-bio-80-exec-2-SendThread(iwdc2preecma04.iwater.ie:2182)]
>>>>>>>>> WARN org.apache.zookeeper.ClientCnxn - Session 0x0 for server
>>>>>>>>> iwdc2preecma04.iwater.ie/10.231.72.25:2182, unexpected error,
>>>>>>>>> closing socket connection and attempting reconnect
>>>>>>>>>
>>>>>>>>> java.io.IOException: Connection reset by peer
>>>>>>>>>
>>>>>>>>>         at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
>>>>>>>>>
>>>>>>>>>         at
>>>>>>>>> sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
>>>>>>>>>
>>>>>>>>>         at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:225)
>>>>>>>>>
>>>>>>>>>         at sun.nio.ch.IOUtil.read(IOUtil.java:193)
>>>>>>>>>
>>>>>>>>>         at
>>>>>>>>> sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:375)
>>>>>>>>>
>>>>>>>>>         at
>>>>>>>>> org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:68)
>>>>>>>>>
>>>>>>>>>         at
>>>>>>>>> org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:355)
>>>>>>>>>
>>>>>>>>>         at
>>>>>>>>> org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
>>>>>>>>>
>>>>>>>>> [http-bio-80-exec-2-SendThread(iwdc2preecma04.iwater.ie:2183)]
>>>>>>>>> INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server
>>>>>>>>> iwdc2preecma04.iwater.ie/10.231.72.25:2183. Will not attempt to
>>>>>>>>> authenticate using SASL (unknown error)
>>>>>>>>>
>>>>>>>>> [http-bio-80-exec-2-SendThread(iwdc2preecma04.iwater.ie:2183)]
>>>>>>>>> INFO org.apache.zookeeper.ClientCnxn - Socket connection established to
>>>>>>>>> iwdc2preecma04.iwater.ie/10.231.72.25:2183, initiating session
>>>>>>>>>
>>>>>>>>> [http-bio-80-exec-2-SendThread(iwdc2preecma04.iwater.ie:2183)]
>>>>>>>>> INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on
>>>>>>>>> server iwdc2preecma04.iwater.ie/10.231.72.25:2183, sessionid =
>>>>>>>>> 0x6487851bd330078, negotiated timeout = 4000
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Below are the configurations for (1) the ZooKeeper nodes and (2) the
>>>>>>>>> MCF nodes' ZooKeeper settings.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> *zoo.cfg :  Same for all six zookeeper nodes.*
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> # The number of milliseconds of each tick
>>>>>>>>>
>>>>>>>>> tickTime=2000
>>>>>>>>>
>>>>>>>>> dataDir=/app/IW/zookeeper/data/data.1
>>>>>>>>>
>>>>>>>>> dataLogDir=/app/IW/zookeeper/logs/log.1
>>>>>>>>>
>>>>>>>>> clientPort=2181
>>>>>>>>>
>>>>>>>>> server.1=iwdc1preecma03:2888:3888
>>>>>>>>>
>>>>>>>>> server.2=iwdc1preecma03:2889:3889
>>>>>>>>>
>>>>>>>>> server.3=iwdc1preecma03:2890:3890
>>>>>>>>>
>>>>>>>>> server.4=iwdc2preecma04:2891:3891
>>>>>>>>>
>>>>>>>>> server.5=iwdc2preecma04:2892:3892
>>>>>>>>>
>>>>>>>>> server.6=iwdc2preecma04:2893:3893
>>>>>>>>>
>>>>>>>>> # The number of ticks that the initial
>>>>>>>>>
>>>>>>>>> # synchronization phase can take
>>>>>>>>>
>>>>>>>>> initLimit=10
>>>>>>>>>
>>>>>>>>> # The number of ticks that can pass between
>>>>>>>>>
>>>>>>>>> # sending a request and getting an acknowledgement
>>>>>>>>>
>>>>>>>>> syncLimit=5
>>>>>>>>>
>>>>>>>>> # the directory where the snapshot is stored.
>>>>>>>>>
>>>>>>>>> # do not use /tmp for storage, /tmp here is just
>>>>>>>>>
>>>>>>>>> # example sakes.
>>>>>>>>>
>>>>>>>>> #dataDir=/tmp/zookeeper
>>>>>>>>>
>>>>>>>>> # the port at which the clients will connect
>>>>>>>>>
>>>>>>>>> #clientPort=2181
>>>>>>>>>
>>>>>>>>> # the maximum number of client connections.
>>>>>>>>>
>>>>>>>>> # increase this if you need to handle more clients
>>>>>>>>>
>>>>>>>>> #maxClientCnxns=60
>>>>>>>>>
>>>>>>>>> #
>>>>>>>>>
>>>>>>>>> # Be sure to read the maintenance section of the
>>>>>>>>>
>>>>>>>>> # administrator guide before turning on autopurge.
>>>>>>>>>
>>>>>>>>> #
>>>>>>>>>
>>>>>>>>> # http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
>>>>>>>>>
>>>>>>>>> #
>>>>>>>>>
>>>>>>>>> # The number of snapshots to retain in dataDir
>>>>>>>>>
>>>>>>>>> autopurge.snapRetainCount=3
>>>>>>>>>
>>>>>>>>> # Purge task interval in hours
>>>>>>>>>
>>>>>>>>> # Set to "0" to disable auto purge feature
>>>>>>>>>
>>>>>>>>> autopurge.purgeInterval=1
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> *ManifoldCF configurations : same for both ManifoldCF nodes.*
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> <property name="org.apache.manifoldcf.lockmanagerclass"
>>>>>>>>> value="org.apache.manifoldcf.core.lockmanager.ZooKeeperLockManager"/>
>>>>>>>>>
>>>>>>>>>   <property name="org.apache.manifoldcf.zookeeper.connectstring"
>>>>>>>>> value="iwdc1preecma03:2181,iwdc1preecma03:2182,iwdc1preecma03:2183,iwdc2preecma04:2181,iwdc2preecma04:2182,iwdc2preecma04:2183"/>
>>>>>>>>>
>>>>>>>>> <property name="org.apache.manifoldcf.zookeeper.sessiontimeout"
>>>>>>>>> value="4000"/>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> *I want to know whether, given the above warnings/errors, ZooKeeper
>>>>>>>>> will stop working, or whether it will keep working and these messages
>>>>>>>>> are non-fatal, because the ManifoldCF jobs are stuck while I can see
>>>>>>>>> these errors.*
>>>>>>>>>
>>>>>>>>> Please suggest.
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Lalit.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Regards,
>>>>>>> Lalit.
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Regards,
>>>>> Lalit.
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Regards,
>>>> Lalit.
>>>>
>>>
>>>
>>
>>
>> --
>> Regards,
>> Lalit.
>>
>
>


-- 
Regards,
Lalit.
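
On the question raised in the thread about config-file adjustments: the quoted settings use `tickTime=2000` and a client session timeout of 4000, and the log shows `negotiated timeout = 4000`. Unless `minSessionTimeout` and `maxSessionTimeout` are set explicitly (they are not in the quoted zoo.cfg), a ZooKeeper server keeps the negotiated timeout between 2 and 20 ticks, so 4000 ms is the lowest value this tickTime allows. The sketch below just works through that arithmetic; the class and method names are hypothetical, and whether a larger timeout would reduce the stalls in this setup is an assumption the thread does not test.

```java
// Hypothetical helper illustrating ZooKeeper's default session-timeout bounds.
public class SessionTimeoutBounds {

    /** Default lower bound: 2 ticks (applies when minSessionTimeout is not configured). */
    public static int minSessionTimeoutMs(int tickTimeMs) {
        return 2 * tickTimeMs;
    }

    /** Default upper bound: 20 ticks (applies when maxSessionTimeout is not configured). */
    public static int maxSessionTimeoutMs(int tickTimeMs) {
        return 20 * tickTimeMs;
    }

    /** The timeout the server would grant for a client request, clamped to the bounds. */
    public static int negotiatedTimeoutMs(int requestedMs, int tickTimeMs) {
        int min = minSessionTimeoutMs(tickTimeMs);
        int max = maxSessionTimeoutMs(tickTimeMs);
        return Math.max(min, Math.min(requestedMs, max));
    }

    public static void main(String[] args) {
        int tickTimeMs = 2000;   // tickTime from the quoted zoo.cfg
        int requestedMs = 4000;  // sessiontimeout from the quoted ManifoldCF properties
        System.out.println("allowed range: " + minSessionTimeoutMs(tickTimeMs)
                + ".." + maxSessionTimeoutMs(tickTimeMs) + " ms");
        System.out.println("negotiated: " + negotiatedTimeoutMs(requestedMs, tickTimeMs) + " ms");
    }
}
```

With the quoted values, any requested timeout from 4000 to 40000 ms would be granted unchanged; requests outside that range are clamped by the server.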
