manifoldcf-user mailing list archives

From lalit jangra <lalit.j.jan...@gmail.com>
Subject Re: Getting errors in zookeeper logs
Date Mon, 15 Sep 2014 16:29:16 GMT
Thanks Karl,

I think this is why my ZooKeeper nodes keep resetting connections: they are
unstable. In the meantime, I will try reducing MCF's memory to 1.5G and
leaving the rest unassigned, which leaves 5.5G for the remaining Java
processes (more than the 25% rule), and see if that works.

I also checked the ZooKeeper documentation but could not find any specific
guidance in it.

Regards.

On Mon, Sep 15, 2014 at 9:52 PM, Karl Wright <daddywri@gmail.com> wrote:

> Hi Lalit,
>
> I can't speak for Solr's memory consumption, but you absolutely need to
> give Solr enough memory to avoid OOM errors or things will not work
> properly.
>
> As for MCF, 3G is more than enough; probably you could give it 1G and be
> fine.
>
> For Zookeeper, remember that it is a Java process.  On 64-bit unix
> machines, Java by default sets its maximum heap to 25% of the total system
> memory.  I would look at their documentation to figure out what they need,
> and assign precisely that amount, otherwise zk will obviously not be stable.
>
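Here is a minimal sketch for checking Karl's point about default heap sizing. It assumes a HotSpot JVM, and `HeapCheck` is a hypothetical class name that is not part of ManifoldCF or ZooKeeper. The sketch reports the effective maximum heap from `Runtime.maxMemory()`. It also reports whether `-Xmx` or `-XX:MaxHeapSize=` appears among the JVM's start-up arguments. When neither flag is present, the JVM uses its default, which on a 64-bit server JVM is typically about a quarter of physical RAM.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical diagnostic class for illustration only.
// Reports this JVM's effective maximum heap and whether it was set explicitly.
final class HeapCheck {

    // True if any start-up argument sets the maximum heap size explicitly.
    static boolean hasExplicitMaxHeap(List<String> jvmArgs) {
        for (String arg : jvmArgs) {
            if (arg.startsWith("-Xmx") || arg.startsWith("-XX:MaxHeapSize=")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        long maxHeapMb = Runtime.getRuntime().maxMemory() / (1024 * 1024);
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("Effective max heap: " + maxHeapMb + " MB");
        System.out.println("Max heap set explicitly: " + hasExplicitMaxHeap(jvmArgs));
    }
}
```

The sketch only inspects the JVM it runs in. For ZooKeeper instances that are already running, `jps -lvm` lists each Java process with its JVM arguments, which shows whether an explicit `-Xmx` was passed.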
> Thanks,
> Karl
>
>
> On Mon, Sep 15, 2014 at 12:17 PM, lalit jangra <lalit.j.jangra@gmail.com>
> wrote:
>
>> Hi Karl,
>>
>> Out of 12G, I have assigned 5G to Solr, as I was seeing a lot of Out of
>> Memory errors/Java heap space issues while crawling large jobs; since then
>> it seems to be OK. I have also assigned 3G to MCF, where it is quite
>> comfortable. I am assuming the remaining 4G is enough for the OS and the
>> ZooKeeper nodes. I am currently running a job of 35K documents and I can
>> see more than 500MB of memory free.
>>
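The memory arithmetic in this message can be made explicit. This is a rough sketch under three assumptions. First, each of the three ZooKeeper JVMs on a machine was started without `-Xmx`. Second, each of those JVMs may therefore grow to the default quarter of the 12G, which is 3G. Third, non-heap JVM overhead is ignored. `MemoryBudget` and its methods are hypothetical names. A negative result means the processes can together claim more memory than the machine has.

```java
// Hypothetical worst-case memory budget for one 12G machine, in GB.
// Assumes each ZooKeeper JVM was started without -Xmx and may grow to the
// default max heap (1/4 of physical RAM); non-heap JVM overhead is ignored.
final class MemoryBudget {

    // Typical 64-bit server JVM default max heap: a quarter of physical RAM.
    static double defaultMaxHeapGb(double physicalGb) {
        return physicalGb / 4.0;
    }

    // RAM left for the OS once Solr, MCF and every ZooKeeper JVM reach their max heap.
    // A negative value means the processes can claim more than the machine has.
    static double worstCaseHeadroomGb(double physicalGb, double solrGb, double mcfGb,
                                      int zkInstances, double zkHeapGb) {
        return physicalGb - solrGb - mcfGb - zkInstances * zkHeapGb;
    }

    public static void main(String[] args) {
        double physical = 12.0;
        double zkDefault = defaultMaxHeapGb(physical); // 3.0 GB per ZooKeeper JVM

        // Allocation described in this message: Solr 5G, MCF 3G, 3 ZooKeeper instances.
        System.out.println(worstCaseHeadroomGb(physical, 5.0, 3.0, 3, zkDefault)); // -5.0
        // Later plan: MCF reduced to 1.5G, ZooKeeper heaps still at the default.
        System.out.println(worstCaseHeadroomGb(physical, 5.0, 1.5, 3, zkDefault)); // -3.5
        // Same plan with an explicit 0.5G heap per ZooKeeper JVM (illustrative value only).
        System.out.println(worstCaseHeadroomGb(physical, 5.0, 1.5, 3, 0.5));       // 4.0
    }
}
```

Under these assumptions, both the original allocation and the later 1.5G MCF plan over-commit the machine. This suggests that pinning each ZooKeeper heap explicitly matters more than leaving memory unassigned. The 0.5G figure is illustrative, not a ZooKeeper recommendation.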
>> Any thoughts?
>>
>> Regards.
>>
>> On Mon, Sep 15, 2014 at 8:45 PM, Karl Wright <daddywri@gmail.com> wrote:
>>
>>> Hi Lalit,
>>>
>>> The best way in Java to assess memory usage is to turn on verbose JVM
>>> garbage collection output.  Then you can see how often the JVM garbage
>>> collects, and whether post-GC usage grows over time.
>>>
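Verbose GC output is enabled with JVM start-up flags. `-verbose:gc` works on any HotSpot JVM. On Java 8 and earlier, add `-XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:<file>`; on Java 9 and later, use `-Xlog:gc*` instead. The same counters are also available in-process through the standard `java.lang.management` API. The sketch below uses that API; the class name `GcWatch` is hypothetical. It samples collection counts and times, plus the heap in use right after the last collection, which is the "post-GC usage" worth watching.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

// Hypothetical monitoring class for illustration only.
// Prints GC counts/times and heap usage measured right after the last GC.
final class GcWatch {

    // Sum over heap pools of the bytes in use right after their most recent collection.
    static long postGcHeapUsedBytes() {
        long total = 0;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage afterGc = pool.getCollectionUsage(); // null if unsupported
            if (pool.getType() == MemoryType.HEAP && afterGc != null) {
                total += afterGc.getUsed();
            }
        }
        return total;
    }

    static void printSample() {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: %d collections, %d ms total%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
        System.out.printf("Heap used after last GC: %d MB%n",
                postGcHeapUsedBytes() / (1024 * 1024));
    }

    public static void main(String[] args) throws InterruptedException {
        for (int i = 0; i < 3; i++) {
            printSample();
            Thread.sleep(1000); // sample once per second
        }
    }
}
```

Interpret the samples this way:
- If the post-GC figure keeps rising across samples during a crawl, live data is growing.
- If it stays flat while collections are frequent, the heap is simply sized tightly.

The sketch only sees its own JVM. For Solr or MCF, use the flags above or attach over JMX.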
>>> 12G should be more than enough, so if you find you are running into
>>> memory limits with that configuration, it would be worth trying to figure
>>> out why that is happening.
>>>
>>> Karl
>>>
>>>
>>> On Mon, Sep 15, 2014 at 11:04 AM, lalit jangra <lalit.j.jangra@gmail.com> wrote:
>>>
>>>> Hi Karl,
>>>>
>>>> Could the ZooKeeper connection reset messages be caused by the system
>>>> running at its memory limit? I have 12G of RAM and can see 11.5G in use
>>>> while the job is running.
>>>>
>>>> Is there a way to work out how much memory to give the ZooKeeper nodes,
>>>> and if so, is there any yardstick?
>>>>
>>>> Regards.
>>>>
>>>> On Mon, Sep 15, 2014 at 7:16 PM, Karl Wright <daddywri@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi Lalit,
>>>>>
>>>>> This looks like the result of a Tomcat shutdown, and is probably a
>>>>> race-condition bug in ZooKeeper:
>>>>>
>>>>> http://mail-archives.apache.org/mod_mbox/tomcat-users/201306.mbox/%3CBAY174-W32B2284BEDAE503E9D22D3A8850@phx.gbl%3E
>>>>>
>>>>> Karl
>>>>>
>>>>>
>>>>> On Mon, Sep 15, 2014 at 9:41 AM, lalit jangra <
>>>>> lalit.j.jangra@gmail.com> wrote:
>>>>>
>>>>>> Hi Karl,
>>>>>>
>>>>>> Along with this, I can see the errors below in Tomcat's catalina.out.
>>>>>>
>>>>>> Sep 15, 2014 1:06:14 PM org.apache.catalina.loader.WebappClassLoader loadClass
>>>>>> INFO: Illegal access: this web application instance has been stopped
>>>>>> already.  Could not load org.apache.zookeeper.server.ZooTrace.  The
>>>>>> eventual following stack trace is caused by an error thrown for debugging
>>>>>> purposes as well as to attempt to terminate the thread which caused the
>>>>>> illegal access, and has no functional impact.
>>>>>> java.lang.IllegalStateException
>>>>>>         at org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1612)
>>>>>>         at org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1571)
>>>>>>         at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1115)
>>>>>>
>>>>>> [http-bio-80-exec-1-SendThread(iwdc2preecma04.iwater.ie:2183)] ERROR
>>>>>> org.apache.zookeeper.ClientCnxn - from http-bio-80-exec-1-SendThread(iwdc2preecma04.iwater.ie:2183)
>>>>>> java.lang.NoClassDefFoundError: org/apache/zookeeper/server/ZooTrace
>>>>>>         at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1115)
>>>>>> Caused by: java.lang.ClassNotFoundException: org.apache.zookeeper.server.ZooTrace
>>>>>>         at org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1720)
>>>>>>         at org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1571)
>>>>>>         ... 1 more
>>>>>>
>>>>>> [http-bio-80-exec-1-SendThread(iwdc2preecma04.iwater.ie:2182)] ERROR
>>>>>> org.apache.zookeeper.ClientCnxn - from http-bio-80-exec-1-SendThread(iwdc2preecma04.iwater.ie:2182)
>>>>>>
>>>>>> Sep 15, 2014 1:06:14 PM org.apache.coyote.AbstractProtocol destroy
>>>>>> INFO: Destroying ProtocolHandler ["http-bio-80"]
>>>>>>
>>>>>> java.lang.NoClassDefFoundError: org/apache/zookeeper/server/ZooTrace
>>>>>>
>>>>>> Regards.
>>>>>>
>>>>>> On Mon, Sep 15, 2014 at 7:05 PM, lalit jangra <
>>>>>> lalit.j.jangra@gmail.com> wrote:
>>>>>>
>>>>>>> Thanks Karl,
>>>>>>>
>>>>>>> Crawling is very slow and taking a long time, which is a bit frustrating,
>>>>>>> and since I have multiple high-volume jobs, some of them running in
>>>>>>> parallel, this is not a good situation.
>>>>>>>
>>>>>>> I have also raised it on the ZooKeeper forums at
>>>>>>> http://zookeeper-user.578899.n2.nabble.com/Getting-errors-in-zookeeper-logs-td7580260.html
>>>>>>> but am still waiting for a reply.
>>>>>>>
>>>>>>> Regards.
>>>>>>>
>>>>>>> On Mon, Sep 15, 2014 at 6:51 PM, Karl Wright <daddywri@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi Lalit,
>>>>>>>>
>>>>>>>> When MCF cannot reach ZooKeeper, MCF crawls will pause until the
>>>>>>>> ZooKeeper connections are re-established.  Then the crawls should
>>>>>>>> resume.  This should *not* abort your crawls, but it will make them
>>>>>>>> very slow.
>>>>>>>>
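Karl describes crawls that pause rather than abort. That behavior can be illustrated with a generic wait-until-connected gate. This is a hypothetical sketch of the pattern, not ManifoldCF's actual lock-manager code. Worker threads call `awaitConnected` before each unit of work, and a connection-state callback flips the flag on disconnect and reconnect.

```java
// Hypothetical gate for illustration; not ManifoldCF's implementation.
// Workers block while the coordination service is unreachable and resume on reconnect.
final class ConnectionGate {
    private boolean connected;

    // Called from a connection-state callback on disconnect (false) or reconnect (true).
    synchronized void setConnected(boolean value) {
        connected = value;
        if (connected) {
            notifyAll(); // wake every worker waiting in awaitConnected()
        }
    }

    // Blocks until connected or until the timeout elapses.
    // Returns true if connected, false on timeout or interruption.
    synchronized boolean awaitConnected(long timeoutMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (!connected) {
            long remaining = deadline - System.currentTimeMillis();
            if (remaining <= 0) {
                return false;
            }
            try {
                wait(remaining);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status for the caller
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws InterruptedException {
        ConnectionGate gate = new ConnectionGate();
        Thread worker = new Thread(() ->
                System.out.println(gate.awaitConnected(5000) ? "resumed" : "still disconnected"));
        worker.start();
        Thread.sleep(200);       // simulate a short outage
        gate.setConnected(true); // reconnect: the waiting worker resumes
        worker.join();
    }
}
```

With this pattern, every outage stalls each worker until reconnection, which fits crawls that are slow rather than failing outright.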
>>>>>>>> I am not a ZooKeeper expert, so I would post on their message
>>>>>>>> boards to see if any ZooKeeper parameters can be adjusted to improve
>>>>>>>> ZooKeeper's behavior on a flaky network.  However, since the obvious
>>>>>>>> solution is to fix your network, they may not have a code solution
>>>>>>>> for you.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Karl
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Sep 15, 2014 at 9:15 AM, lalit jangra <
>>>>>>>> lalit.j.jangra@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Thanks Karl,
>>>>>>>>>
>>>>>>>>> Ideally, reset connections should be handled by ZooKeeper itself, and
>>>>>>>>> I can see the connections being re-established later in the logs.
>>>>>>>>>
>>>>>>>>> Can you suggest any way to overcome this, apart from fixing the
>>>>>>>>> network issue, since my crawls repeatedly stop working? Anything in
>>>>>>>>> the config files, etc.?
>>>>>>>>>
>>>>>>>>> Regards.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Mon, Sep 15, 2014 at 6:39 PM, Karl Wright <daddywri@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Lalit,
>>>>>>>>>>
>>>>>>>>>> ZooKeeper will keep working, but you should understand that you
>>>>>>>>>> are dropping connections to your ZooKeeper members for unknown
>>>>>>>>>> reasons, which stalls your crawl whenever it happens.  This suggests
>>>>>>>>>> that you have some kind of network flakiness.
>>>>>>>>>>
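One way to gather evidence for or against network flakiness is to probe every member of the connect string repeatedly from the MCF hosts. Here is a minimal sketch using plain sockets and ZooKeeper's `ruok` four-letter command; a running server replies `imok`. Four-letter commands are enabled by default in the 3.4 line. From 3.5.3 onward they must be allowed via `4lw.commands.whitelist`. `ZkProbe` is a hypothetical name, and the default connect string is the one from the configuration quoted in this thread.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostic: sends ZooKeeper's "ruok" command to every server in a
// connect string and reports the reply and round-trip time.
final class ZkProbe {

    // Splits "host1:2181,host2:2182" into unresolved addresses.
    // A missing port falls back to 2181; chroot suffixes are not handled.
    static List<InetSocketAddress> parseConnectString(String connectString) {
        List<InetSocketAddress> servers = new ArrayList<>();
        for (String part : connectString.split(",")) {
            String hostPort = part.trim();
            if (hostPort.isEmpty()) {
                continue;
            }
            int colon = hostPort.lastIndexOf(':');
            String host = colon < 0 ? hostPort : hostPort.substring(0, colon);
            int port = colon < 0 ? 2181 : Integer.parseInt(hostPort.substring(colon + 1));
            servers.add(InetSocketAddress.createUnresolved(host, port));
        }
        return servers;
    }

    // A healthy server answers "imok"; any I/O or DNS failure is reported as FAILED.
    static String ruok(InetSocketAddress server, int timeoutMillis) {
        long start = System.nanoTime();
        try (Socket socket = new Socket()) {
            // Resolve here so DNS problems show up as probe failures.
            socket.connect(new InetSocketAddress(server.getHostString(), server.getPort()), timeoutMillis);
            socket.setSoTimeout(timeoutMillis);
            OutputStream out = socket.getOutputStream();
            out.write("ruok".getBytes(StandardCharsets.US_ASCII));
            out.flush();
            InputStream in = socket.getInputStream();
            byte[] buf = new byte[16];
            int n = in.read(buf);
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            String reply = n > 0 ? new String(buf, 0, n, StandardCharsets.US_ASCII) : "(no reply)";
            return reply + " in " + elapsedMs + " ms";
        } catch (IOException e) {
            return "FAILED: " + e;
        }
    }

    public static void main(String[] args) {
        String connect = args.length > 0 ? args[0]
                : "iwdc1preecma03:2181,iwdc1preecma03:2182,iwdc1preecma03:2183,"
                + "iwdc2preecma04:2181,iwdc2preecma04:2182,iwdc2preecma04:2183";
        for (InetSocketAddress server : parseConnectString(connect)) {
            System.out.println(server.getHostString() + ":" + server.getPort()
                    + " -> " + ruok(server, 2000));
        }
    }
}
```

Run the probe every few seconds while a crawl is active and watch for `FAILED` lines or latency spikes. That helps separate unreachable servers from merely slow ones. An `imok` only means the process is up; it does not show whether that server is part of a working quorum.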
>>>>>>>>>> Karl
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Mon, Sep 15, 2014 at 8:59 AM, lalit jangra <
>>>>>>>>>> lalit.j.jangra@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> I am running a cluster of two Apache ManifoldCF nodes on two
>>>>>>>>>>> separate machines, each of which has 3 ZooKeeper instances (6
>>>>>>>>>>> instances in total). When I start the ManifoldCF agents, I see the
>>>>>>>>>>> warning below during startup.
>>>>>>>>>>>
>>>>>>>>>>> [http-bio-80-exec-2-SendThread(iwdc1preecma03.iwater.ie:2181)]
>>>>>>>>>>> INFO org.apache.zookeeper.ClientCnxn - Unable to read additional data from
>>>>>>>>>>> server sessionid 0x0, likely server has closed socket, closing socket
>>>>>>>>>>> connection and attempting reconnect
>>>>>>>>>>>
>>>>>>>>>>> [http-bio-80-exec-2-SendThread(iwdc2preecma04.iwater.ie:2182)]
>>>>>>>>>>> INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server
>>>>>>>>>>> iwdc2preecma04.iwater.ie/10.231.72.25:2182. Will not attempt to
>>>>>>>>>>> authenticate using SASL (unknown error)
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> I also see the error below in the logs while the agents are running.
>>>>>>>>>>>
>>>>>>>>>>> [http-bio-80-exec-2] INFO org.apache.zookeeper.ZooKeeper -
>>>>>>>>>>> Initiating client connection,
>>>>>>>>>>> connectString=iwdc1preecma03:2181,iwdc1preecma03:2182,iwdc1preecma03:2183,iwdc2preecma04:2181,iwdc2preecma04:2182,iwdc2preecma04:2183
>>>>>>>>>>> sessionTimeout=4000
>>>>>>>>>>> watcher=org.apache.manifoldcf.core.lockmanager.ZooKeeperConnection$ZooKeeperWatcher@51d83fd7
>>>>>>>>>>>
>>>>>>>>>>> [http-bio-80-exec-2-SendThread(iwdc2preecma04.iwater.ie:2182)]
>>>>>>>>>>> INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server
>>>>>>>>>>> iwdc2preecma04.iwater.ie/10.231.72.25:2182. Will not attempt to
>>>>>>>>>>> authenticate using SASL (unknown error)
>>>>>>>>>>>
>>>>>>>>>>> [http-bio-80-exec-2-SendThread(iwdc2preecma04.iwater.ie:2182)]
>>>>>>>>>>> INFO org.apache.zookeeper.ClientCnxn - Socket connection established to
>>>>>>>>>>> iwdc2preecma04.iwater.ie/10.231.72.25:2182, initiating session
>>>>>>>>>>>
>>>>>>>>>>> [http-bio-80-exec-2-SendThread(iwdc2preecma04.iwater.ie:2182)]
>>>>>>>>>>> WARN org.apache.zookeeper.ClientCnxn - Session 0x0 for server
>>>>>>>>>>> iwdc2preecma04.iwater.ie/10.231.72.25:2182, unexpected error,
>>>>>>>>>>> closing socket connection and attempting reconnect
>>>>>>>>>>>
>>>>>>>>>>> java.io.IOException: Connection reset by peer
>>>>>>>>>>>         at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
>>>>>>>>>>>         at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
>>>>>>>>>>>         at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:225)
>>>>>>>>>>>         at sun.nio.ch.IOUtil.read(IOUtil.java:193)
>>>>>>>>>>>         at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:375)
>>>>>>>>>>>         at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:68)
>>>>>>>>>>>         at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:355)
>>>>>>>>>>>         at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
>>>>>>>>>>>
>>>>>>>>>>> [http-bio-80-exec-2-SendThread(iwdc2preecma04.iwater.ie:2183)]
>>>>>>>>>>> INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server
>>>>>>>>>>> iwdc2preecma04.iwater.ie/10.231.72.25:2183. Will not attempt to
>>>>>>>>>>> authenticate using SASL (unknown error)
>>>>>>>>>>>
>>>>>>>>>>> [http-bio-80-exec-2-SendThread(iwdc2preecma04.iwater.ie:2183)]
>>>>>>>>>>> INFO org.apache.zookeeper.ClientCnxn - Socket connection established to
>>>>>>>>>>> iwdc2preecma04.iwater.ie/10.231.72.25:2183, initiating session
>>>>>>>>>>>
>>>>>>>>>>> [http-bio-80-exec-2-SendThread(iwdc2preecma04.iwater.ie:2183)]
>>>>>>>>>>> INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on
>>>>>>>>>>> server iwdc2preecma04.iwater.ie/10.231.72.25:2183, sessionid =
>>>>>>>>>>> 0x6487851bd330078, negotiated timeout = 4000
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Below are the ZooKeeper-related configurations for 1. the ZooKeeper
>>>>>>>>>>> nodes and 2. the MCF nodes.
>>>>>>>>>>>
>>>>>>>>>>> *zoo.cfg: same for all six ZooKeeper nodes.*
>>>>>>>>>>>
>>>>>>>>>>> # The number of milliseconds of each tick
>>>>>>>>>>> tickTime=2000
>>>>>>>>>>> dataDir=/app/IW/zookeeper/data/data.1
>>>>>>>>>>> dataLogDir=/app/IW/zookeeper/logs/log.1
>>>>>>>>>>> clientPort=2181
>>>>>>>>>>> server.1=iwdc1preecma03:2888:3888
>>>>>>>>>>> server.2=iwdc1preecma03:2889:3889
>>>>>>>>>>> server.3=iwdc1preecma03:2890:3890
>>>>>>>>>>> server.4=iwdc2preecma04:2891:3891
>>>>>>>>>>> server.5=iwdc2preecma04:2892:3892
>>>>>>>>>>> server.6=iwdc2preecma04:2893:3893
>>>>>>>>>>> # The number of ticks that the initial
>>>>>>>>>>> # synchronization phase can take
>>>>>>>>>>> initLimit=10
>>>>>>>>>>> # The number of ticks that can pass between
>>>>>>>>>>> # sending a request and getting an acknowledgement
>>>>>>>>>>> syncLimit=5
>>>>>>>>>>> # the directory where the snapshot is stored.
>>>>>>>>>>> # do not use /tmp for storage, /tmp here is just
>>>>>>>>>>> # example sakes.
>>>>>>>>>>> #dataDir=/tmp/zookeeper
>>>>>>>>>>> # the port at which the clients will connect
>>>>>>>>>>> #clientPort=2181
>>>>>>>>>>> # the maximum number of client connections.
>>>>>>>>>>> # increase this if you need to handle more clients
>>>>>>>>>>> #maxClientCnxns=60
>>>>>>>>>>> #
>>>>>>>>>>> # Be sure to read the maintenance section of the
>>>>>>>>>>> # administrator guide before turning on autopurge.
>>>>>>>>>>> #
>>>>>>>>>>> # http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
>>>>>>>>>>> #
>>>>>>>>>>> # The number of snapshots to retain in dataDir
>>>>>>>>>>> autopurge.snapRetainCount=3
>>>>>>>>>>> # Purge task interval in hours
>>>>>>>>>>> # Set to "0" to disable auto purge feature
>>>>>>>>>>> autopurge.purgeInterval=1
>>>>>>>>>>>
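The ensemble above has six voting members, three on each machine. ZooKeeper keeps serving only while a strict majority of the ensemble is up, so the arithmetic is worth spelling out. This sketch applies the standard majority rule; `QuorumMath` is a hypothetical class name.

```java
// Hypothetical helper applying ZooKeeper's majority-quorum rule.
final class QuorumMath {

    // Smallest strict majority of n voting servers.
    static int quorumSize(int n) {
        return n / 2 + 1;
    }

    // How many servers can fail while a majority remains.
    static int toleratedFailures(int n) {
        return (n - 1) / 2;
    }

    // True if the servers left after losing `down` of `n` still form a quorum.
    static boolean hasQuorumAfterLosing(int n, int down) {
        return n - down >= quorumSize(n);
    }

    public static void main(String[] args) {
        for (int n = 3; n <= 7; n++) {
            System.out.printf("n=%d quorum=%d tolerates=%d%n",
                    n, quorumSize(n), toleratedFailures(n));
        }
        // Layout from this thread: 6 servers, 3 per machine; one machine goes down.
        System.out.println("6 servers, one machine down, quorum: "
                + hasQuorumAfterLosing(6, 3));
    }
}
```

With six servers the quorum is four. Losing either machine (three servers) therefore stops the ensemble. Six servers also tolerate only two failures, the same as five. Placing several ensemble members on one machine adds no protection against that machine failing.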
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> *ManifoldCF configuration: same for both ManifoldCF nodes.*
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> <property name="org.apache.manifoldcf.lockmanagerclass"
>>>>>>>>>>>   value="org.apache.manifoldcf.core.lockmanager.ZooKeeperLockManager"/>
>>>>>>>>>>> <property name="org.apache.manifoldcf.zookeeper.connectstring"
>>>>>>>>>>>   value="iwdc1preecma03:2181,iwdc1preecma03:2182,iwdc1preecma03:2183,iwdc2preecma04:2181,iwdc2preecma04:2182,iwdc2preecma04:2183"/>
>>>>>>>>>>> <property name="org.apache.manifoldcf.zookeeper.sessiontimeout"
>>>>>>>>>>>   value="4000"/>
>>>>>>>>>>>
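The `sessiontimeout` of 4000 ms above, together with `tickTime=2000` in zoo.cfg, explains the `negotiated timeout = 4000` in the log. The server clamps the client's requested timeout into `[minSessionTimeout, maxSessionTimeout]`. Those bounds default to 2 × tickTime and 20 × tickTime. Below is a sketch of that clamping, assuming neither bound is overridden in zoo.cfg; `SessionTimeout` is a hypothetical class name.

```java
// Hypothetical sketch of how a ZooKeeper server bounds a requested session timeout,
// assuming the default minSessionTimeout (2 * tickTime) and maxSessionTimeout (20 * tickTime).
final class SessionTimeout {

    static int negotiate(int requestedMs, int tickTimeMs) {
        int min = 2 * tickTimeMs;
        int max = 20 * tickTimeMs;
        return Math.max(min, Math.min(max, requestedMs));
    }

    public static void main(String[] args) {
        int tickTime = 2000; // from the zoo.cfg quoted above
        System.out.println(negotiate(4000, tickTime));  // current MCF setting -> 4000
        System.out.println(negotiate(30000, tickTime)); // illustrative longer value -> 30000
        System.out.println(negotiate(60000, tickTime)); // capped at 20 * tickTime -> 40000
    }
}
```

So 4000 ms is the shortest session the server allows. A stall longer than roughly that, such as a network blip or a long GC pause, can expire the session. One candidate for the parameter adjustment Karl mentions is raising `org.apache.manifoldcf.zookeeper.sessiontimeout` toward the 40000 ms ceiling. Whether a longer timeout suits ManifoldCF's locking is an assumption worth testing. If we read the 3.4 client source correctly, the client also drops a connection after about two-thirds of the negotiated timeout without traffic. With 4000 ms, that is under three seconds.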
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> *I want to know whether, because of the warnings/errors above,
>>>>>>>>>>> ZooKeeper will stop working, or whether it will keep working and
>>>>>>>>>>> these are non-fatal messages, because my ManifoldCF jobs get stuck
>>>>>>>>>>> while these errors appear.*
>>>>>>>>>>>
>>>>>>>>>>> Please suggest.
>>>>>>>>>>>
>>>>>>>>>>> Regards,
>>>>>>>>>>> Lalit.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Regards,
>>>>>>>>> Lalit.
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Regards,
>>>>>>> Lalit.
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Regards,
>>>>>> Lalit.
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Regards,
>>>> Lalit.
>>>>
>>>
>>>
>>
>>
>> --
>> Regards,
>> Lalit.
>>
>
>


-- 
Regards,
Lalit.
