Thanks Karl,

I have updated it accordingly and will retest.

Regards.

On Tue, Sep 16, 2014 at 7:37 PM, Karl Wright <daddywri@gmail.com> wrote:
Hi Lalit,

Please see my email about increasing the value of the tick interval significantly.  I think this will help a lot.  There are still issues that I need to deal with, but you may be able to succeed in the interim with that one change.
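For context on why the tick interval matters: as far as I know, a ZooKeeper server by default keeps negotiated client session timeouts between 2x and 20x its tickTime, unless minSessionTimeout/maxSessionTimeout are set in zoo.cfg. With tickTime=2000 the floor is 4000 ms, which matches the "negotiated timeout = 4000" line in the logs quoted further down. Below is a minimal sketch of that arithmetic. The function name is hypothetical, and the 10000 value is only an illustration, not the value recommended in the other email.

```shell
#!/bin/sh
# Print the default minimum and maximum session timeout (in ms) for a tickTime.
# Assumes minSessionTimeout/maxSessionTimeout are NOT overridden in zoo.cfg.
session_timeout_bounds() {
  tick_ms=$1
  echo "$((tick_ms * 2)) $((tick_ms * 20))"
}

session_timeout_bounds 2000    # tickTime from the zoo.cfg in this thread
session_timeout_bounds 10000   # illustrative larger value only
```

With a larger tickTime, a client that asks for a 4000 ms session would presumably be given the higher floor instead, so short stalls are less likely to expire the session.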

Hopefully I'll also have a code fix as well, but that may take longer.

Thanks,
Karl


On Tue, Sep 16, 2014 at 6:33 AM, lalit jangra <lalit.j.jangra@gmail.com> wrote:
Sorry Karl,

It was a typo; the actual value is export JVMFLAGS="-Xms1024m -Xmx1024m".

Regards.

On Tue, Sep 16, 2014 at 3:50 PM, Karl Wright <daddywri@gmail.com> wrote:
Hi Lalit,

I believe there is no space between -Xmx and 1024m:  "-Xmx1024m".  Same with -Xms.
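A quick way to catch this kind of mistake before restarting is a small shell check. This is only a sketch: the function name is made up, and it looks for nothing more than whitespace directly after -Xms or -Xmx.

```shell
#!/bin/sh
# Return 0 if the flag string looks well formed, 1 if -Xms or -Xmx is
# immediately followed by a space instead of a size.
heap_flags_ok() {
  case "$1" in
    *-Xms\ *|*-Xmx\ *) return 1 ;;
    *) return 0 ;;
  esac
}

heap_flags_ok "-Xms 1024m -Xmx 1024m" || echo "bad: space after -Xms/-Xmx"
heap_flags_ok "-Xms1024m -Xmx1024m" && echo "ok"
```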

Karl


On Tue, Sep 16, 2014 at 4:25 AM, lalit jangra <lalit.j.jangra@gmail.com> wrote:
Greetings,

I updated the ZooKeeper Java heap settings by adding java/.conf under the zookeeper/conf folder, added the line below on all six ZooKeeper nodes, and restarted.

export JVMFLAGS="-Xms 1024m -Xmx 1024m"

I still see ZooKeeper connection resets while starting the agent, and my crawls are stuck.

Please advise. Is there any way to read the ZooKeeper logs, as they are in binary format?
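One possibility, as a sketch: the files under dataLogDir are ZooKeeper's binary transaction logs rather than its diagnostic log. The ZooKeeper 3.4 jars include a class, org.apache.zookeeper.server.LogFormatter, that dumps them as text. The server's own messages normally go to a plain-text zookeeper.out in the directory from which zkServer.sh was started. The helper name, the install directory, and the log path below are placeholders, and the classpath assumes the usual layout with the ZooKeeper jar in the install root and its dependencies in lib/.

```shell
#!/bin/sh
# Print (not run) a command line that dumps one transaction log as text.
# $1 = ZooKeeper install dir (placeholder), $2 = path to a binary log file.
zk_log_dump_cmd() {
  echo "java -cp '$1/*:$1/lib/*' org.apache.zookeeper.server.LogFormatter '$2'"
}

zk_log_dump_cmd /opt/zookeeper /path/to/txn-log-file
```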

Regards.



On Mon, Sep 15, 2014 at 11:58 PM, lalit jangra <lalit.j.jangra@gmail.com> wrote:
Thanks Karl,

I am running the ZooKeeper instances using the zkServer.sh script, and I will try your suggestions.

Regards.

On Mon, Sep 15, 2014 at 10:48 PM, Karl Wright <daddywri@gmail.com> wrote:
If you are running a batch/shell script to start zookeeper, have a look at the script you are running.  I am sure there is a way to include an environment variable that controls the amount of memory, or at least Java options.  The java option you'd include would be something like: -Xmx500m  (for 500 megabytes), or -Xmx1g (for 1 gigabyte), etc.
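A minimal sketch of one way to do this with the stock scripts. It assumes zkServer.sh sources zkEnv.sh, and that zkEnv.sh in turn sources conf/java.env when that file exists; that is how the scripts shipped with the ZooKeeper 3.4 line behave as far as I know, so check your copy. The heap size is illustrative, not a recommendation.

```shell
# Contents of conf/java.env (assumption: stock zkEnv.sh reads this file).
# Note there is no space between the option and its value.
export JVMFLAGS="-Xmx1g"

# After restarting, the running command line should show the flag, e.g.:
#   ps -ef | grep QuorumPeerMain
```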

Karl


On Mon, Sep 15, 2014 at 1:16 PM, Karl Wright <daddywri@gmail.com> wrote:
How are you starting your zookeeper instances?
Karl


On Mon, Sep 15, 2014 at 1:14 PM, lalit jangra <lalit.j.jangra@gmail.com> wrote:
Thanks Karl,

After updating the configurations, I am still hitting the same ZooKeeper connection reset issue.

I was trying to assign memory to the ZooKeeper instances but could not see any way to do so. Can you suggest a way?

What else can I do?


Regards.

On Mon, Sep 15, 2014 at 10:39 PM, Karl Wright <daddywri@gmail.com> wrote:
Hi Lalit,

If you have more than one Java process without an explicit heap setting, EACH ONE can by default grow its heap to 25% of system memory.  So you will have to do more than just free up some MCF memory to get this to work.
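To make the arithmetic concrete, here is a rough sketch using figures from this thread. It assumes a single 12G machine running Solr (5G), MCF reduced to 1.5G, and three ZooKeeper instances with no explicit heap setting, each of which could then default to a maximum heap of a quarter of physical memory. The variable names are made up for illustration, and non-heap JVM overhead and the OS are ignored.

```shell
#!/bin/sh
total_mb=12288        # 12G of RAM
solr_mb=5120          # heap assigned to Solr
mcf_mb=1536           # MCF after the proposed reduction to 1.5G
zk_instances=3        # ZooKeeper instances per machine in this thread

default_heap_mb=$((total_mb / 4))                      # 25% default ceiling
zk_worst_case_mb=$((zk_instances * default_heap_mb))   # all three at the ceiling
committed_mb=$((solr_mb + mcf_mb + zk_worst_case_mb))

echo "worst-case heap demand: ${committed_mb} MB of ${total_mb} MB"
```

Under these assumptions the worst-case demand is 15872 MB against 12288 MB of RAM, which is why trimming MCF alone does not close the gap.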

Karl


On Mon, Sep 15, 2014 at 12:29 PM, lalit jangra <lalit.j.jangra@gmail.com> wrote:
Thanks Karl,

I think this is the reason my ZooKeeper nodes are resetting connections: instability. What I will try in the meantime is to reduce MCF memory to 1.5G and leave the rest unassigned, so that there will be 5.5G for Java itself, more than the 25% rule needs, and see if it works.

I also checked the ZooKeeper documentation but could not find any specific guidance in it.

Regards.

On Mon, Sep 15, 2014 at 9:52 PM, Karl Wright <daddywri@gmail.com> wrote:
Hi Lalit,

I can't speak for Solr's memory consumption, but you absolutely need to give Solr enough memory to avoid OOM errors or things will not work properly.

As for MCF, 3G is more than enough; probably you could give it 1G and be fine.

For Zookeeper, remember that it is a Java process.  On 64-bit unix machines, Java by default sets its maximum heap to 25% of the total system memory.  I would look at their documentation to figure out what they need, and assign precisely that amount, otherwise zk will obviously not be stable.

Thanks,
Karl


On Mon, Sep 15, 2014 at 12:17 PM, lalit jangra <lalit.j.jangra@gmail.com> wrote:
Hi Karl,

Out of 12G, I have assigned 5G to Solr, as I was seeing a lot of Out of Memory errors/Java heap space issues while crawling large jobs; since then it seems to be OK. I have also assigned 3G to MCF, where it is quite comfortable. I am assuming the remaining 4G is enough for the OS & ZooKeeper nodes. I am currently running a job for 35K documents & I can see more than 500MB of memory free.

Any thoughts?

Regards.

On Mon, Sep 15, 2014 at 8:45 PM, Karl Wright <daddywri@gmail.com> wrote:
Hi Lalit,

The best way in Java to assess memory usage is to turn on JVM garbage collection verbose output.  Then you can see how often the system garbage collects etc, and whether post-GC usage grows over time.
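A sketch of what that could look like for a ZooKeeper instance started through zkServer.sh, assuming a Java 7 or 8 HotSpot JVM (Java 9 and later replaced these options with -Xlog:gc*). The log path and heap size are placeholders. The same GC options could be added to the startup options of the other Java processes.

```shell
# Appended to the JVM options, e.g. via JVMFLAGS for ZooKeeper.
GC_LOG=/tmp/zk-gc.log   # placeholder path, pick your own
export JVMFLAGS="-Xmx1g -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:${GC_LOG}"
```

If the heap occupancy reported after each full collection keeps climbing over the life of a job, that points to a genuine memory shortage or leak rather than ordinary garbage churn.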

12G should be more than enough, so if you find you are running into memory limits with that configuration, it would be worth trying to figure out why that is happening.

Karl


On Mon, Sep 15, 2014 at 11:04 AM, lalit jangra <lalit.j.jangra@gmail.com> wrote:
Hi Karl,

Could I be seeing the ZooKeeper connection reset messages because the system is running close to its memory limit? I have 12G of RAM and can see it using 11.5G while the job is running.

Should I assign a specific amount of memory to the ZooKeeper nodes and, if so, is there any yardstick?

Regards.

On Mon, Sep 15, 2014 at 7:16 PM, Karl Wright <daddywri@gmail.com> wrote:
Hi Lalit,

Looks like this is the result of a tomcat shutdown, and is a probable race condition bug in Zookeeper:

http://mail-archives.apache.org/mod_mbox/tomcat-users/201306.mbox/%3CBAY174-W32B2284BEDAE503E9D22D3A8850@phx.gbl%3E

Karl


On Mon, Sep 15, 2014 at 9:41 AM, lalit jangra <lalit.j.jangra@gmail.com> wrote:
Hi Karl,

Along with this, I can see the errors below in Tomcat's catalina.out.

Sep 15, 2014 1:06:14 PM org.apache.catalina.loader.WebappClassLoader loadClass
INFO: Illegal access: this web application instance has been stopped already.  Could not load org.apache.zookeeper.server.ZooTrace.  The eventual following stack trace is caused by an error thrown for debugging purposes as well as to attempt to terminate the thread which caused the illegal access, and has no functional impact.
java.lang.IllegalStateException
        at org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1612)
        at org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1571)
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1115)

[http-bio-80-exec-1-SendThread(iwdc2preecma04.iwater.ie:2183)] ERROR org.apache.zookeeper.ClientCnxn - from http-bio-80-exec-1-SendThread(iwdc2preecma04.iwater.ie:2183)
java.lang.NoClassDefFoundError: org/apache/zookeeper/server/ZooTrace
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1115)
Caused by: java.lang.ClassNotFoundException: org.apache.zookeeper.server.ZooTrace
        at org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1720)
        at org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1571)
        ... 1 more
[http-bio-80-exec-1-SendThread(iwdc2preecma04.iwater.ie:2182)] ERROR org.apache.zookeeper.ClientCnxn - from http-bio-80-exec-1-SendThread(iwdc2preecma04.iwater.ie:2182)
Sep 15, 2014 1:06:14 PM org.apache.coyote.AbstractProtocol destroy
INFO: Destroying ProtocolHandler ["http-bio-80"]
java.lang.NoClassDefFoundError: org/apache/zookeeper/server/ZooTrace


Regards.

On Mon, Sep 15, 2014 at 7:05 PM, lalit jangra <lalit.j.jangra@gmail.com> wrote:
Thanks Karl,

Crawling is very slow and taking a long time, which is a bit frustrating. As I have multiple high-volume jobs running in parallel, this is a real problem.

I have also raised it on the ZooKeeper forum at http://zookeeper-user.578899.n2.nabble.com/Getting-errors-in-zookeeper-logs-td7580260.html and am waiting for a reply.

Regards.

On Mon, Sep 15, 2014 at 6:51 PM, Karl Wright <daddywri@gmail.com> wrote:
Hi Lalit,

When MCF cannot reach zookeeper, MCF crawls will pause until the zookeeper connections are reestablished.  Then the crawls should resume.  This should *not* abort your crawls, but it will make them very slow.

I am not a zookeeper expert, so I would post on their message boards to see if there is any adjustment that can be made to zookeeper parameters that would improve zookeeper behavior when you have a flaky network.  However, since the obvious solution is to fix your network, they may not have a code solution for you.

Thanks,
Karl


On Mon, Sep 15, 2014 at 9:15 AM, lalit jangra <lalit.j.jangra@gmail.com> wrote:
Thanks Karl,

Ideally, connection resets should be handled by ZooKeeper itself, as I can see connections being re-established later in the logs.

Can you suggest any way to work around this, in addition to resolving the network issue, as my crawls keep getting stuck? Anything in the config files, etc.?

Regards.


On Mon, Sep 15, 2014 at 6:39 PM, Karl Wright <daddywri@gmail.com> wrote:
Hi Lalit,

Zookeeper will keep working, but you should understand that you are dropping connections to your zookeeper members for unknown reasons, which is causing your crawl to stall when it happens.  This argues that perhaps you have some network flakiness of some kind.

Karl


On Mon, Sep 15, 2014 at 8:59 AM, lalit jangra <lalit.j.jangra@gmail.com> wrote:

Hi,

I am running a cluster of two Apache ManifoldCF nodes on two separate machines, each of which has 3 ZooKeeper instances (6 instances in total). When I start the ManifoldCF agents, I see the warning below during startup.

[http-bio-80-exec-2-SendThread(iwdc1preecma03.iwater.ie:2181)] INFO org.apache.zookeeper.ClientCnxn - Unable to read additional data from server sessionid 0x0, likely server has closed socket, closing socket connection and attempting reconnect

[http-bio-80-exec-2-SendThread(iwdc2preecma04.iwater.ie:2182)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server iwdc2preecma04.iwater.ie/10.231.72.25:2182. Will not attempt to authenticate using SASL (unknown error)



I can also see the error below in the logs while the agents are running.

[http-bio-80-exec-2] INFO org.apache.zookeeper.ZooKeeper - Initiating client connection, connectString=iwdc1preecma03:2181,iwdc1preecma03:2182,iwdc1preecma03:2183,iwdc2preecma04:2181,iwdc2preecma04:2182,iwdc2preecma04:2183 sessionTimeout=4000 watcher=org.apache.manifoldcf.core.lockmanager.ZooKeeperConnection$ZooKeeperWatcher@51d83fd7

[http-bio-80-exec-2-SendThread(iwdc2preecma04.iwater.ie:2182)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server iwdc2preecma04.iwater.ie/10.231.72.25:2182. Will not attempt to authenticate using SASL (unknown error)

[http-bio-80-exec-2-SendThread(iwdc2preecma04.iwater.ie:2182)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to iwdc2preecma04.iwater.ie/10.231.72.25:2182, initiating session

[http-bio-80-exec-2-SendThread(iwdc2preecma04.iwater.ie:2182)] WARN org.apache.zookeeper.ClientCnxn - Session 0x0 for server iwdc2preecma04.iwater.ie/10.231.72.25:2182, unexpected error, closing socket connection and attempting reconnect

java.io.IOException: Connection reset by peer
        at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
        at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
        at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:225)
        at sun.nio.ch.IOUtil.read(IOUtil.java:193)
        at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:375)
        at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:68)
        at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:355)
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)

[http-bio-80-exec-2-SendThread(iwdc2preecma04.iwater.ie:2183)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server iwdc2preecma04.iwater.ie/10.231.72.25:2183. Will not attempt to authenticate using SASL (unknown error)

[http-bio-80-exec-2-SendThread(iwdc2preecma04.iwater.ie:2183)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to iwdc2preecma04.iwater.ie/10.231.72.25:2183, initiating session

[http-bio-80-exec-2-SendThread(iwdc2preecma04.iwater.ie:2183)] INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on server iwdc2preecma04.iwater.ie/10.231.72.25:2183, sessionid = 0x6487851bd330078, negotiated timeout = 4000


Below are the configurations for (1) the ZooKeeper nodes and (2) the ZooKeeper settings on the MCF nodes.


zoo.cfg :  Same for all six zookeeper nodes.


# The number of milliseconds of each tick
tickTime=2000
dataDir=/app/IW/zookeeper/data/data.1
dataLogDir=/app/IW/zookeeper/logs/log.1
clientPort=2181
server.1=iwdc1preecma03:2888:3888
server.2=iwdc1preecma03:2889:3889
server.3=iwdc1preecma03:2890:3890
server.4=iwdc2preecma04:2891:3891
server.5=iwdc2preecma04:2892:3892
server.6=iwdc2preecma04:2893:3893
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
#dataDir=/tmp/zookeeper
# the port at which the clients will connect
#clientPort=2181
# the maximum number of client connections.
# increase this if you need to handle more clients
#maxClientCnxns=60
#
# Be sure to read the maintenance section of the
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
autopurge.purgeInterval=1


ManifoldCF configurations : same for both ManifoldCF nodes.


<property name="org.apache.manifoldcf.lockmanagerclass" value="org.apache.manifoldcf.core.lockmanager.ZooKeeperLockManager"/>
<property name="org.apache.manifoldcf.zookeeper.connectstring" value="iwdc1preecma03:2181,iwdc1preecma03:2182,iwdc1preecma03:2183,iwdc2preecma04:2181,iwdc2preecma04:2182,iwdc2preecma04:2183"/>
<property name="org.apache.manifoldcf.zookeeper.sessiontimeout" value="4000"/>


I want to know whether, given the warnings/errors above, ZooKeeper will stop working, or whether it will keep working and these are non-fatal messages. I ask because my ManifoldCF jobs are stuck while I see these errors.

Please suggest.

Regards,
Lalit.




