lucene-solr-user mailing list archives

From OSMAN Metin <Metin.OS...@canal-plus.com>
Subject RE: Zookeeper will not update cluster state when garbaging
Date Mon, 10 Mar 2014 14:51:10 GMT
Hello Furkan,

We are planning to migrate to a 3-node ensemble, but for now we have only one active ZooKeeper
instance in production.

Actually, I was thinking of a parameter somewhere in the Solr configuration. I may be wrong, but I thought
the problem was that Solr asks or tells ZooKeeper to update its state, and ZooKeeper cannot
because it is busy garbage collecting. Nevertheless, I will try modifying the tickTime
parameter.
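
For reference, here is a rough sketch of how tickTime relates to the session timeout, as I understand the ZooKeeper documentation: the server clamps the timeout a client requests to the range [2 * tickTime, 20 * tickTime] unless minSessionTimeout or maxSessionTimeout are set explicitly in zoo.cfg. The function name and the example values below are only illustrative; on the Solr side the requested value is, as far as I know, the zkClientTimeout setting in solr.xml.

```python
def negotiated_session_timeout(requested_ms, tick_time_ms=2000,
                               min_ms=None, max_ms=None):
    """Clamp a requested session timeout the way a ZooKeeper server would.

    Assumes the documented defaults: minSessionTimeout = 2 * tickTime and
    maxSessionTimeout = 20 * tickTime when not set explicitly in zoo.cfg.
    """
    lower = 2 * tick_time_ms if min_ms is None else min_ms
    upper = 20 * tick_time_ms if max_ms is None else max_ms
    return max(lower, min(requested_ms, upper))


if __name__ == "__main__":
    # Hypothetical requested timeouts, checked against tickTime=2000 ms.
    for requested in (1000, 15000, 60000):
        print(requested, "->", negotiated_session_timeout(requested))
```

So with tickTime=2000, a client cannot obtain a session timeout above 40 seconds; raising tickTime raises that ceiling, which may matter if pauses are longer than the negotiated timeout.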

For the second point, I will ask my boss if I can add our company to your wiki.

Metin

-----Original Message-----
From: Furkan KAMACI [mailto:furkankamaci@gmail.com]
Sent: Monday, March 10, 2014 14:26
To: solr-user@lucene.apache.org
Subject: Re: Zookeeper will not update cluster state when garbaging

Hi Metin;

I think the timeout value you are talking about is described here:
http://zookeeper.apache.org/doc/r3.1.2/zookeeperStarted.html However, it is not recommended
to change ZooKeeper's timeout value "if you do not have a specific reason". Also,
how many ZooKeeper instances do you have in your infrastructure?

Also, unrelated to your question: if it is OK with you, could you add your company here: https://wiki.apache.org/solr/PublicServers
This would be nice for people who wonder which companies use Solr.

Thanks;
Furkan KAMACI


2014-03-10 12:35 GMT+02:00 OSMAN Metin <Metin.OSMAN@canal-plus.com>:

> Hi all,
>
> we are using SolrCloud with this configuration:
>
> * Solr 4.4.0
> * ZooKeeper 3.4.5
> * one server with ZooKeeper + 4 Solr nodes
> * one server with 4 Solr nodes
> * only one core
> * Solr instances deployed on Tomcat with mod_cluster
> * clients access with SolrJ through Apache + mod_cluster
>
> In the morning, we have massive updates (several thousand within a few
> minutes) with explicit softCommit=true.
> These updates are load balanced across all nodes, regardless of whether
> a node is the leader.
>
> When this happens, the SolrCloud admin console shows 7 nodes as
> recovering and the leader as active.
> We also noticed that refreshing the cluster graph takes a very long time.
> This situation can last 3 hours until the clusterstate refreshes.
> During this phase, ZooKeeper is garbage collecting heavily (I can post
> the Munin GC graphs).
>
> Here are the command-line parameters of the ZooKeeper and Solr nodes (I
> have replaced some values with XXX for confidentiality reasons).
>
> ZooKeeper:
>
> java -cp
> /var/lib/zookeeper/bin/../build/classes:/var/lib/zookeeper/bin/../build/lib/*.jar:/var/lib/zookeeper/bin/../lib/slf4j-log4j12-1.6.1.jar:/var/lib/zookeeper/bin/../lib/slf4j-api-1.6.1.jar:/var/lib/zookeeper/bin/../lib/netty-3.2.2.Final.jar:/var/lib/zookeeper/bin/../lib/log4j-1.2.15.jar:/var/lib/zookeeper/bin/../lib/jline-0.9.94.jar:/var/lib/zookeeper/bin/../zookeeper-3.4.5.jar:/var/lib/zookeeper/bin/../src/java/lib/*.jar:/app/zookeeper/conf:
> -Dcom.sun.management.jmxremote.ssl=false
> -Dcom.sun.management.jmxremote.authenticate=false
> -Dcom.sun.management.jmxremote.port=XXX -Xms384m -Xmx384m 
> -XX:MaxPermSize=128m -Dcom.sun.management.jmxremote 
> -Dcom.sun.management.jmxremote.local.only=false
> org.apache.zookeeper.server.quorum.QuorumPeerMain
> /app/zookeeper/conf/zoo.cfg
>
> Solr:
>
> /usr/lib/jvm/java/bin/java
> -Dsolr.data.dir=/app/solr/server/search_01/vod/data
> -Dsolr.solr.home=/app/solr/server/search_01 -DnumShards=1 
> -Dbootstrap_confdir=/app/solr/server/search_01/vod/conf
> -Dcollection.configName=vod -DzkHost=XXX:2181 -Dtomcat.server.port=XXX 
> -Dtomcat.http.port=XXX -Dtomcat.ajp.port=XXX 
> -Dlog4j.configuration=file:///app/tomcat/server/search_01/conf/log4j.properties
> -Djboss.jvmRoute=SEARCH_02_01 
> -Djboss.modcluster.sendToApacheDelayInSec=10
> -Djboss.modcluster.nodetimeout=30 -Djboss.modcluster.ttl=10 -Xms2048m 
> -Xmx2048m -XX:MaxPermSize=384m -Dcom.sun.management.jmxremote 
> -Dcom.sun.management.jmxremote.port=XXX
> -Dcom.sun.management.jmxremote.ssl=false
> -Dcom.sun.management.jmxremote.authenticate=false -classpath 
> :/app/tomcat/server/search_01/bin/bootstrap.jar:/app/tomcat/server/search_01/bin/tomcat-juli.jar:/usr/share/java/commons-daemon.jar
> -Dcatalina.base=/app/tomcat/server/search_01
> -Dcatalina.home=/app/tomcat/server/search_01 -Djava.endorsed.dirs= 
> -Djava.io.tmpdir=/app/tomcat/server/search_01/temp
> -Djava.util.logging.config.file=/app/tomcat/server/search_01/conf/log4j.properties
> -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager
> org.apache.catalina.startup.Bootstrap start
>
> I have tried other GC strategies, max heap values, new ratios, etc.
> on ZooKeeper without success.
> Every time ZooKeeper is garbage collecting, the clusterstate is not correct.
>
> Is this a bug in ZooKeeper or Solr 4.4.0, or is it due to some
> misconfiguration?
> I have seen somewhere that there is a timeout value between Solr and
> ZooKeeper, but I don't know where it is set (or what its default value is).
>
> Any help will be appreciated.
>
> Regards,
> Metin
>
