lucene-solr-user mailing list archives

From OSMAN Metin <Metin.OS...@canal-plus.com>
Subject Questions about commits and OOE
Date Wed, 04 Dec 2013 14:36:48 GMT
Hi all,

Let me first explain our situation:

We have:

-       two virtual servers, each with:

        4x Solr 4.4.0 on Tomcat 6 (with mod_cluster 1.2.0), each JVM started with -Xms2048m -Xmx2048m -XX:MaxPermSize=384m
        1x ZooKeeper 3.4.5 (only one of the two ZooKeeper instances is active)
        CentOS 6.4
        Sun JDK 1.6.0-31
        16 GB of RAM
        4 vCPUs

-       only one core and one shard

-       ~250,000 docs and 50-100 MB of index size

-       two load balancers (Apache + mod_cluster), both connected to the 8 Solr nodes

-       1 VIP pointing to these two load balancers

The commit configuration is:

-       every update request performs a soft commit (i.e. the parameter softCommit=true is set in the HTTP request)

-       autoSoftCommit is disabled

-       autoCommit (hard commit) is enabled, every 15 seconds
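
To make the first point concrete, here is a minimal sketch of how such an update URL is put together. The host name, the core name "collection1" and the UpdateUrl helper are assumptions for illustration only; our real client goes through SolrJ rather than building URLs by hand. The code sticks to Java 6 syntax.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of SolrJ: appends query parameters to an update URL.
class UpdateUrl {

    static String build(String baseUrl, Map<String, String> params) {
        StringBuilder sb = new StringBuilder(baseUrl);
        char separator = '?';
        try {
            for (Map.Entry<String, String> e : params.entrySet()) {
                sb.append(separator)
                  .append(URLEncoder.encode(e.getKey(), "UTF-8"))
                  .append('=')
                  .append(URLEncoder.encode(e.getValue(), "UTF-8"));
                separator = '&';
            }
        } catch (UnsupportedEncodingException ex) {
            // UTF-8 is always available on a JVM, so this should not happen.
            throw new IllegalStateException(ex);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<String, String>();
        // This is the parameter we currently send with every update request.
        params.put("softCommit", "true");
        // Assumed host and core name, for illustration only.
        System.out.println(build("http://solr-vip.example.com/solr/collection1/update", params));
    }
}
```

With this setup every single update asks Solr for its own soft commit, independently of the 15 second autoCommit.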

The client application is a Java application using the SolrJ client, with the VIP above as its endpoint.
We need modifications to be visible to the end users in near real time.
During the day, the client's traffic to Solr is about 80% select requests and 20% update requests.
Every morning, the client sends a massive batch of updates (about 10,000 in a few minutes).

During this massive update, we sometimes see a peak of active threads that exceeds the limit
of 8192 processes allowed for the user running the Tomcat and ZooKeeper processes.
When this happens, every hard commit fails with an "OutOfMemory : unable to create native
thread" message.
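
To see where the threads come from, something like the following could be run inside one of the JVMs during the peak. This is only a sketch: ThreadCensus and its methods are hypothetical names, it uses nothing but the standard Thread.getAllStackTraces() call, and it only sees the threads of the JVM it runs in, whereas the 8192 limit applies to all JVMs of the same user together. The code sticks to Java 6 syntax.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical diagnostic helper, not part of Solr or Tomcat.
class ThreadCensus {

    // Strip a trailing numeric suffix so that, for example,
    // "pool-1-thread-1" and "pool-1-thread-2" fall into the same group.
    static String prefixOf(String threadName) {
        return threadName.replaceAll("[-#_ ]*\\d+$", "");
    }

    // Count thread names grouped by their prefix, sorted by prefix.
    static Map<String, Integer> countByPrefix(Iterable<String> threadNames) {
        Map<String, Integer> counts = new TreeMap<String, Integer>();
        for (String name : threadNames) {
            String key = prefixOf(name);
            Integer old = counts.get(key);
            counts.put(key, old == null ? 1 : old + 1);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Collect the names of all live threads in the current JVM.
        List<String> names = new ArrayList<String>();
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            names.add(t.getName());
        }
        for (Map.Entry<String, Integer> e : countByPrefix(names).entrySet()) {
            System.out.println(e.getValue() + "\t" + e.getKey());
        }
    }
}
```

A group whose count grows with the number of updates would point at the component that creates the threads.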


Now, I have some questions:

-       Why are so many threads created? Is it the soft commit on every update that opens
a new thread?

-       Once an OOE (OutOfMemoryError) occurs, every hard commit keeps failing, even when the
number of threads open on the system is low. Is there any way to "free" the JVM? The only
solution we have found is to restart all the JVMs.

-       When the OOE occurs, the Solr cloud console shows the leader node as active and the
others as recovering:

o   Is the replication working at that moment?

o   Since all the hard commits are failing but the soft commits are not, can I be sure that
I will not lose any updates when restarting all the nodes?

By the way, we are planning to:

-       disable the softCommit parameter on the client side and enable autoSoftCommit
instead

-       add a third server and run a 3-node ZooKeeper quorum instead of a single ZooKeeper master

-       drop the load balancers and let ZooKeeper decide which node responds to
the requests

Any help would be appreciated!

Metin OSMAN
