lucene-solr-user mailing list archives

From Lance Norskog <goks...@gmail.com>
Subject Re: SolrCloud unstable
Date Sun, 24 Nov 2013 21:50:30 GMT
Yes, you should use a recent Java 7. Java 6 is end-of-life and no longer 
supported by Oracle. Also, read up on the various garbage collectors. It 
is a complex topic and there are many guides online.

In particular, there is a problem in some Java 6 releases that causes a 
massive memory leak in Solr. The symptom is that memory use normally 
oscillates from, say, 1GB to 2GB. After the bug triggers, the ceiling of 
2GB becomes the floor, and memory use oscillates from 2GB to 3GB. I'm 
not saying this is the problem you have. I'm just saying that it is 
important to read up on garbage collection.
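
To make the symptom concrete, here is a minimal Java sketch, not a definitive tool. It reads the same "collection count" value that the JVM exposes through the standard java.lang.management API, and it checks whether the lowest heap reading in a later window has reached the highest reading of an earlier window. The class name GcSnapshot, the method floorRose, and the sample numbers (in MB) are assumptions made up for this illustration. It also only inspects the JVM it runs in, so watching Solr would mean running it inside the Solr JVM or adapting it to a remote JMX connection.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcSnapshot {

    /** Sum of collection counts over all collectors in the current JVM. */
    static long totalCollectionCount() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount(); // -1 means "undefined" for this collector
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    /** Heap currently in use by this JVM, in bytes. */
    static long usedHeapBytes() {
        return ManagementFactory.getMemoryMXBean().getHeapMemoryUsage().getUsed();
    }

    /**
     * Hypothetical check for the symptom described above: true when the lowest
     * reading in the later window is at or above the highest reading in the
     * earlier window, i.e. the old ceiling has become the new floor.
     */
    static boolean floorRose(long[] earlier, long[] later) {
        if (earlier.length == 0 || later.length == 0) {
            throw new IllegalArgumentException("both windows need at least one sample");
        }
        long earlierMax = earlier[0];
        for (long v : earlier) {
            earlierMax = Math.max(earlierMax, v);
        }
        long laterMin = later[0];
        for (long v : later) {
            laterMin = Math.min(laterMin, v);
        }
        return laterMin >= earlierMax;
    }

    public static void main(String[] args) {
        // One-off snapshot of the JVM running this class.
        System.out.println("collections so far: " + totalCollectionCount());
        System.out.println("heap used (MB): " + usedHeapBytes() / (1024 * 1024));

        // Made-up heap readings in MB, mirroring the 1GB-2GB then 2GB-3GB example.
        long[] before = {1024, 2048, 1100, 2000};
        long[] after = {2048, 3072, 2100, 3000};
        System.out.println("floor rose: " + floorRose(before, after));
    }
}
```

Sampling totalCollectionCount() at a fixed interval and looking at the difference between samples gives the rate of collections, which is the number that climbs in a graph like the one linked below.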

Lance

On 11/22/2013 05:27 AM, Martin de Vries wrote:
>
> We did some more monitoring and have some new information:
>
> Before the issue happens the garbage collector's "collection count"
> increases a lot. The increase seems to start about an hour before the
> real problem occurs:
>
> http://www.analyticsforapplications.com/GC.png [1]
>
> We tried both the G1 garbage collector and the regular one, the problem
> happens with both of them.
>
> We use Java 1.6 on some servers. Will Java 1.7 be better?
>
> Martin
>
> Martin de Vries wrote on 12.11.2013 10:45:
>
>> Hi,
>>
>> We have:
>>
>> Solr 4.5.1 - 5 servers
>> 36 cores, 2 shards each, 2 servers per shard (every core is on 4
>> servers)
>> about 4.5 GB total data on disk per server
>> 4GB JVM-Memory per server, 3GB average in use
>> Zookeeper 3.3.5 - 3 servers (one shared with Solr)
>> haproxy load balancing
>>
>> Our SolrCloud is very unstable. About once a week some cores go into
>> recovery state or down state. Many timeouts occur and we have to restart
>> servers to get them back to work. The failover doesn't work in many
>> cases, because one server has the core in down state, the other in
>> recovering state. Other cores work fine. When the cloud is stable I
>> sometimes see log messages like:
>>
>> - shard update error StdNode:
>> http://033.downnotifier.com:8983/solr/dntest_shard2_replica1/:org.apache.solr.client.solrj.SolrServerException:
>> IOException occured when talking to server at:
>> http://033.downnotifier.com:8983/solr/dntest_shard2_replica1
>> - forwarding update to
>> http://033.downnotifier.com:8983/solr/dn_shard2_replica2/ failed -
>> retrying ...
>> - null:ClientAbortException: java.io.IOException: Broken pipe
>>
>> Before the cloud problems start there are many large QTimes in the
>> log (sometimes over 50 seconds), but there are no other errors until the
>> recovery problems start.
>>
>> Any clue about what can be wrong?
>>
>> Kind regards,
>>
>> Martin
>
> Links:
> ------
> [1] http://www.analyticsforapplications.com/GC.png
>

