lucene-solr-user mailing list archives

From Shawn Heisey <apa...@elyograg.org>
Subject Re: Memory leak in Solr
Date Sun, 04 Dec 2016 23:46:55 GMT
On 12/3/2016 9:46 PM, S G wrote:
> The symptom we see is that the java clients querying Solr see response
> times in 10s of seconds (not milliseconds).
<snip>
> Some numbers for the Solr Cloud:
>
> *Overall infrastructure:*
> - Only one collection
> - 16 VMs used
> - 8 shards (1 leader and 1 replica per shard - each core on separate VM)
>
> *Overview from one core:*
> - Num Docs:193,623,388
> - Max Doc:230,577,696
> - Heap Memory Usage:231,217,880
> - Deleted Docs:36,954,308
> - Version:2,357,757
> - Segment Count:37

The heap memory usage number isn't useful.  It doesn't cover all the
memory used.

> *Stats from QueryHandler/select*
> - requests:78,557
> - errors:358
> - timeouts:0
> - totalTime:1,639,975.27
> - avgRequestsPerSecond:2.62
> - 5minRateReqsPerSecond:1.39
> - 15minRateReqsPerSecond:1.64
> - avgTimePerRequest:20.87
> - medianRequestTime:0.70
> - 75thPcRequestTime:1.11
> - 95thPcRequestTime:191.76

These times are in *milliseconds*, not seconds, and these are even
better numbers than you showed before.  Where are you seeing query
times of 10 seconds or more?  Solr is not showing numbers like that.
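
As a quick sanity check on the units, the average can be recomputed from
the figures quoted above.  This is only a sketch: it assumes totalTime is
the cumulative request time in milliseconds and that avgTimePerRequest is
totalTime divided by requests.  The function and variable names are made
up for illustration.

```python
# Figures copied from the QueryHandler/select stats quoted above.
total_time_ms = 1_639_975.27   # totalTime (assumed to be milliseconds)
requests = 78_557              # requests

def average_request_ms(total_ms, request_count):
    """Mean time per request, in the same unit as total_ms."""
    return total_ms / request_count

avg_ms = average_request_ms(total_time_ms, requests)
avg_seconds = avg_ms / 1000.0

print(f"average: {avg_ms:.2f} ms = {avg_seconds:.4f} s")
```

The result comes out at roughly 20.9 ms, in line with the reported
avgTimePerRequest of 20.87 and nowhere near 10 seconds.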

If your VM host has 16 VMs on it, each with a total memory size of 92GB,
then unless that machine has about 1.5 terabytes of memory, you're
oversubscribed, and that is going to lead to terrible performance.  The
numbers you've shown here, however, do not show terrible performance.
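
The arithmetic behind the 1.5 terabyte figure can be sketched as follows.
The VM count and per-VM memory come from this thread; the host memory
sizes passed to the check are hypothetical, since the actual host size
has not been mentioned.

```python
def total_vm_memory_gb(vm_count, gb_per_vm):
    """Memory promised to all VMs on one host, in GB."""
    return vm_count * gb_per_vm

def is_oversubscribed(vm_count, gb_per_vm, host_gb):
    """True when the VMs are promised more memory than the host has."""
    return total_vm_memory_gb(vm_count, gb_per_vm) > host_gb

committed = total_vm_memory_gb(16, 92)   # 16 VMs at 92GB each
print(f"committed to VMs: {committed} GB")

# Hypothetical host sizes, for illustration only.
for host_gb in (512, 1024, 1536):
    print(host_gb, "GB host oversubscribed:", is_oversubscribed(16, 92, host_gb))
```

16 VMs at 92GB each adds up to 1472GB, which is where the figure of
roughly 1.5 terabytes comes from.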

> Plus, on every server, we are seeing lots of exceptions.
> For example:
>
> Between 8:06:55 PM and 8:21:36 PM, exceptions are:
>
> 1) Request says it is coming from leader, but we are the leader:
> update.distrib=FROMLEADER&distrib.from=HOSTB_ca_1_1456430020/&wt=javabin&version=2
>
> 2) org.apache.solr.common.SolrException: Request says it is coming from
> leader, but we are the leader
>
> 3) org.apache.solr.common.SolrException:
> org.apache.solr.client.solrj.SolrServerException: Tried one server for read
> operation and it timed out, so failing fast
>
> 4) null:org.apache.solr.common.SolrException:
> org.apache.solr.client.solrj.SolrServerException: Tried one server for read
> operation and it timed out, so failing fast
>
> 5) org.apache.solr.common.SolrException:
> org.apache.solr.client.solrj.SolrServerException: Tried one server for read
> operation and it timed out, so failing fast
>
> 6) null:org.apache.solr.common.SolrException:
> org.apache.solr.client.solrj.SolrServerException: Tried one server for read
> operation and it timed out, so failing fast
>
> 7) org.apache.solr.common.SolrException:
> org.apache.solr.client.solrj.SolrServerException: No live SolrServers
> available to handle this request. Zombie server list:
> [HOSTA_ca_1_1456429897]
>
> 8) null:org.apache.solr.common.SolrException:
> org.apache.solr.client.solrj.SolrServerException: No live SolrServers
> available to handle this request. Zombie server list:
> [HOSTA_ca_1_1456429897]
>
> 9) org.apache.solr.common.SolrException:
> org.apache.solr.client.solrj.SolrServerException: Tried one server for read
> operation and it timed out, so failing fast
>
> 10) null:org.apache.solr.common.SolrException:
> org.apache.solr.client.solrj.SolrServerException: Tried one server for read
> operation and it timed out, so failing fast
>
> 11) org.apache.solr.common.SolrException:
> org.apache.solr.client.solrj.SolrServerException: Tried one server for read
> operation and it timed out, so failing fast
>
> 12) null:org.apache.solr.common.SolrException:
> org.apache.solr.client.solrj.SolrServerException: Tried one server for read
> operation and it timed out, so failing fast

These errors sound like timeouts, possibly caused by long GC pauses,
but as already mentioned, the query handler statistics do not indicate
long query times.  If a long GC pause were to happen during a query, then
the query time would be long as well.

The core information above doesn't include the size of the index on
disk.  That number would be useful for telling you whether there's
enough memory.

As I said at the beginning of the thread, I haven't seen anything here
to indicate a memory leak, and others are using version 4.10 without any
problems.  If there were a memory leak in a released version of Solr,
many people would have run into problems with it.

Thanks,
Shawn

