lucene-solr-user mailing list archives

From Walter Underwood <wun...@wunderwood.org>
Subject Re: Solr-8.1.0 uses much more memory
Date Mon, 27 May 2019 15:26:44 GMT
Solr really should use a limited pool for handling external requests. We’ve driven it into
OOM a few times with too much traffic, just creating a useless number of threads.

But that requires separate pools for external requests and cluster-internal requests, which
would probably require separate ports for external and internal.

We’ve considered running a local copy of nginx on each server, exposing that different port
as the external port, and using nginx to limit traffic. But Solr really should not create
thousands of internal threads then fall over. That is just dumb.
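
A minimal sketch of the "limited pool" idea discussed here, using only the standard java.util.concurrent API. It is not Solr's code or configuration: the class name, MAX_THREADS, and QUEUE_CAPACITY are hypothetical values chosen for illustration. The point is only that a fixed thread count plus a bounded queue makes the server refuse work instead of creating more threads.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class BoundedRequestPool {
    // Hypothetical limits for illustration only; these are not Solr settings.
    static final int MAX_THREADS = 4;
    static final int QUEUE_CAPACITY = 8;

    /** Builds a pool that never grows past MAX_THREADS and never queues more than QUEUE_CAPACITY tasks. */
    public static ThreadPoolExecutor newBoundedPool() {
        return new ThreadPoolExecutor(
                MAX_THREADS, MAX_THREADS,                 // core == max: fixed number of worker threads
                0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(QUEUE_CAPACITY), // bounded queue: the hard limit
                new ThreadPoolExecutor.AbortPolicy());    // reject when threads and queue are both full
    }

    /** Returns true if the request was accepted, false if it was shed because the pool is full. */
    public static boolean trySubmit(ThreadPoolExecutor pool, Runnable request) {
        try {
            pool.execute(request);
            return true;
        } catch (RejectedExecutionException e) {
            return false; // back pressure: the caller must wait, retry, or go elsewhere
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ThreadPoolExecutor pool = newBoundedPool();
        int accepted = 0;
        int rejected = 0;
        // Simulate a burst of 100 requests, each taking about 50 ms.
        for (int i = 0; i < 100; i++) {
            boolean ok = trySubmit(pool, () -> {
                try {
                    Thread.sleep(50);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            if (ok) {
                accepted++;
            } else {
                rejected++;
            }
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        System.out.println("accepted=" + accepted + " rejected=" + rejected);
    }
}
```

With AbortPolicy the excess requests fail fast, so the thread count and memory use stay bounded during a burst. A front end could, for example, turn a rejection into an HTTP 503 response; that mapping is an assumption of this sketch, not something Solr is known to do.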

wunder
Walter Underwood
wunder@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On May 27, 2019, at 3:05 AM, Joe Doupnik <jrd@netlab1.net> wrote:
> 
>     You are certainly correct about using external load balancers when appropriate. However,
a basic problem with servers, that of accepting more incoming items than can be handled gracefully,
is as we know an age-old one, solved by back pressure methods (particularly hard limits).
My experience with Solr suggests that parts (say Tika) are being too nice to incoming material,
letting too many items enter the application, consume resources, and so forth which then become
awkward to handle (see the locks item discussion cited earlier). Entry ought to be blocked
until the processing structure declares that resources are available to accept new entries
(a full but not overfull pipeline). Those internal issues, locks, memory and similar, are
resolvable when limits are imposed. Also, with limits then your mentioned load balancers stand
a chance of sensing when a particular server is currently not accepting new requests. Establishing
limits does take some creative thinking about how the system as a whole is constructed.
>     I brought up the overload case because it pertains to this main memory management
thread.
>     Thanks,
>     Joe D.
> 
> On 27/05/2019 10:21, Bernd Fehling wrote:
>> I think it is not fair to blame Solr for not also having a load balancer.
>> It is up to you and your needs to set up the required infrastructure
>> including load balancing. There are many products available on the market.
>> If your current system can't handle all requests then install more replicas.
>> 
>> Regards
>> Bernd
>> 
>> Am 27.05.19 um 10:33 schrieb Joe Doupnik:
>>>      While on the topic of resource consumption and locks etc, there is one other
aspect to which Solr has been vulnerable. It is failing to fend off too many requests at one
time. The standard approach is, of course, named back pressure, such as not replying to a
query until resources permit and thus keeping competition outside of the application. That limits
resource consumption, including locks, memory and sundry, while permitting normal work within
to progress smoothly. Let the crowds coming to a hit show queue in the rain outside the theatre
until empty seats become available.
>>> 
>>> On 27/05/2019 08:52, Joe Doupnik wrote:
>>>> Generalizations tend to fail when confronted with conflicting evidence. The
simple  evidence is asking how much real memory the Solr owned process has been allocated
(top, or ps aux or similar) and that yields two very different values (the ~1.6GB of Solr
v8.0 and 4.5+GB of Solr v8.1). I have no knowledge of how Java chooses to name its usage (heap
or otherwise). Prior to v8.1 Solr memory consumption varied with activity, thus memory management
was occurring, memory was borrowed from and returned to the system. What might be happening
in Solr v8.1 is the new memory management code is failing to do a proper job, for reasons
which are not visible to us in the field, and that failure is important to us.
>>>>     In regard to the referenced lock discussion, it would be a good idea
to not let the tail wag the dog, tend the common cases and live with a few corner case difficulties
because perfection is not possible.
>>>>     Thanks,
>>>>     Joe D.
>>>> 
>>>> On 26/05/2019 20:30, Shawn Heisey wrote:
>>>>> On 5/26/2019 12:52 PM, Joe Doupnik wrote:
>>>>>>      I do queries while indexing, have done so for a long time, without
difficulty nor memory usage spikes from dual use. The system has been designed to support
that.
>>>>>>      Again, one may look at the numbers using "top" or similar. Try
Solr v8.0 and 8.1 to see the difference which I experience here. For reference, the only memory
adjustables set in my configuration is in the Solr startup script solr.in.sh saying add "-Xss1024k"
in the SOLR_OPTS list and setting SOLR_HEAP="4024m".
>>>>> 
>>>>> There is one significant difference between 8.0 and 8.1 in the realm
of memory management -- we have switched from the CMS garbage collector to the G1 collector.
 So the way that Java manages the heap has changed. This was done because the CMS collector
is slated for removal from Java.
>>>>> 
>>>>> https://issues.apache.org/jira/browse/SOLR-13394
>>>>> 
>>>>> Java is unlike other programs in one respect -- once it allocates heap
from the OS, it never gives it back.  This behavior has given Java an undeserved reputation
as a memory hog ... but in fact Java's overall memory usage can be very easily limited ...
an option that many other programs do NOT have.
>>>>> 
>>>>> In your configuration, you set the max heap to a little less than 4GB.
You have to expect that it *WILL* use that memory.  By using the SOLR_HEAP variable, you have
instructed Solr's startup script to use the same setting for the minimum heap as well as the
maximum heap. This is the design intent.
>>>>> 
>>>>> If you want to know how much heap is being used, you can't ask the operating
system, which means tools like top.  You have to ask Java. And you will have to look at a
long-term graph, finding the low points. An instantaneous look at Java's heap usage could show
you that the whole heap is allocated ... but a significant part of that allocation could be
garbage, which becomes available once the garbage is collected.
>>>>> 
>>>>> Thanks,
>>>>> Shawn
>>>> 
>>> 
> 
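
Shawn's advice above to "ask Java" rather than top can be sketched with the standard java.lang.management API. This is only an illustration: the class name HeapSampler and the five one-second samples are hypothetical, and the code reports on the JVM it runs in. To watch a running Solr process one would read the same figures from outside, for example through a JMX client or the JDK's jstat tool.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

public class HeapSampler {
    /** Returns {used, committed, max} heap sizes in bytes, as reported by the JVM itself. */
    public static long[] sampleHeap() {
        MemoryMXBean bean = ManagementFactory.getMemoryMXBean();
        MemoryUsage heap = bean.getHeapMemoryUsage();
        // getMax() returns -1 when no maximum is defined.
        return new long[] { heap.getUsed(), heap.getCommitted(), heap.getMax() };
    }

    public static void main(String[] args) throws InterruptedException {
        final long mib = 1024L * 1024L;
        long lowWater = Long.MAX_VALUE;
        // Take a few samples and track the lowest "used" value seen.
        // Over a long run, the low points approximate the live data left after garbage collection.
        for (int i = 0; i < 5; i++) {
            long[] s = sampleHeap();
            lowWater = Math.min(lowWater, s[0]);
            System.out.printf("used=%d MiB committed=%d MiB max=%d MiB%n",
                    s[0] / mib, s[1] / mib, s[2] / mib);
            Thread.sleep(1000);
        }
        System.out.printf("lowest used value seen: %d MiB%n", lowWater / mib);
    }
}
```

The "committed" figure is the heap the JVM has reserved from the operating system, while "used" includes garbage that has not been collected yet, which is why a single reading says little and the low points of a long-term graph are the number to watch.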

