lucene-solr-user mailing list archives

From Angel Tchorbadjiiski <angel.tchorbadjii...@antibodies-online.com>
Subject Re: SOLR OutOfMemoryError Java heap space
Date Thu, 06 Mar 2014 08:38:36 GMT
Hi Shawn,

a big thanks for the long and detailed answer. I am aware of how Linux
uses free RAM for caching and of the problems related to the JVM and GC.
It is nice to hear how this correlates to Solr. I'll take some time and
think it over. The facet.method=enum and probably a combination of
DocValues fields could be the solution needed in this case.
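
A minimal sketch of what such a faceted request could look like, in case it
helps anyone reading the thread later. The parameters facet, facet.field and
facet.method are the ones discussed here; the host, core name and field names
are placeholders and not taken from the actual setup.

```python
from urllib.parse import urlencode


def facet_query_url(base_url, query, facet_fields, method="enum"):
    """Build a Solr select URL that facets on the given fields.

    facet.method is sent once and applies to every facet.field in the request.
    """
    params = [("q", query), ("facet", "true"), ("facet.method", method)]
    # One facet.field parameter per field to facet on.
    params += [("facet.field", field) for field in facet_fields]
    return base_url + "?" + urlencode(params)


# Hypothetical host, core and field names, for illustration only.
url = facet_query_url(
    "http://localhost:8983/solr/collection1/select",
    "*:*",
    ["brand", "category"],
)
print(url)
```

Putting the same parameter into the defaults section of the request handler,
as Shawn suggested, avoids having to send it with every query.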

Thanks again to both of you and Toke for the feedback!

Cheers
Angel

On 05.03.2014 17:06, Shawn Heisey wrote:
> On 3/5/2014 4:40 AM, Angel Tchorbadjiiski wrote:
>> Hi Shawn,
>>
>> On 05.03.2014 10:05, Angel Tchorbadjiiski wrote:
>>> Hi Shawn,
>>>
>>>> It may be your facets that are killing you here.  As Toke mentioned, you
>>>> have not indicated what your max heap is.  20 separate facet fields with
>>>> millions of documents will use a lot of fieldcache memory if you use the
>>>> standard facet.method, fc.
>>>>
>>>> Try adding facet.method=enum to all your facet queries, or you can put
>>>> it in the defaults section of each request handler definition.
>>> Ok, that is easy to try out.
>>>
>> Changing the facet.method does not really help, as the performance of
>> the queries is really bad. This is mostly due to the small cache sizes,
>> but even trying to tune them for the "enum" case didn't help much.
>>
>> The number of documents and unique facet values seems to be too high.
>> Trying to cache them even with a size of 512 results in many misses and
>> Solr tries to repopulate the cache all the time. This makes the
>> performance even worse.
>
> Good performance with Solr requires a fair amount of memory.  You have
> two choices when it comes to where that memory gets used - inside Solr
> in the form of caches, or free memory, available to the operating system
> for caching purposes.
>
> Solr caches are really amazing things.  Data gathered for one query can
> significantly speed up another query, because part (or all) of that
> query can be simply skipped, the results read right out of the cache.
>
> There are two potential problems with relying exclusively on Solr
> caches, though.  One is that they require Java heap memory, which
> requires garbage collection.  A large heap causes GC issues, some of
> which can be alleviated by GC tuning.  The other problem is that you
> must actually do a query in order to get the data into the cache.  When
> you do a commit and open a new searcher, that cache data goes away, so
> you have to do the query over again.
>
> The primary reason for slow uncached queries is disk access.  Reading
> index data off the disk is a glacial process, comparatively speaking.
> This is where OS disk caching becomes a benefit.  Most queries, even
> complex ones, become lightning fast if all of the relevant index data is
> already in RAM and no disk access is required.  When queries are fast to
> begin with, you can reduce the cache sizes in Solr, reducing the heap
> requirements.  With a smaller heap, more memory is available for the OS
> disk cache.
>
> The facet.method=enum parameter shifts the RAM requirement from Solr to
> the OS.  It does not really reduce the amount of required system memory.
> Because disk caching is a kernel-level feature and does not utilize
> garbage collection, it is far more efficient than Solr ever could be at
> caching *raw* data.  Solr's caches are designed for *processed* data.
>
> What this all boils down to is that I suspect you'll simply need more
> memory on the machine.  With facets on so many fields, your queries are
> probably touching nearly the entire index, so you'll want to put the
> entire index into RAM.
>
> Therefore, after Solr allocates its heap and any other programs on the
> system allocate their required memory, you must have enough memory left
> over to fit all (or most) of your 50GB index data.  Combine this with
> facet.method=enum and everything should be good.
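
The sizing rule in the last paragraph can be written down as simple
arithmetic. This is only a rough sketch: the 50GB index size comes from the
thread, while the total RAM, heap size and memory used by other programs are
made-up figures for illustration.

```python
def page_cache_headroom_gb(total_ram_gb, solr_heap_gb, other_programs_gb):
    """Memory left for the OS disk cache after the heap and other programs."""
    return total_ram_gb - solr_heap_gb - other_programs_gb


def index_fits_in_cache(index_gb, total_ram_gb, solr_heap_gb, other_programs_gb):
    """True if the whole index could sit in the OS disk cache."""
    headroom = page_cache_headroom_gb(total_ram_gb, solr_heap_gb, other_programs_gb)
    return headroom >= index_gb


INDEX_GB = 50  # index size mentioned in the thread

# Assumed machines: 8GB Solr heap and 2GB for everything else on the box.
print(index_fits_in_cache(INDEX_GB, total_ram_gb=64, solr_heap_gb=8, other_programs_gb=2))
print(index_fits_in_cache(INDEX_GB, total_ram_gb=32, solr_heap_gb=8, other_programs_gb=2))
```

With these assumed numbers, the 64GB machine leaves 54GB for the disk cache
and can hold the whole index, while the 32GB machine leaves only 22GB.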

