lucene-java-user mailing list archives

From Adam Rosenwald <thestri...@gmail.com>
Subject OOM when using Lucene 5.X's group facet collectors on unsharded index
Date Mon, 06 Jul 2015 23:56:21 GMT
Hello all,

     When using Lucene 5.X's group facet collectors (i.e. 
*AbstractGroupFacetCollector* and the provided concrete implementation, 
*TermGroupFacetCollector*), I repeatedly encounter OOM errors after 
running a few search requests on an unsharded index consisting of a few 
million documents. I first hit the issue in Lucene 5.0.0 and still see it 
in 5.2.1.

     I've initialized three such collectors to accumulate facet counts for 
three different facet fields (all SortedNumericDV fields).  The 
collectors all look like the following:

==BEGIN CODE BLOCK==

    // One collector per facet field; all three share the same group field.
    AbstractGroupFacetCollector thisFacetCollector =
        TermGroupFacetCollector.createTermGroupFacetCollector(
            groupField, thisFacetField, facetFieldMultivalued,
            facetPrefix, initialSize);

==END CODE BLOCK==

     Note that facetFieldMultivalued = false, facetPrefix = null, and 
initialSize = 128.  There are a few million unique groups indexed in the 
group field.  The heap blows up regardless of the number of unique 
entries in the facet field (for example, one of the facet fields has 
fewer than 100 unique values).
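A toy model may make the scaling point clearer. The sketch below is hypothetical plain Java, not Lucene code, and `GroupFacetPairModel` is an illustrative name. It assumes that a grouped facet collector remembers each distinct (group ordinal, facet ordinal) pair it sees, folding the two ordinals into one key. Under that assumption the remembered set grows with the number of groups even when the facet field has only ~100 values.

```java
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical toy model, not Lucene code: remembers each distinct
 * (groupOrd, facetOrd) pair seen during collection, folding the two
 * ordinals into a single long key.
 */
class GroupFacetPairModel {

    private final int facetValueCount;              // distinct values in the facet field
    private final Set<Long> seenPairs = new HashSet<>();

    GroupFacetPairModel(int facetValueCount) {
        this.facetValueCount = facetValueCount;
    }

    /**
     * Records one hit and returns true if its (group, facet) pair is new.
     * facetOrd may be -1 for "no value"; shifting by one maps it to slot 0,
     * so each group owns facetValueCount + 1 consecutive key slots.
     */
    boolean collect(int groupOrd, int facetOrd) {
        long key = (long) groupOrd * (facetValueCount + 1) + (facetOrd + 1);
        return seenPairs.add(key);
    }

    int distinctPairs() {
        return seenPairs.size();
    }

    public static void main(String[] args) {
        int groups = 200_000;      // stand-in for "a few million" groups
        int facetValues = 100;     // a low-cardinality facet field
        GroupFacetPairModel model = new GroupFacetPairModel(facetValues);
        for (int g = 0; g < groups; g++) {
            // Assume every document in a group carries the same single facet value.
            model.collect(g, g % facetValues);
            model.collect(g, g % facetValues);   // a second hit in the group adds nothing
        }
        System.out.println("distinct pairs: " + model.distinctPairs());
    }
}
```

With one facet value per group, 200,000 groups leave 200,000 remembered pairs despite only 100 facet values; a few million groups would leave a few million.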

     I have confirmed that the heap ballooning /only/ occurs during 
collection time: if I comment out the three TermGroupFacetCollector 
assignments, I have no OOM issues, but even with only one of them 
enabled, the heap eventually runs out.
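The comment-out experiment can be made somewhat more quantitative by recording used heap around a single request. The sketch below uses only the JDK's `MemoryMXBean`. `HeapProbe` and its placeholder workload are hypothetical. Because `gc()` is only a request, readings are noisy and are best compared over several runs, with and without the collectors attached.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;

/**
 * Hypothetical helper: measures used heap before and after one unit of work,
 * e.g. a search with or without a facet collector attached.
 */
class HeapProbe {

    private static final MemoryMXBean MEMORY = ManagementFactory.getMemoryMXBean();

    /** Bytes of heap currently in use (live objects plus uncollected garbage). */
    static long usedHeapBytes() {
        return MEMORY.getHeapMemoryUsage().getUsed();
    }

    /** Runs the task and returns the change in used heap, sampled after GC requests. */
    static long heapDeltaAfter(Runnable task) {
        MEMORY.gc();                       // best-effort request, not a guarantee
        long before = usedHeapBytes();
        task.run();
        MEMORY.gc();
        return usedHeapBytes() - before;
    }

    public static void main(String[] args) {
        // Confirms the configured ceiling (e.g. the ~8GB heap mentioned in this thread).
        System.out.printf("max heap: %d MiB%n", Runtime.getRuntime().maxMemory() >> 20);
        // Placeholder workload; substitute the search being diagnosed.
        long delta = heapDeltaAfter(() -> {
            int[] scratch = new int[1 << 20];
            scratch[0] = 1;
        });
        System.out.printf("heap delta: %d KiB%n", delta >> 10);
    }
}
```

A delta that keeps climbing across repeated identical requests would point at something staying reachable between requests rather than at per-request garbage.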

     Some additional system details: I'm running Lucene 5.2.1 in a dev 
environment with ~8GB of heap and 16GB of total RAM.  I am not using 
any special codecs.  I've confirmed that the indexes (including the 
sidecar facet indexes) are opened only once, during initialization of 
the service.  Both the index and sidecar facet index directories are 
opened as NIOFSDirectory objects.  I have also tried MMapDirectory and 
see the same problem.

     After profiling the heap extensively and reading the Lucene group 
faceting source code, I suspect that the DVs (for both the group and 
facet fields) and/or the arrays used to accumulate facet counts remain 
memory resident.  After executing the same set of queries multiple 
times, I see heap usage balloon by 1-2GB at a time.  I've tried 
segmenting the index, but while that reduces heap usage for ad-hoc 
searches, it does not get rid of the OOM issue.
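A rough estimate can show whether per-pair retention alone could account for growth of this size. The sketch assumes (not confirmed against the Lucene source) that each collector keeps one small holder object plus two copied byte arrays per distinct (group, facet value) pair, at roughly 120 bytes per pair on a 64-bit JVM. `PairHeapEstimate` is a hypothetical helper, and every input is an assumption supplied by the caller.

```java
/**
 * Hypothetical back-of-envelope estimate for collectors that keep one entry
 * per distinct (group, facet value) pair. The per-pair byte cost is an
 * assumption supplied by the caller, not a figure taken from Lucene.
 */
class PairHeapEstimate {

    /** Estimated bytes held by all collectors for one query. */
    static long estimateBytes(int collectors, long distinctPairsPerCollector, int bytesPerPair) {
        return (long) collectors * distinctPairsPerCollector * bytesPerPair;
    }

    public static void main(String[] args) {
        // Assumed inputs: 3 collectors, ~3 million pairs each (one facet value
        // per group), ~120 bytes per pair for a holder object plus two copied
        // byte arrays on a 64-bit JVM.
        long bytes = estimateBytes(3, 3_000_000L, 120);
        System.out.printf("%d bytes (~%.2f GiB)%n", bytes, bytes / (double) (1L << 30));
    }
}
```

With these assumed inputs the estimate is 3 × 3,000,000 × 120 = 1,080,000,000 bytes, about 1 GiB, the same order of magnitude as the 1-2GB jumps described above. The result scales linearly with the assumed pair count and per-pair cost.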

     Any help here would be greatly appreciated.  Many thanks in advance.

--A.
