lucene-dev mailing list archives

From "David Smiley (JIRA)" <>
Subject [jira] [Updated] (LUCENE-7211) Improve geospatial garbage generation
Date Wed, 13 Apr 2016 04:48:25 GMT


David Smiley updated LUCENE-7211:
       Assignee: David Smiley
    Component/s: modules/spatial

Thanks Jeff, particularly for sharing your benchmark results.

> Improve geospatial garbage generation
> -------------------------------------
>                 Key: LUCENE-7211
>                 URL:
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/spatial
>            Reporter: Jeff Wartes
>            Assignee: David Smiley
>              Labels: spatialrecursiveprefixtreefieldtype
>         Attachments: SOLR-8944-Use-DocIdSetBuilder-instead-of-FixedBitSet.patch
> I’ve been continuing some analysis into JVM garbage sources in my Solr index. (5.4,
> 86M docs/core, 56k 99.9th percentile hit count with my query corpus)
> After applying SOLR-8922, I find my biggest source of garbage by a literal order of magnitude
> (by size) is the long[] allocated by FixedBitSet. From the backtraces, it appears the biggest
> source of FixedBitSet creation in my case (by two orders of magnitude) is my use of queries
> that involve geospatial filtering.
> Specifically, IntersectsPrefixTreeQuery.getDocIdSet, here:
> Has this been considered for optimization? I can think of a few paths:
> 1. Persistent Object pools - FixedBitSet size is allocated based on maxDoc, which presumably
> changes less frequently than queries are issued. If an existing FixedBitSet were not available
> from a pool, the worst case (create a new one) would be no worse than the current behavior.
> The complication would be enforcement around when to return the object to the pool, but it
> looks like this has some lifecycle hooks already.
> 2. I note that a thing called a SparseFixedBitSet already exists, and puts considerable
> effort into allocating smaller chunks only as necessary. Is this not usable for this purpose?
> How significant is the performance difference?
> I'd be happy to spend some time on a patch, but I was hoping for a little more data around
> the current choices before choosing an approach.
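
For illustration only, option 1 above could be sketched roughly as follows. This is not
Lucene code and not the attached patch: LongArrayPool, wordsFor, acquire and release are
hypothetical names, and a plain long[] stands in for the backing array of a FixedBitSet.
It assumes one 64-bit word per 64 documents, so a core with roughly 86M docs needs about
1.34 million longs (around 10.75 MB) for every dense bit set it allocates.

```java
import java.util.ArrayDeque;
import java.util.Arrays;

/** Hypothetical pool of long[] bit-set backing arrays, all sized for one maxDoc. */
final class LongArrayPool {
    private final int numWords;   // array length required for the current maxDoc
    private final int maxPooled;  // upper bound on idle arrays kept alive
    private final ArrayDeque<long[]> free = new ArrayDeque<>();

    LongArrayPool(int maxDoc, int maxPooled) {
        this.numWords = wordsFor(maxDoc);
        this.maxPooled = maxPooled;
    }

    /** Number of 64-bit words needed to hold one bit per document. */
    static int wordsFor(int maxDoc) {
        return (maxDoc + 63) >>> 6;
    }

    /** Reuses an idle array if one exists; the worst case is a fresh allocation. */
    synchronized long[] acquire() {
        long[] bits = free.poll();
        return bits != null ? bits : new long[numWords];
    }

    /** Clears the array and keeps it for reuse, unless it is the wrong size or the pool is full. */
    synchronized void release(long[] bits) {
        if (bits.length != numWords || free.size() >= maxPooled) {
            return; // drop it and let the garbage collector reclaim it
        }
        Arrays.fill(bits, 0L);
        free.push(bits);
    }
}
```

The hard part the description mentions, knowing when a caller is done with the bits so
that release can be called safely, is not addressed by this sketch. Clearing on release
also costs a full pass over the array, so the trade is allocation and garbage collection
pressure against an explicit fill.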

This message was sent by Atlassian JIRA
