lucene-java-user mailing list archives

From Adrien Grand <>
Subject Re: Term Dictionary taking up lots of memory, looking for solutions, lucene 5.3.1
Date Thu, 18 May 2017 06:35:55 GMT
Is upgrading to Lucene 6 and using points rather than terms an option?
Points typically have lower memory usage (compare GeoPoint, which is based on
terms, with LatLonPoint, which is based on points, at

On Thu, May 18, 2017 at 02:35, Tom Hirschfeld <> wrote:

> Hey!
> I am working on a Lucene-based service for reverse geocoding. We have a
> large index with lots of unique terms (550 million), and it appears that
> we're running into memory issues on our leaf servers because the term
> dictionary for the entire index is being loaded into heap space. If we
> allocate more than 65 GB of heap, our queries return relatively quickly
> (tens to hundreds of ms), but if we drop below ~65 GB of heap on the leaf
> nodes, query performance degrades dramatically, quickly hitting 20+ seconds
> (our test harness times out at 20 s).
> I did some research and found that in past versions of Lucene one could
> reduce how much of the terms dictionary was loaded into heap by using the
> 'termInfosIndexDivisor' option in the DirectoryReader class. That option was
> deprecated in Lucene 5.0.0 <> in favor of using codecs to achieve similar
> functionality. Looking at the available experimental codecs, I see the
> BlockTreeTermsWriter <, org.apache.lucene.codecs.PostingsWriterBase, int,
> int)>, which seems like it could be used for a similar purpose: breaking down
> the term dictionary so that we don't load the whole thing into heap space.
> Has anyone run into this problem before and found an effective solution?
> Does changing the codec seem appropriate for this issue? If so, how do I go
> about loading an alternative codec and configuring it to my needs? I'm
> having trouble finding docs/examples of how this is used in the real world,
> so even if you could point me to a repo or docs somewhere, I'd appreciate it.
> Thanks!
> Best,
> Tom Hirschfeld
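
As a rough, hypothetical illustration of the memory trade-off behind the 'termInfosIndexDivisor' option mentioned above, the stdlib-only Java sketch below keeps only every Nth term of a sorted dictionary on heap, binary-searches that sample, and then scans at most N terms of the full list. The class name SparseTermIndex and the in-memory array standing in for on-disk data are assumptions made for illustration. This is not Lucene's implementation (BlockTree's terms index is an FST over term-block prefixes); it only shows the general idea that a sparser in-memory index trades heap for a longer scan per lookup.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical illustration (not Lucene code): keep only every
 * `interval`-th term of a sorted dictionary in memory, in the spirit of
 * termInfosIndexDivisor. A lookup binary-searches the sample, then scans
 * at most `interval` terms of the full list.
 */
public class SparseTermIndex {
    private final String[] allTerms;     // stands in for the on-disk term dictionary
    private final String[] sampledTerms; // the only part we would keep on heap
    private final int interval;

    public SparseTermIndex(String[] sortedTerms, int interval) {
        if (interval < 1) {
            throw new IllegalArgumentException("interval must be >= 1");
        }
        this.allTerms = sortedTerms;
        this.interval = interval;
        int n = (sortedTerms.length + interval - 1) / interval; // ceil(len / interval)
        this.sampledTerms = new String[n];
        for (int i = 0; i < n; i++) {
            sampledTerms[i] = sortedTerms[i * interval];
        }
    }

    /** Number of terms held in the in-memory (sampled) index. */
    public int indexSize() {
        return sampledTerms.length;
    }

    /** Returns the ordinal of {@code term} in the full dictionary, or -1 if absent. */
    public int lookup(String term) {
        int pos = Arrays.binarySearch(sampledTerms, term);
        if (pos >= 0) {
            return pos * interval;            // exact hit on a sampled term
        }
        int block = -pos - 2;                 // index of the last sampled term < term
        if (block < 0) {
            return -1;                        // term sorts before every term
        }
        int start = block * interval;
        int end = Math.min(start + interval, allTerms.length);
        for (int i = start; i < end; i++) {   // sequential scan within one block
            int cmp = allTerms[i].compareTo(term);
            if (cmp == 0) {
                return i;
            }
            if (cmp > 0) {
                break;                        // passed where the term would be
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        List<String> terms = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            terms.add(String.format("term%04d", i)); // zero-padded, so already sorted
        }
        SparseTermIndex idx = new SparseTermIndex(terms.toArray(new String[0]), 16);
        System.out.println(idx.indexSize());      // prints 63
        System.out.println(idx.lookup("term0500")); // prints 500
        System.out.println(idx.lookup("nope"));   // prints -1
    }
}
```

With 1,000 terms and an interval of 16, only 63 terms are held in the sampled index; raising the interval shrinks that array further at the cost of scanning more terms per lookup.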
