lucene-java-user mailing list archives

From Adrien Grand <jpou...@gmail.com>
Subject Re: Term Dictionary taking up lots of memory, looking for solutions, lucene 5.3.1
Date Thu, 18 May 2017 06:35:55 GMT
Is upgrading to Lucene 6 and using points rather than terms an option?
Points typically have lower memory usage: compare GeoPoint, which is based
on terms, with LatLonPoint, which is based on points, at
http://people.apache.org/~mikemccand/geobench.html#reader-heap.

On Thu, May 18, 2017 at 02:35, Tom Hirschfeld <tomhirschfeld@gmail.com>
wrote:

> Hey!
>
> I am working on a Lucene-based service for reverse geocoding. We have a
> large index with lots of unique terms (550 million), and it appears that
> we're running into memory issues on our leaf servers because the term
> dictionary for the entire index is being loaded into heap space. If we
> allocate > 65g heap space, our queries return relatively quickly (10s-100s
> of ms), but if we drop below ~65g heap space on the leaf nodes, query
> performance drops dramatically, with query times quickly hitting 20+
> seconds (our test harness drops at 20s).
>
> I did some research and found that in past versions of Lucene, one could
> split the loading of the terms dictionary using the 'termInfosIndexDivisor'
> option in the DirectoryReader class. That option was deprecated in Lucene
> 5.0.0
> <https://abi-laboratory.pro/java/tracker/changelog/lucene/5.0.0/log.html>
> in favor of using codecs to achieve similar functionality. Looking at the
> available experimental codecs, I see the BlockTreeTermsWriter
> <https://lucene.apache.org/core/5_3_1/core/org/apache/lucene/codecs/blocktree/BlockTreeTermsWriter.html#BlockTreeTermsWriter(org.apache.lucene.index.SegmentWriteState, org.apache.lucene.codecs.PostingsWriterBase, int, int)>
> that seems like it could be used for a similar purpose, breaking down the
> term dictionary so that we don't load the whole thing into heap space.
>
> Has anyone run into this problem before and found an effective solution?
> Does changing the codec used seem appropriate for this issue? If so, how do
> I go about loading an alternative codec and configuring it to my needs?
> I'm having trouble finding docs/examples of how this is used in the real
> world so even if you point me to a repo or docs somewhere I'd appreciate
> it.
> Thanks!
>
> Best,
> Tom Hirschfeld
>
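
For readers unfamiliar with the idea behind an option like
'termInfosIndexDivisor' mentioned in the quoted message, here is a minimal
sketch of the general technique: keep only every Nth term of a sorted term
list in heap, then scan a short block of the full list to finish a lookup.
This is an illustration only, not Lucene's implementation. The class
SparseTermIndex and its methods are hypothetical names made up for this
example, and the "full" term list lives in memory here purely to keep the
sketch self-contained (in a real index it would stay on disk).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Illustration only: a sorted term list with a subsampled in-heap index. */
final class SparseTermIndex {
    // Stands in for the full term dictionary (on disk in a real index).
    private final List<String> sortedTerms;
    // The subsample that would actually be held in heap.
    private final List<String> sampledTerms = new ArrayList<>();
    private final int divisor;

    SparseTermIndex(List<String> terms, int divisor) {
        if (divisor < 1) {
            throw new IllegalArgumentException("divisor must be >= 1");
        }
        this.sortedTerms = new ArrayList<>(terms);
        Collections.sort(this.sortedTerms);
        this.divisor = divisor;
        // Keep every divisor-th term: a larger divisor means less heap.
        for (int i = 0; i < sortedTerms.size(); i += divisor) {
            sampledTerms.add(sortedTerms.get(i));
        }
    }

    /** Number of terms held in the in-heap sample. */
    int heapEntries() {
        return sampledTerms.size();
    }

    boolean contains(String term) {
        int pos = Collections.binarySearch(sampledTerms, term);
        if (pos >= 0) {
            return true; // the term is one of the sampled terms
        }
        // binarySearch returned -(insertionPoint) - 1, so the last sample
        // that sorts before the term is at insertionPoint - 1.
        int block = -pos - 2;
        if (block < 0) {
            return false; // the term sorts before every sampled term
        }
        // Scan at most divisor - 1 terms of the full list.
        int start = block * divisor;
        int end = Math.min(start + divisor, sortedTerms.size());
        for (int i = start + 1; i < end; i++) {
            int cmp = sortedTerms.get(i).compareTo(term);
            if (cmp == 0) {
                return true;
            }
            if (cmp > 0) {
                return false; // passed the position where it would be
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> terms = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            terms.add(String.format("term%04d", i));
        }
        SparseTermIndex index = new SparseTermIndex(terms, 16);
        System.out.println("terms held in heap: " + index.heapEntries());
        System.out.println("contains term0500: " + index.contains("term0500"));
    }
}
```

The trade-off in this sketch is the usual one: a larger divisor shrinks the
in-heap sample but lengthens the scan each lookup needs.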
