lucene-java-user mailing list archives

From Anna Björk Nikulásdóttir <>
Subject Re: Avoid automaton Memory Usage
Date Thu, 08 Aug 2013 16:54:56 GMT

On 8.8.2013 at 12:37, Michael McCandless <> wrote:

> <snip>
>> What would help in my case, since I use the same FST for both analyzers, is if the same
>> FST object could be shared among both analyzers. So what I am doing is to use
>> and use the stored file for AnalyzingSuggester.load() and FuzzySuggester.load().
> That's interesting ... so you mean you sometimes want fuzzy
> suggestions and sometimes non-fuzzy ones, off the same built
> suggester?  I believe AnalyzingSuggester and FuzzySuggester in fact
> use the same FST (not certain) ... are you able to do
> FuzzySuggester.load from a previous and it
> works?  And that's still too much RAM?

Yes, it works like a charm. I use it for auto-completion of non-English terms. Often
the typed beginning of a term can be used as is, and then AnalyzingSuggester gives the best
results, whereas FuzzySuggester would give too many results that need a lot of post-processing.
If the user is lazy, or because the Android keyboard doesn't always provide easy access to
specific letters (e.g. 'æ', 'ä', 'ß'), or if he mistypes some letters, I use FuzzySuggester
as a fallback when AnalyzingSuggester doesn't yield appropriate results. It's a bit of a kludge
because FuzzySuggester doesn't boost terms with minimal Levenshtein distance.
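
The fallback described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual app code: the two suggesters are represented by plain `Function<String, List<String>>` stand-ins rather than the real Lucene classes, and `FallbackSuggest`, `prefixDistance`, `suggest` and `minResults` are hypothetical names. The fuzzy results are re-ranked by edit distance to the typed prefix, as one possible way to approximate the missing boost.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.Function;

final class FallbackSuggest {

    /**
     * Smallest Levenshtein distance between the typed text and any prefix
     * of the candidate (two-row dynamic programming).
     */
    static int prefixDistance(String typed, String candidate) {
        int n = candidate.length();
        int[] prev = new int[n + 1];
        int[] curr = new int[n + 1];
        for (int j = 0; j <= n; j++) {
            prev[j] = j; // distance from the empty string to candidate[0..j)
        }
        for (int i = 1; i <= typed.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= n; j++) {
                int cost = typed.charAt(i - 1) == candidate.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        // prev now holds the row for the full typed text; take the best prefix.
        int best = prev[0];
        for (int j = 1; j <= n; j++) {
            best = Math.min(best, prev[j]);
        }
        return best;
    }

    /**
     * Tries the exact (analyzing) lookup first and only falls back to the
     * fuzzy lookup when fewer than minResults suggestions come back.
     * Both lookups are hypothetical stand-ins for the real suggesters.
     */
    static List<String> suggest(String typed,
                                Function<String, List<String>> exact,
                                Function<String, List<String>> fuzzy,
                                int minResults) {
        List<String> hits = exact.apply(typed);
        if (hits.size() >= minResults) {
            return hits;
        }
        List<String> ranked = new ArrayList<>(fuzzy.apply(typed));
        // Stable sort: candidates with equal distance keep the suggester's order.
        ranked.sort(Comparator.comparingInt((String c) -> prefixDistance(typed, c)));
        return ranked;
    }
}
```

With real suggesters, the two functions would wrap the respective lookup calls and map the results to their keys; the re-ranking step is the post-processing that the fuzzy path needs here.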

Performance-wise this is absolutely no problem on Android, but memory-wise it means 2x the
FST memory. At the moment one FST needs ~20 MB. If, for example, I wanted to support multiple
languages simultaneously, it is not going to work this way.

Ideally all this could be done on disk/flash only. But that would need changes along the lines
of your earlier proposal via DirectByteBuffer. Do you think going this way would yield acceptable
performance? And does mapping a file into memory not fill the DRAM with the complete content
of the file over time? Are "normal" Lucene indexes accessed this way?
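
For reference, mapping a file with plain `java.nio` looks roughly like the sketch below. It is not Lucene code and says nothing about how the FST would actually be read from such a buffer; `MappedRead`, `mapReadOnly` and `writeTempFile` are hypothetical names. The mapped buffer lives outside the Java heap, and on typical operating systems pages are brought in on access and, being read-only and file-backed, can be dropped again under memory pressure. How well that holds on a given Android device is an assumption that would need measuring. Lucene's `MMapDirectory` accesses index files through memory mapping in a similar way.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class MappedRead {

    /** Maps the whole file read-only; the buffer is backed by the file, not the heap. */
    static MappedByteBuffer mapReadOnly(Path file) {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            // The mapping remains valid after the channel is closed.
            return ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Helper for the demo: writes the bytes to a temporary file. */
    static Path writeTempFile(byte[] data) {
        try {
            Path tmp = Files.createTempFile("fst-demo", ".bin");
            tmp.toFile().deleteOnExit();
            Files.write(tmp, data);
            return tmp;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = new byte[1 << 20]; // 1 MiB stand-in for a stored FST file
        data[123_456] = 42;
        MappedByteBuffer buf = mapReadOnly(writeTempFile(data));
        // Random access by absolute index; only the touched page has to be resident.
        System.out.println(buf.get(123_456)); // prints 42
        // isLoaded() is only a hint about residency, so its value is not predictable.
        System.out.println(buf.isLoaded());
    }
}
```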

>> Unfortunately there is no immutable FST class, but as I do not use it in a multithreaded
>> environment, that is probably not a problem, no? A quick fix could be to copy the analyzer
>> classes, change them to this behaviour, and reuse the FST object. Does this make sense
>> functionally, or do I have to expect problems?
> Sharing an FST across analyzing and fuzzy suggesters does seem
> worthwhile; it may "just work" today…

I will try it then. Do you have any indication that it might stop working at some point in
the future?

>> Would a patch for such behaviour make sense for the existing analyzer classes or
>> is this use case too specific?
> It might ... open an issue and we can discuss/iterate there?

If it works here, I will open an issue / provide a patch.

