lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Dawid Weiss <dawid.we...@gmail.com>
Subject Re: Building an automaton efficiently (CompiledAutomaton vs RunAutomaton vs Automaton)
Date Mon, 20 Feb 2017 08:21:20 GMT
> PatriciaTrie. In particular building an FST with doShareSuffix = false is
> the fastest of any option,

If you don't share the suffix then you are building a kind of Patricia
trie... But suffix sharing is cheap and can give you a memory saving
(and resulting cache locality sometimes) that is non-trivial to
neglect.

> and morfologik's fsa has the lowest heap usage of
> all (both in terms of garbage and the final size of the FSA).

I think these would be minuscule differences, really. Something not
worth the effort of maintaining a compatibility with two libraries,
for example. Lucene's FST are transducers, so they do have an
additional benefit in that you can store something extra with each
entry (this may be handy sometimes -- for example frequency
information attached to each term, etc.).

> to work with BytesRef) and it supports both prefix
> (MatchResult.SEQUENCE_IS_A_PREFIX) and exact matching
> (MatchResult.EXACT_MATCH).

So does Lucene's FST, although at a lower level (you'd need to descend
manually).

> Given the efficiencies and functionality similarity of the FST vs Automation
> in Lucene, I'm kind of curious as to why you would ever use Automaton?

The Automaton class was ported from Brics  library and it supports a
much more generic class of automata, including optimizations from
NFA->DFA, etc. The FST class is built directly from one specific kind
of data (unique sorted keys on input) and it's heavily optimized
towards this case.

Dawid

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message