lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Michael McCandless <luc...@mikemccandless.com>
Subject Re: Building an automaton efficiently (CompiledAutomaton vs RunAutomaton vs Automaton)
Date Sun, 05 Mar 2017 11:05:07 GMT
Wonderful, glad to hear you got something working, and thanks for bringing
closure.

Mike McCandless

http://blog.mikemccandless.com

On Sat, Mar 4, 2017 at 1:32 AM, Oliver Mannion <o.mannion@gmail.com> wrote:

> Hi Dawid,
>
> Yes, the performance differences between Lucene's FST,  morfologik's fsa
> and the PatriciaTrie are rather small.
>
> I've managed to get something working well for my use-case. Thanks Dawid
> and Michael for all your insight!
>
> On 20 February 2017 at 19:21, Dawid Weiss <dawid.weiss@gmail.com> wrote:
>
>> > PatriciaTrie. In particular building an FST with doShareSuffix = false
>> is
>> > the fastest of any option,
>>
>> If you don't share the suffix then you are building a kind of Patricia
>> trie... But suffix sharing is cheap and can give you a memory saving
>> (and resulting cache locality sometimes) that is non-trivial to
>> neglect.
>>
>> > and morfologik's fsa has the lowest heap usage of
>> > all (both in terms of garbage and the final size of the FSA).
>>
>> I think these would be minuscule differences, really. Something not
>> worth the effort of maintaining a compatibility with two libraries,
>> for example. Lucene's FST are transducers, so they do have an
>> additional benefit in that you can store something extra with each
>> entry (this may be handy sometimes -- for example frequency
>> information attached to each term, etc.).
>>
>> > to work with BytesRef) and it supports both prefix
>> > (MatchResult.SEQUENCE_IS_A_PREFIX) and exact matching
>> > (MatchResult.EXACT_MATCH).
>>
>> So does Lucene's FST, although at a lower level (you'd need to descend
>> manually).
>>
>> > Given the efficiencies and functionality similarity of the FST vs
>> Automation
>> > in Lucene, I'm kind of curious as to why you would ever use Automaton?
>>
>> The Automaton class was ported from Brics  library and it supports a
>> much more generic class of automata, including optimizations from
>> NFA->DFA, etc. The FST class is built directly from one specific kind
>> of data (unique sorted keys on input) and it's heavily optimized
>> towards this case.
>>
>> Dawid
>>
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message