lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Olivier Binda <>
Subject Re: Finding a match for an automaton against a FST
Date Sat, 10 Jan 2015 13:23:04 GMT
On 01/10/2015 11:00 AM, Michael McCandless wrote:
> On Fri, Jan 9, 2015 at 6:42 AM, Olivier Binda <> wrote:
>> Hello.
>> 1) What is the best way to check if an automaton (from a regex or a string
>> with a wildcard)
>> has at least 1 match against a FST (from a WFSTCompletionLookup) ?
> You need to implement "intersect".  We already have this method for
> two automata (; maybe you can start from that but
> cutover to the FST APIs instead for the 2nd automaton?

I looked a bit into this. This is complicated stuff :/

I think I get what the nested loops in intersect() do  : transitions 
consist of a double dimension array, somehow those arays are intersected.
I don't understand yet why is there a .min and a .max for a transition 
(why not just a codepoint ?)

Fst and automaton (and maybee the lucene codec stuff) are 3 different 
of finite state machine/transducers, right ?

How does regexQuery (automaton) match against an index ?
Does it use intersect() internally ?  (if it does, maybee I could reuse 
that code too)

>> 2) Also, is there a simple/efficient way to find the lowest and the highest
>> arcs of a FST that match against  an automaton ?
> Hmm arcs leaving which state?  The initial state?  You could simply
> walk all arcs leaving the initial state from the FST and check if the
> automaton accepts them leaving its initial state (assuming the
> automaton has no dead states)?
> Or, if you are already doing an intersection here, just save this
> information as a side effect since you will have already computed it.

thanks for the tips, it helps.

> Mike McCandless
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message