lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Michael McCandless <luc...@mikemccandless.com>
Subject Re: Building FST-like automaton queries
Date Tue, 28 Feb 2012 14:35:37 GMT
On Tue, Feb 28, 2012 at 8:42 AM, Alan Woodward
<alan.woodward@romseysoftware.co.uk> wrote:
>
> On 28 Feb 2012, at 13:31, Michael McCandless wrote:
>
>> Neat :)  It's like a FuzzyQuery w/ a custom (binary?) cost matrix for
>> the insert/delete/transposition changes...
>>
>> Is the number of edits smallish?  Ie you're not concerned about
>> combinatoric explosion of step 1?
>
> We're only allowing expansions within an edit distance of 1, which should keep the numbers
of terms down.

Ahh, ok.  So even if the term has two occurrences of cl, only one of
them is allowed to substitute d?

>> For steps 2 and 3 you shouldn't use FST at all.  Instead, for 2) use
>> BasicAutomata.makeString(String) on each of your expanded terms, then
>> BasicOperations.union on all of those automata to make a single
>> automaton accepting all your expanded terms, then likely call
>> .determinize() on the resulting automaton (maybe also .minimize() but
>> I think that may not help).  Then pass that automaton to AQ.
>
> Excellent, thanks for your help.  I'll give that a go.

You might also try the DaciukMihovAutomatonBuilder class (it's in
lucene/test-framework): it builds a minimal deterministic automaton
from sorted terms, very quickly... you'd just have to pre-sort your
terms.

Mike

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message