lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Alan Woodward <alan.woodw...@romseysoftware.co.uk>
Subject Re: Building FST-like automaton queries
Date Tue, 28 Feb 2012 15:06:51 GMT
>> 
>> We're only allowing expansions within an edit distance of 1, which should keep the
numbers of terms down.
> 
> Ahh, ok.  So even if the term has two occurrences of cl, only one of
> them is allowed to substitute d?

Yes, exactly - "cloocl" will be expanded to "doocl" and "clood" only.  It's a pretty inaccurate
way of searching anyway, and more expansions start leading to too many false positives without
improving recall pretty quickly.

> 
>>> For steps 2 and 3 you shouldn't use FST at all.  Instead, for 2) use
>>> BasicAutomata.makeString(String) on each of your expanded terms, then
>>> BasicOperations.union on all of those automata to make a single
>>> automaton accepting all your expanded terms, then likely call
>>> .determinize() on the resulting automaton (maybe also .minimize() but
>>> I think that may not help).  Then pass that automaton to AQ.
>> 
>> Excellent, thanks for your help.  I'll give that a go.
> 
> You might also try the DaciukMihovAutomatonBuilder class (it's in
> lucene/test-framework): it builds a minimal deterministic automaton
> from sorted terms, very quickly... you'd just have to pre-sort your
> terms.

Thanks, will have a look there too.

> 
> Mike
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message