lucene-dev mailing list archives

From: Robert Muir <rcm...@gmail.com>
Subject: Re: LevenshteinAutomata challenge
Date: Wed, 10 Aug 2011 13:09:10 GMT
On Wed, Aug 10, 2011 at 7:32 AM, eks dev <eksdev@yahoo.co.uk> wrote:
> Thanks David,
>
> I did not know I could mix Automaton with LevenshteinAutomaton.
>
> What you are saying is that Automaton.concatenate(LevenshteinAutomaton),
> intersect, and union would work.
>

You can. The builder hands back plain Automaton objects, which you can then combine like any other automaton:

LevenshteinAutomata builder = new LevenshteinAutomata("foobar");
Automaton a1 = builder.toAutomaton(1); // n=1
Automaton a2 = builder.toAutomaton(2); // n=2
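To make concrete what toAutomaton(n) accepts, here is a self-contained sketch that simulates a Levenshtein automaton for a fixed word, using the dynamic-programming row as the state. This is not how LevenshteinAutomata works (it produces a complete deterministic Automaton up front); the class LevenshteinSketch and its methods are hypothetical, and the sketch covers plain insert/delete/substitute edits only.

```java
import java.util.Arrays;

// Hypothetical sketch, not Lucene's LevenshteinAutomata: each state is the
// DP row of edit distances between the input read so far and every prefix
// of the query word.
class LevenshteinSketch {
    private final String query;
    private final int maxEdits;

    LevenshteinSketch(String query, int maxEdits) {
        this.query = query;
        this.maxEdits = maxEdits;
    }

    // Start state: distance from the empty input to each query prefix.
    int[] start() {
        int[] row = new int[query.length() + 1];
        for (int i = 0; i <= query.length(); i++) {
            row[i] = i;
        }
        return row;
    }

    // Transition on one input character (insertion, deletion, substitution).
    int[] step(int[] row, char c) {
        int[] next = new int[row.length];
        next[0] = row[0] + 1;
        for (int i = 1; i < row.length; i++) {
            int cost = query.charAt(i - 1) == c ? 0 : 1;
            next[i] = Math.min(Math.min(next[i - 1] + 1, row[i] + 1), row[i - 1] + cost);
        }
        return next;
    }

    // Dead state: the row minimum never decreases, so no continuation can match.
    boolean isDead(int[] row) {
        return Arrays.stream(row).min().getAsInt() > maxEdits;
    }

    // Accepting state: the whole query word is within maxEdits of the input.
    boolean isAccepting(int[] row) {
        return row[row.length - 1] <= maxEdits;
    }

    public boolean accepts(String input) {
        int[] state = start();
        for (int i = 0; i < input.length(); i++) {
            state = step(state, input.charAt(i));
            if (isDead(state)) {
                return false;
            }
        }
        return isAccepting(state);
    }

    public static void main(String[] args) {
        LevenshteinSketch n1 = new LevenshteinSketch("foobar", 1);
        LevenshteinSketch n2 = new LevenshteinSketch("foobar", 2);
        System.out.println(n1.accepts("fobar")); // true: one deletion
        System.out.println(n1.accepts("fubar")); // false: needs two edits
        System.out.println(n2.accepts("fubar")); // true
    }
}
```

Because a state is just a row, two such matchers can be run in lockstep over the same input: accepting when both accept gives the intersection of their languages, and accepting when either accepts gives the union.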

Other notes:

We actually use these operations (e.g. concatenate) internally,
because FuzzyQuery has historically supported a "prefixLen".
So if you search for foobar with edit distance 1 and a prefixLen of 3,
FuzzyTermsEnum builds a "prefix automaton" for "foo" and concatenates
it with an n=1 automaton for "bar":

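  // builder covers only the text after the prefix (e.g. "bar");
  // i is the edit distance of the automaton being built
  Automaton a = builder.toAutomaton(i);
  // constant prefix (e.g. "foo") goes in front of the fuzzy part
  if (realPrefixLength > 0) {
    Automaton prefix = BasicAutomata.makeString(
      UnicodeUtil.newString(termText, 0, realPrefixLength));
    a = BasicOperations.concatenate(prefix, a);
  }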
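The prefix trick can be checked without building any automata: a term is in the concatenated language exactly when it begins with the literal prefix and the rest is within n edits of the rest of the query. The sketch below (the class PrefixFuzzySketch is hypothetical, and clamping the prefix length to the query length is an assumption of the sketch) computes plain edit distance on the suffix instead of running an automaton.

```java
// Hypothetical sketch: the language of concatenate(literal prefix,
// Levenshtein(suffix, n)), checked directly with edit distance.
class PrefixFuzzySketch {

    // Classic two-row Levenshtein distance (insert, delete, substitute).
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    static boolean matches(String term, String query, int prefixLen, int maxEdits) {
        // Assumption: a prefix longer than the query is clamped to the query.
        int p = Math.min(prefixLen, query.length());
        // The concatenated automaton must consume the literal prefix first...
        if (!term.startsWith(query.substring(0, p))) {
            return false;
        }
        // ...then the Levenshtein part judges only what follows it.
        return editDistance(term.substring(p), query.substring(p)) <= maxEdits;
    }

    public static void main(String[] args) {
        System.out.println(matches("foobaz", "foobar", 3, 1)); // true
        System.out.println(matches("fooba", "foobar", 3, 1));  // true
        System.out.println(matches("goobar", "foobar", 3, 1)); // false: prefix differs
        System.out.println(matches("goobar", "foobar", 0, 1)); // true: no prefix required
    }
}
```

The last two calls show what prefixLen buys: the same one-edit variant is rejected once the first three characters must match exactly.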

For the regexp syntax you discuss, you can actually already do this.
This is one reason why RegexpQuery has a constructor that takes an
AutomatonProvider:
  public RegexpQuery(Term term, int flags, AutomatonProvider provider) {
    super(term, new RegExp(term.text(), flags).toAutomaton(provider));
  }

So you can provide an implementation of AutomatonProvider that handles
custom syntax of your own, as long as it's surrounded by angle brackets < >,
e.g. <LEV1:foobar>.
AutomatonProvider is a simple interface that resolves named automata:

  public Automaton getAutomaton(String name) throws IOException;

If you do this, make sure you enable named automata (RegExp.AUTOMATON,
or of course RegExp.ALL) in the flags!
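On the provider side, most of the work is mapping a name such as LEV1:foobar to a distance and a term. Since the real AutomatonProvider returns a Lucene Automaton, this stdlib-only sketch uses a hypothetical MatcherProvider interface of the same shape that returns a Predicate<String> instead. The LEV<n>:<term> naming scheme follows the <LEV1:foobar> example, and returning null for unrecognized names is a choice of this sketch.

```java
import java.util.function.Predicate;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for AutomatonProvider: resolves a name to a matcher.
interface MatcherProvider {
    Predicate<String> getMatcher(String name);
}

// Hypothetical provider for names of the form LEV<n>:<term>, e.g. "LEV1:foobar".
class LevNameProvider implements MatcherProvider {
    private static final Pattern NAME = Pattern.compile("LEV(\\d+):(.+)");

    @Override
    public Predicate<String> getMatcher(String name) {
        Matcher m = NAME.matcher(name);
        if (!m.matches()) {
            return null; // unrecognized name: left to the caller in this sketch
        }
        int maxEdits = Integer.parseInt(m.group(1));
        String term = m.group(2);
        // A Lucene provider would presumably return
        // new LevenshteinAutomata(term).toAutomaton(maxEdits) here.
        return candidate -> editDistance(candidate, term) <= maxEdits;
    }

    // Two-row Levenshtein distance (insert, delete, substitute).
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        MatcherProvider provider = new LevNameProvider();
        Predicate<String> lev1 = provider.getMatcher("LEV1:foobar");
        System.out.println(lev1.test("foobaz")); // true
        System.out.println(lev1.test("fubar"));  // false
        System.out.println(provider.getMatcher("SYN:foobar")); // null
    }
}
```

With a real provider, the query text would be parsed with RegExp.AUTOMATON (or RegExp.ALL) enabled so that <LEV1:foobar> reaches getAutomaton with the name LEV1:foobar.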

-- 
lucidimagination.com
