lucene-dev mailing list archives

From Robert Muir <>
Subject Re: LevenshteinAutomata challenge
Date Wed, 10 Aug 2011 13:09:10 GMT
On Wed, Aug 10, 2011 at 7:32 AM, eks dev <> wrote:
> Thanks David,
> I did not know I can mix Automaton with LevenshteinAutomaton.
> What you say is Automaton.concatenate(LevenshteinAutomaton),
> intersect, union would work.

You can, by doing this:

LevenshteinAutomata builder = new LevenshteinAutomata("foobar");
Automaton a1 = builder.toAutomaton(1); // n=1
Automaton a2 = builder.toAutomaton(2); // n=2

Other notes:

We actually use these operations (e.g. concatenate) internally,
because FuzzyQuery historically supported a "prefixLen".
So if you search for foobar with edit distance=1 and a prefixLen of 3,
FuzzyTermsEnum builds a "prefix automaton" of "foo" and concatenates
it with an n=1 automaton of "bar":

        Automaton a = builder.toAutomaton(i);
        // constant prefix
        if (realPrefixLength > 0) {
          Automaton prefix = BasicAutomata.makeString(
            UnicodeUtil.newString(termText, 0, realPrefixLength));
          a = BasicOperations.concatenate(prefix, a);
        }
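
To make it concrete which terms such a concatenated automaton accepts, here is
a minimal plain-Java sketch that does not use Lucene at all. It models the
same idea directly: a candidate matches if it starts with the constant prefix
and the remainder is within the allowed edit distance of the rest of the term.
The class and method names (PrefixFuzzySketch, editDistance, matches) are made
up for illustration, and it compares UTF-16 chars rather than code points, so
treat it as an approximation of the semantics, not as what FuzzyTermsEnum does
internally.

```java
/** Plain-Java model of "constant prefix + fuzzy suffix" matching (hypothetical, no Lucene). */
final class PrefixFuzzySketch {

  /** Classic dynamic-programming Levenshtein distance (insert, delete, substitute). */
  static int editDistance(String a, String b) {
    int[] prev = new int[b.length() + 1];
    int[] curr = new int[b.length() + 1];
    for (int j = 0; j <= b.length(); j++) {
      prev[j] = j; // distance from the empty string to b[0..j)
    }
    for (int i = 1; i <= a.length(); i++) {
      curr[0] = i; // distance from a[0..i) to the empty string
      for (int j = 1; j <= b.length(); j++) {
        int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
        curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
      }
      int[] tmp = prev; // reuse the two rows instead of allocating
      prev = curr;
      curr = tmp;
    }
    return prev[b.length()];
  }

  /** True if candidate keeps the first prefixLen chars of term and the rest is within maxEdits. */
  static boolean matches(String term, int prefixLen, int maxEdits, String candidate) {
    int p = Math.min(prefixLen, term.length());
    if (!candidate.startsWith(term.substring(0, p))) {
      return false; // the prefix part must match exactly
    }
    return editDistance(candidate.substring(p), term.substring(p)) <= maxEdits;
  }

  public static void main(String[] args) {
    System.out.println(matches("foobar", 3, 1, "foobat")); // edit in the suffix
    System.out.println(matches("foobar", 3, 1, "fxobar")); // edit in the prefix
  }
}
```

With prefixLen=3 and n=1, "foobat" is accepted because the single edit falls
in the "bar" part, while "fxobar" is rejected because the edit touches the
constant prefix "foo".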

For the regexp syntax you discuss, you can actually already do this.
This is one reason why RegexpQuery has a constructor that takes an
AutomatonProvider:

  public RegexpQuery(Term term, int flags, AutomatonProvider provider) {
    super(term, new RegExp(term.text(), flags).toAutomaton(provider));
  }

So you can provide an implementation of AutomatonProvider that supports
custom syntax of your own, as long as it is surrounded by angle brackets < >,
e.g. <LEV1:foobar>
AutomatonProvider is a simple interface that answers to named
automata: public Automaton getAutomaton(String name) throws
If you do this, make sure you enable named automata (RegExp.AUTOMATON
or of course RegExp.ALL) in the flags!
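
As a rough sketch of the name-handling half of such a provider, the plain-Java
helper below splits a name like LEV1:foobar into an edit distance and a term.
It assumes the provider is handed the text between the angle brackets, and the
LEV<n>:<term> convention is just the example syntax from above; the class name
LevNameSketch and its fields are hypothetical. A real getAutomaton would then
pass the parsed values to LevenshteinAutomata, which is left out here so the
example has no Lucene dependency.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Parses hypothetical automaton names of the form LEV<n>:<term>, e.g. "LEV1:foobar". */
final class LevNameSketch {
  // One digit for the edit distance, then a non-empty term after the colon.
  private static final Pattern NAME = Pattern.compile("LEV([0-9]):(.+)");

  final int maxEdits;
  final String term;

  private LevNameSketch(int maxEdits, String term) {
    this.maxEdits = maxEdits;
    this.term = term;
  }

  /** Returns the parsed name, or empty if the name does not follow the LEV<n>:<term> shape. */
  static Optional<LevNameSketch> parse(String name) {
    Matcher m = NAME.matcher(name);
    if (!m.matches()) {
      return Optional.empty(); // not ours; a provider could fall back to other names here
    }
    return Optional.of(new LevNameSketch(Integer.parseInt(m.group(1)), m.group(2)));
  }

  public static void main(String[] args) {
    Optional<LevNameSketch> parsed = parse("LEV1:foobar");
    parsed.ifPresent(p -> System.out.println(p.maxEdits + " " + p.term));
  }
}
```

Returning an empty Optional for unknown names keeps the decision about
fallbacks (or errors) in the provider itself.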

