lucene-java-user mailing list archives

From Alan Woodward
Subject Building FST-like automaton queries
Date Tue, 28 Feb 2012 12:33:35 GMT

I'm trying to create a Lucene Query that will take a term and expand it to include common
OCR errors (for example, 'cl' is often misread as 'd', so a search for 'clog' should also
hit 'dog').  My plan is to do this by generating all the possible variants of a term, using
an existing list of errors, and then somehow mapping this into an AutomatonQuery.  I've been
looking around the o.a.l.util.automaton and o.a.l.util.fst packages on trunk, and I *think*
that this is possible, but I'm so far failing to work out how to put the various bits together.
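
For illustration, the variant-generation step might look something like the sketch below. It is plain Java with no Lucene classes. The class name OcrVariants, the CONFUSIONS table and its two entries are made up for the example and stand in for whatever real error list is available. Each substitution is applied at most once per position and the substituted text is not expanded again, which is an assumption about how the error list should be read.

```java
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

class OcrVariants {
    // Hypothetical confusion list: the key may have been misread as any of its values.
    static final Map<String, List<String>> CONFUSIONS = Map.of(
            "cl", List.of("d"),
            "d", List.of("cl"));

    // Returns the term plus every variant reachable by applying confusions, in sorted order.
    static SortedSet<String> expand(String term) {
        SortedSet<String> out = new TreeSet<>();
        expand(term, 0, new StringBuilder(), out);
        return out;
    }

    private static void expand(String term, int pos, StringBuilder prefix, SortedSet<String> out) {
        if (pos == term.length()) {
            out.add(prefix.toString());
            return;
        }
        int mark = prefix.length();

        // Option 1: keep the character at this position unchanged.
        prefix.append(term.charAt(pos));
        expand(term, pos + 1, prefix, out);
        prefix.setLength(mark);

        // Option 2: replace any confusable sequence that starts at this position.
        for (Map.Entry<String, List<String>> e : CONFUSIONS.entrySet()) {
            if (term.startsWith(e.getKey(), pos)) {
                for (String replacement : e.getValue()) {
                    prefix.append(replacement);
                    expand(term, pos + e.getKey().length(), prefix, out);
                    prefix.setLength(mark);
                }
            }
        }
    }
}
```

With this table, OcrVariants.expand("clog") contains both "clog" and "dog". A TreeSet is used so the result comes back already sorted and free of duplicates.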

I'm thinking it should work like this:
1) expand query term to sorted list of possible matches
2) create an FST over those matches
3) plug this FST into an AutomatonQuery subclass.
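
To make step 2 concrete, here is a minimal sketch of a deterministic automaton that accepts exactly a fixed set of strings. It is a plain-Java trie and only a stand-in: it is neither a Lucene FST nor an o.a.l.util.automaton Automaton, it does no suffix sharing or minimization, and the class name VariantTrie is invented for the example. It shows the shape of the structure that the expanded variants would be compiled into, not how to hand it to AutomatonQuery.

```java
import java.util.Collection;
import java.util.TreeMap;

class VariantTrie {
    // One automaton state: outgoing transitions keyed by character, plus an accept flag.
    private static final class State {
        final TreeMap<Character, State> next = new TreeMap<>();
        boolean accept;
    }

    private final State root = new State();

    // Builds one path per variant; variants with a common prefix share states.
    VariantTrie(Collection<String> variants) {
        for (String variant : variants) {
            State s = root;
            for (int i = 0; i < variant.length(); i++) {
                s = s.next.computeIfAbsent(variant.charAt(i), c -> new State());
            }
            s.accept = true;
        }
    }

    // Walks the transitions for the term; true only if the walk ends in an accept state.
    boolean accepts(String term) {
        State s = root;
        for (int i = 0; i < term.length() && s != null; i++) {
            s = s.next.get(term.charAt(i));
        }
        return s != null && s.accept;
    }
}
```

Built from the variants "clog" and "dog", this accepts those two strings and rejects prefixes such as "do" or extensions such as "dogs". The input does not need to be sorted for a plain trie. Sorted input matters for incremental builders that minimize as they go.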

1) is easy.  It's 2) and 3) I'm having trouble with.  

All help gratefully received!


Alan Woodward