lucene-java-user mailing list archives

From Alan Woodward <alan.woodw...@romseysoftware.co.uk>
Subject Building FST-like automaton queries
Date Tue, 28 Feb 2012 12:33:35 GMT
Hello,

I'm trying to create a Lucene Query that will take a term and expand it to include common
OCR errors (for example, 'cl' is often misread as 'd', so a search for 'clog' should also
hit 'dog').  My plan is to do this by generating all the possible variants of a term, using
an existing list of errors, and then somehow mapping this into an AutomatonQuery.  I've been
looking around the o.a.l.util.automaton and o.a.l.util.fst packages on trunk, and I *think*
that this is possible, but I'm so far failing to work out how to put the various bits together.

I'm thinking it should work like this:
1) expand the query term into a sorted list of possible matches
2) build an FST over those matches
3) plug this FST into an AutomatonQuery subclass.

1) is easy.  It's 2) and 3) I'm having trouble with.  
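
For concreteness, here is a minimal sketch of what step 1 could look like in plain Java, with no Lucene dependency. The class name OcrVariants, the CONFUSIONS table and the expand method are all hypothetical names made up for this illustration; the table holds only the 'cl'/'d' confusion mentioned above, standing in for a real error list.

```java
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

class OcrVariants {

    // Hypothetical confusion table: a key in the source text may have been
    // misread as any of its values. Only the 'cl' <-> 'd' pair is from the post.
    static final Map<String, List<String>> CONFUSIONS = Map.of(
            "cl", List.of("d"),
            "d", List.of("cl"));

    // Returns every variant of the term, including the term itself,
    // in sorted order (TreeSet uses natural String ordering).
    static SortedSet<String> expand(String term, Map<String, List<String>> confusions) {
        SortedSet<String> out = new TreeSet<>();
        expandFrom(term, 0, new StringBuilder(), confusions, out);
        return out;
    }

    // Walks the term left to right. At each position it either copies the
    // next character unchanged or replaces a confusion key that starts there.
    // Replacements consume input, so the recursion always terminates.
    private static void expandFrom(String term, int pos, StringBuilder prefix,
                                   Map<String, List<String>> confusions,
                                   SortedSet<String> out) {
        if (pos == term.length()) {
            out.add(prefix.toString());
            return;
        }
        int mark = prefix.length();

        // Option 1: keep the character as written.
        prefix.append(term.charAt(pos));
        expandFrom(term, pos + 1, prefix, confusions, out);
        prefix.setLength(mark);

        // Option 2: substitute any confusion key that matches at this position.
        for (Map.Entry<String, List<String>> e : confusions.entrySet()) {
            String key = e.getKey();
            if (term.startsWith(key, pos)) {
                for (String replacement : e.getValue()) {
                    prefix.append(replacement);
                    expandFrom(term, pos + key.length(), prefix, confusions, out);
                    prefix.setLength(mark);
                }
            }
        }
    }
}
```

With this table, OcrVariants.expand("clog", OcrVariants.CONFUSIONS) yields the two terms "clog" and "dog". Two caveats on the sketch: the number of variants can grow exponentially with the number of matching positions, and TreeSet sorts by UTF-16 code unit, which may differ from UTF-8 byte order for characters outside the Basic Multilingual Plane. If the automaton builder expects terms in UTF-8 order (as I believe BasicAutomata.makeStringUnion on trunk does, though that should be checked against the javadocs), the list may need re-sorting after conversion to BytesRef.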

All help gratefully received!

Thanks, 

Alan Woodward