lucene-dev mailing list archives

From eks dev <>
Subject LevenshteinAutomata challenge
Date Wed, 10 Aug 2011 08:54:37 GMT

Hi Robert, Mike & other FS(A|T) gurus, 

a challenge for you ;)

Would it be possible to combine these brilliant pieces of functionality with a
normal Automaton somehow?

An example to illustrate, where instead of minPrefix we would specify a Regex
(i.e. another Automaton):

pfxAutomaton = Regex("(AB)|(BA)") // i.e. the term must start with "AB" or "BA"
levAutomaton = LevenshteinAutomata("XYZ")

spell(pfxAutomaton, levAutomaton);

This would match terms that start with "AB" or "BA" and whose suffix is a normal
edit-distance match to "XYZ", e.g. ABXY (one deletion).

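To pin down what spell(pfxAutomaton, levAutomaton) above should accept, here is a brute-force reference check in plain Java. It uses no Lucene, and PrefixLevOracle and accepts are made-up names. A term is accepted iff it can be split into a prefix that fully matches the regex and a suffix within maxEdits plain edits (insert/delete/substitute, no transpositions) of the target word, i.e. the concatenation L(regex) . Lev_k(word). It tries every split point, so treat it as a spec for testing a real automaton, not as something to run over a term dictionary.

```java
import java.util.regex.Pattern;

// Hypothetical reference check (not Lucene API): defines the language
// L(prefixRegex) . Lev_k(word) that a concatenated automaton should accept.
public class PrefixLevOracle {
    // Classic two-row dynamic-programming edit distance (insert, delete, substitute).
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] cur = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                cur[j] = Math.min(Math.min(prev[j] + 1, cur[j - 1] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = cur; cur = tmp; // after the swap, prev holds row i
        }
        return prev[b.length()];
    }

    // True iff term = p + s, where p fully matches the prefix regex
    // and s is within maxEdits edits of word.
    static boolean accepts(Pattern prefix, String word, int maxEdits, String term) {
        for (int i = 0; i <= term.length(); i++) {
            if (prefix.matcher(term.substring(0, i)).matches()
                    && editDistance(term.substring(i), word) <= maxEdits) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Pattern pfx = Pattern.compile("(AB)|(BA)");
        for (String t : new String[] {"ABXY", "BAXYZ", "XYZ", "ABQQ"}) {
            System.out.println(t + " -> " + accepts(pfx, "XYZ", 1, t));
        }
    }
}
```

With maxEdits = 1 this prints true for ABXY and BAXYZ, and false for XYZ (no "AB"/"BA" prefix) and ABQQ (suffix is 3 edits away from XYZ).
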
This would support wild things, like "enable only transpositions in the first
three characters"... To get these matches today, you need to build a Lev.
automaton with maxDistance = 2 (which is then a HUGE space to search without a
prefix)... or generate several Lev. automata and take the union of their results
(expensive to iterate).

Other good use cases are simple to construct... 

The most general question: can we support at least concatenation between
LevenshteinAutomata and normal Automata? Intersection/union would be a crazy
thing as well, where we would have:
FilteringAutomata.intersect(LevenshteinAutomata)... but I guess I am dreaming
with that one. Concatenation sounds doable, though (at least on the prefix side).
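
On the automaton side, concatenation is the textbook construction: an epsilon edge from every accept state of the prefix automaton to the start state of the Levenshtein automaton. Below is a minimal single-pass sketch, assuming a hand-coded DFA for (AB)|(BA) and the plain Levenshtein NFA (state (i, e) = i chars of the word consumed, e edits spent) simulated as a state set. This is not how Lucene's LevenshteinAutomata builds its DFA (it uses precomputed parametric tables), and ConcatRunner and its methods are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch (not Lucene API): runs the concatenation
// DFA((AB)|(BA)) . LevenshteinNFA(word, maxEdits) in one pass over the term.
public class ConcatRunner {
    // Hand-coded DFA for (AB)|(BA): 0 = start, 1 = saw A, 2 = saw B, 3 = accept, -1 = dead.
    static int prefixStep(int state, char c) {
        switch (state) {
            case 0: return c == 'A' ? 1 : (c == 'B' ? 2 : -1);
            case 1: return c == 'B' ? 3 : -1;
            case 2: return c == 'A' ? 3 : -1;
            default: return -1; // accept (3) and dead (-1) have no outgoing edges
        }
    }

    static boolean prefixAccepts(int state) {
        return state == 3;
    }

    // Levenshtein NFA state (i, e) is encoded as i * (maxEdits + 1) + e.
    // Epsilon closure adds deletions: word[i] missing from the term costs one edit.
    static Set<Integer> closure(Set<Integer> states, String word, int maxEdits) {
        Set<Integer> out = new HashSet<>(states);
        ArrayDeque<Integer> work = new ArrayDeque<>(states);
        while (!work.isEmpty()) {
            int s = work.pop();
            int i = s / (maxEdits + 1), e = s % (maxEdits + 1);
            if (i < word.length() && e < maxEdits) {
                int next = (i + 1) * (maxEdits + 1) + (e + 1);
                if (out.add(next)) work.push(next);
            }
        }
        return out;
    }

    // Consume one term character: match, substitution, or insertion (extra char in the term).
    static Set<Integer> levStep(Set<Integer> states, char c, String word, int maxEdits) {
        Set<Integer> next = new HashSet<>();
        for (int s : states) {
            int i = s / (maxEdits + 1), e = s % (maxEdits + 1);
            if (i < word.length() && word.charAt(i) == c) next.add((i + 1) * (maxEdits + 1) + e);
            if (i < word.length() && e < maxEdits) next.add((i + 1) * (maxEdits + 1) + e + 1);
            if (e < maxEdits) next.add(i * (maxEdits + 1) + e + 1);
        }
        return closure(next, word, maxEdits);
    }

    static boolean accepts(String word, int maxEdits, String term) {
        Set<Integer> levStart = closure(Set.of(0), word, maxEdits);
        int p = 0;
        Set<Integer> lev = new HashSet<>();
        if (prefixAccepts(p)) lev.addAll(levStart);
        for (char c : term.toCharArray()) {
            p = prefixStep(p, c); // once dead (-1), the prefix DFA stays dead
            lev = levStep(lev, c, word, maxEdits);
            // Epsilon edge: whenever the prefix DFA accepts, start a fresh Levenshtein run.
            if (prefixAccepts(p)) lev.addAll(levStart);
        }
        for (int s : lev) {
            if (s / (maxEdits + 1) == word.length()) return true; // whole word consumed
        }
        return false;
    }

    public static void main(String[] args) {
        for (String t : new String[] {"ABXY", "BAXZ", "XYZ", "ABQQ"}) {
            System.out.println(t + " -> " + accepts("XYZ", 1, t));
        }
    }
}
```

The prefix side stays deterministic (a single int state), while the Levenshtein side is a set of NFA states, so each term character is processed once for all split points at the same time.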
