lucene-java-user mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: Questions about FuzzyQuery in Lucene 4.x
Date Thu, 31 Jan 2013 20:23:24 GMT
On Thu, Jan 31, 2013 at 2:52 PM, George Kelvin
<george.kelvin738@gmail.com> wrote:

> Thank you! That is the problem! I changed the maxExpansions to 100 and the
> results are found.

Phew!

> About my second question, the ranking of wildcard fuzzy search, can you
> also give some suggestions? Thanks!

This is tricky; e.g., see how FuzzyQuery sets the per-term score in
FuzzyTermsEnum using BoostAttribute.

Rob's answer (on that stackoverflow question) is a nice solution,
since it makes a single Automaton matching the right (fuzzy + prefix)
terms ... maybe you could make your own TermsEnum that somehow finds
the "fuzzy" part of the matching term and then computes the edit
distance and sets the BoostAttribute accordingly?
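
To make the "find the fuzzy part and compute the edit distance" step
concrete, here is a minimal sketch in plain Java, with no Lucene classes
involved. It treats the distance of a matching term as the smallest
Levenshtein distance between the query text and any prefix of the term.
The class name PrefixFuzzy, both method names, and the boost formula
(1 - distance / query length) are assumptions made for illustration; they
are not what FuzzyTermsEnum computes. The sketch also uses plain
Levenshtein, so a transposition counts as two edits.

```java
public final class PrefixFuzzy {
    private PrefixFuzzy() {}

    /**
     * Smallest Levenshtein distance between {@code query} and any prefix
     * of {@code term} (including the empty prefix and the whole term).
     */
    public static int prefixEditDistance(String query, String term) {
        int m = query.length();
        int n = term.length();
        // prev[j] = distance between query[0..i-1) and term[0..j)
        int[] prev = new int[n + 1];
        int[] curr = new int[n + 1];
        for (int j = 0; j <= n; j++) {
            prev[j] = j; // empty query vs. a prefix of length j
        }
        for (int i = 1; i <= m; i++) {
            curr[0] = i; // i query chars vs. the empty prefix
            for (int j = 1; j <= n; j++) {
                int cost = query.charAt(i - 1) == term.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(
                        Math.min(curr[j - 1] + 1, prev[j] + 1),
                        prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        // The last row holds the full query against every term prefix;
        // the best prefix is the minimum over that row.
        int best = prev[0];
        for (int j = 1; j <= n; j++) {
            best = Math.min(best, prev[j]);
        }
        return best;
    }

    /** Hypothetical boost: 1.0 for an exact prefix, lower as edits grow. */
    public static float boostFor(String query, String term) {
        if (query.isEmpty()) {
            return 1.0f;
        }
        int distance = prefixEditDistance(query, term);
        return Math.max(0f, 1.0f - (float) distance / query.length());
    }
}
```

For example, prefixEditDistance("lucen", "lucene") is 0 because "lucen"
is an exact prefix, while prefixEditDistance("lucen", "lucandra") is 1
(the prefix "lucan" is one substitution away). A custom TermsEnum could
feed a value like boostFor(...) into BoostAttribute for each accepted
term.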

Or alternatively maybe you could enumerate the top N matching terms
and their edit distances, and create your own BooleanQuery with those
terms fed to PrefixQuery, with the boost set?
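
The selection step can be sketched in plain Java as follows, assuming the
candidate terms and their edit distances have already been collected into
a map (how they are enumerated from the index is left out). The names
TopFuzzyTerms, ScoredTerm and topN, and the boost formula
(1 - distance / query length), are hypothetical and chosen only for
illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

public final class TopFuzzyTerms {
    private TopFuzzyTerms() {}

    /** A candidate term with its edit distance and the boost to apply. */
    public static final class ScoredTerm {
        public final String term;
        public final int distance;
        public final float boost;

        ScoredTerm(String term, int distance, float boost) {
            this.term = term;
            this.distance = distance;
            this.boost = boost;
        }
    }

    /**
     * Keeps terms within maxEdits, orders them by distance (ties broken
     * alphabetically so the result is deterministic), and returns at
     * most n of them.
     */
    public static List<ScoredTerm> topN(Map<String, Integer> distances,
                                        int queryLength, int maxEdits, int n) {
        List<ScoredTerm> kept = new ArrayList<ScoredTerm>();
        for (Map.Entry<String, Integer> e : distances.entrySet()) {
            int d = e.getValue();
            if (d > maxEdits) {
                continue; // too far from the query text
            }
            // Assumed boost formula, not the one FuzzyQuery uses.
            float boost = queryLength == 0
                    ? 1.0f
                    : Math.max(0f, 1.0f - (float) d / queryLength);
            kept.add(new ScoredTerm(e.getKey(), d, boost));
        }
        Collections.sort(kept, new Comparator<ScoredTerm>() {
            @Override
            public int compare(ScoredTerm a, ScoredTerm b) {
                int byDistance = Integer.compare(a.distance, b.distance);
                return byDistance != 0 ? byDistance : a.term.compareTo(b.term);
            }
        });
        int limit = Math.max(0, Math.min(n, kept.size()));
        return new ArrayList<ScoredTerm>(kept.subList(0, limit));
    }
}
```

Each returned ScoredTerm would then become one optional (SHOULD) clause
of the BooleanQuery: a PrefixQuery on the term text with its boost set
to ScoredTerm.boost. Capping n plays the same role as maxExpansions.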

Really, ideally, our Automaton would be able to carry weights, and the
LevN machine would set weights according to the edit distance.  Then
AutomatonQuery could boost terms according to the matched weight and
Rob's solution would "just work"... or maybe we convert the Automaton
to an FST (which can represent weights on its outputs) and make an
FSTQuery.

But in the meantime I think making your own BooleanQuery (which is what
FuzzyQuery does anyway under the hood) is the best workaround ...

Mike McCandless

http://blog.mikemccandless.com
