lucene-java-user mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: FuzzyQuery on entire set of terms
Date Fri, 21 Oct 2016 21:28:05 GMT
You mean the total number of edits between those strings must be <= 2?

If so, you must index the entire "Lucene Apache Group" as a single
token, and likewise do a FuzzyQuery with the entire "Luceni Apachi
Group", etc.
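A minimal sketch of what the single-token approach measures. It is plain Java with no Lucene dependency, and the class name `WholePhraseDistance` is hypothetical. It computes one Levenshtein distance over the whole phrase, which is the quantity a FuzzyQuery bounds when the phrase is indexed as a single token. In Lucene this would typically mean indexing into a non-tokenized field (for example a `StringField`) and querying with `new FuzzyQuery(new Term(field, "Luceni Apachi Group"), 2)`, where `field` is a placeholder name. FuzzyQuery accepts at most 2 edits. Unlike this sketch, FuzzyQuery by default also counts a transposition of two adjacent characters as a single edit.

```java
// Hypothetical illustration: edit distance over the whole phrase, treated as one token.
public class WholePhraseDistance {

    // Classic dynamic-programming Levenshtein distance
    // (insertion, deletion and substitution each cost 1; no transpositions).
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a to b[0..j)
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // distance from a[0..i) to the empty prefix of b
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        String query = "Lucene Apache Group";
        String[] candidates = {
            "Luceni Apachi Group", "Lucen Apache Group", "Luce Apache Group", "Lucen Apach Grou"
        };
        for (String c : candidates) {
            int d = levenshtein(query, c);
            System.out.println(c + " -> " + d + (d <= 2 ? " (match)" : " (no match)"));
        }
    }
}
```

Running it prints distances 2, 1, 2 and 3, so only "Lucen Apach Grou" exceeds the budget of 2, matching the examples in the original question.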

If you instead tokenize and combine per-term FuzzyQuerys with a
BooleanQuery, then each term may have up to 2 edits, so the total can
exceed 2. Performance is likely fine here, though: FuzzyQuery has been
very fast since the changes described at
http://blog.mikemccandless.com/2011/03/lucenes-fuzzyquery-is-100-times-faster.html
Have you tested it?
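To make the difference between the two approaches concrete, here is a small plain-Java sketch (no Lucene; the class and method names are hypothetical). It pairs the i-th query term with the i-th candidate term, which is a simplifying assumption: a real BooleanQuery of FuzzyQuery clauses lets each clause match any indexed term in the document. It then compares the per-term bound against the summed bound Michael asked for.

```java
import java.util.Arrays;

// Hypothetical illustration: per-term edit bound vs. total edit bound.
public class PerTermVsTotal {

    // Plain Levenshtein distance (no transpositions), two-row dynamic programming.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp;
        }
        return prev[b.length()];
    }

    // Distance of each query term to the candidate term at the same position.
    // Assumes both phrases have the same number of space-separated terms.
    static int[] perTermDistances(String query, String candidate) {
        String[] q = query.split(" ");
        String[] c = candidate.split(" ");
        if (q.length != c.length) {
            throw new IllegalArgumentException("term counts differ");
        }
        int[] d = new int[q.length];
        for (int i = 0; i < q.length; i++) d[i] = levenshtein(q[i], c[i]);
        return d;
    }

    // Per-term bound: every term is individually within maxEdits.
    static boolean everyTermWithin(int[] d, int maxEdits) {
        for (int x : d) if (x > maxEdits) return false;
        return true;
    }

    // Total bound: the distances summed across all terms are within maxEdits.
    static boolean totalWithin(int[] d, int maxEdits) {
        int sum = 0;
        for (int x : d) sum += x;
        return sum <= maxEdits;
    }

    public static void main(String[] args) {
        int[] d = perTermDistances("Lucene Apache Group", "Lucen Apach Grou");
        System.out.println(Arrays.toString(d));                // [1, 1, 1]
        System.out.println("per-term <= 2: " + everyTermWithin(d, 2)); // true
        System.out.println("total <= 2:    " + totalWithin(d, 2));     // false
    }
}
```

"Lucen Apach Grou" is one edit away per term, so a per-term bound of 2 accepts it even though its total of 3 edits is over the requested budget.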

Mike McCandless

http://blog.mikemccandless.com


On Fri, Oct 21, 2016 at 2:45 PM, Michael Wilkowski <mw@silenteight.com> wrote:
> Hi,
> I need to implement a function that performs fuzzy search on multiple terms
> such that a total (summed) distance of 2 across ALL terms is allowed. For
> example, the query:
>
> Lucene Apache Group
>
> with a maximum distance of 2 should match:
>
> Luceni Apachi Group
> Lucen Apache Group
> Luce Apache Group
>
> but not:
>
> Lucen Apach Grou
>
> I know that I can achieve this using multiple FuzzyQueries nested in
> BooleanQueries, but with more terms (>5) and a distance of 2 there
> could be very many combinations, and I am worried about performance.
>
> Is there a better solution that someone can recommend?
>
> Regards,
> Michael
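One way to size the nested-query approach, assuming it is built as an OR of alternatives where each alternative is an AND of per-term FuzzyQuery clauses whose maxEdits values add up to exactly 2: the number of alternatives equals the number of ways to split 2 edits among n terms, which is n(n+1)/2 by stars and bars. The plain-Java sketch below (hypothetical class name, no Lucene) counts those splits with a small dynamic program. It only counts query alternatives; it says nothing about actual query cost.

```java
// Hypothetical illustration: how many per-term maxEdits assignments sum to the budget.
public class EditBudgetSplits {

    // Number of non-negative integer vectors (e1, ..., en) with e1 + ... + en == budget.
    static long countSplits(int n, int budget) {
        // ways[b] = number of ways to distribute b edits over the terms processed so far
        long[] ways = new long[budget + 1];
        ways[0] = 1; // before any term: only the empty split of zero edits
        for (int term = 0; term < n; term++) {
            long[] next = new long[budget + 1];
            for (int used = 0; used <= budget; used++) {
                // give this term e edits, keeping the running total within budget
                for (int e = 0; used + e <= budget; e++) {
                    next[used + e] += ways[used];
                }
            }
            ways = next;
        }
        return ways[budget];
    }

    public static void main(String[] args) {
        for (int n : new int[] {3, 5, 10}) {
            System.out.println(n + " terms: " + countSplits(n, 2) + " alternatives");
        }
    }
}
```

For a budget of 2 this gives 6, 15 and 55 alternatives for 3, 5 and 10 terms, so the count grows quadratically in the number of terms rather than exponentially.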

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

