lucene-java-user mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: fuzzy queries
Date Sat, 09 Feb 2013 12:20:38 GMT
Can you reduce your test case to indexing one document/field and
running a single FuzzyQuery (you seem to be running two at once,
OR'ing the results)?

Please also post the complete standalone source code (e.g., what is topk?) so
we can see how you are indexing, building the Query, and searching.

The default minimum similarity (minSim) for the ~ operator is 0.5.

Note that 0.01 is not useful in practice: it (should) match nearly all
terms.  But I agree it's odd one term is not matching.
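
For reference, here is a rough sketch of how the 3.x fuzzy similarity score appears to be computed. It assumes a prefix length of 0 and the formula 1 - editDistance / min(queryTermLength, indexedTermLength). It also assumes your analyzer keeps underscore-joined text as a single token, which I have not verified for your setup. The class and method names are made up for illustration and are not Lucene API:

```java
// Illustrative only: FuzzySimilaritySketch is a hypothetical helper, not part of Lucene.
final class FuzzySimilaritySketch {

    /** Plain Levenshtein edit distance (insert, delete, substitute each cost 1). */
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev; // reuse the two rows instead of a full matrix
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /**
     * Assumed 3.x-style score with prefix length 0:
     * 1 - editDistance / min(length of query term, length of indexed term).
     * Both terms are assumed to be non-empty.
     */
    static float similarity(String queryTerm, String indexedTerm) {
        int shorter = Math.min(queryTerm.length(), indexedTerm.length());
        return 1.0f - (float) editDistance(queryTerm, indexedTerm) / shorter;
    }

    public static void main(String[] args) {
        String[] indexedTerms = {"string_matching", "str2ing__mat3ching"};
        for (String term : indexedTerms) {
            System.out.println(term + " vs string:   " + similarity("string", term));
            System.out.println(term + " vs matching: " + similarity("matching", term));
        }
    }
}
```

Under those assumptions, "string_matching" scores 0.125 against "matching" (7 deletions over a shorter length of 8), which clears 0.01 but not the default 0.5. "str2ing__mat3ching" needs at least 10 edits against an 8-character term, so it scores below zero and no positive threshold would accept it. That would be consistent with the results you report, but a reduced test case is still the way to confirm it.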

Mike McCandless

http://blog.mikemccandless.com

On Sat, Feb 9, 2013 at 5:20 AM, Pierre Antoine DuBoDeNa
<padbdn@gmail.com> wrote:
>>
>> Hello,
>>
>> I use Lucene 3.6 and I am trying to use fuzzy queries so that I can match
>> many more results.
>>
>> I am adding for example these strings:
>>
>> list.add("string matching");
>> list.add("string123 matching");
>> list.add("string matching123");
>> list.add("string123 matching123");
>> list.add("str4ing match2ing");
>> list.add("1string 2matching");
>> list.add("str_ing ma_tching");
>> list.add("string_matching");
>> list.add("strang mutching");
>> list.add("strrring maatchinng");
>> list.add("strfffing_ m atcbbhing");
>> list.add("str2ing__mat3ching");
>> list.add("string_m atching");
>> list.add("string matching another token");
>> list.add("strasding matc4hing ano23ther tok3en");
>> list.add("str4ing maaatching_another 2t oken");
>>
>> Then I run this query:
>>
>> "string~0.01 matching~0.01"
>>
>> and I get back these results:
>>
>> Found 15 hits.
>>
>> 1. 1string 2matching
>> 2. str_ing ma_tching
>> 3. string_m atching
>> 4. strang mutching
>> 5. str4ing match2ing
>> 6. strrring maatchinng
>> 7. string matching
>> 8. strasding matc4hing ano23ther tok3en
>> 9. string matching another token
>> 10. string matching123
>> 11. string123 matching
>> 12. strfffing_ m atcbbhing
>> 13. string123 matching123
>> 14. str4ing maaatching_another 2t oken
>> 15. string_matching
>>
>> So only one result is missing (with threshold 0.01): str2ing__mat3ching. Any
>> idea why? How can I extend the query to catch this one as well?
>>
>> Also, what is the default threshold for the ~ operator? Without specifying a
>> threshold I get 14 results; string_matching and str2ing__mat3ching are both
>> missing this time.
>>
>> Here is the code for the queries
>>
>> Query q = new QueryParser(Version.LUCENE_35, "Name", analyzer).parse(query);
>>
>> IndexSearcher searcher = new IndexSearcher(w);
>> TopScoreDocCollector collector = TopScoreDocCollector.create(topk, true);
>> searcher.search(q, collector);
>> ScoreDoc[] hits = collector.topDocs().scoreDocs;
>>
>> Thanks for the help.
>>
>>
