lucene-java-user mailing list archives

From Pierre Antoine DuBoDeNa <pad...@gmail.com>
Subject fuzzy queries
Date Sat, 09 Feb 2013 10:20:13 GMT
>
> Hello,
>
> I use Lucene 3.6 and am trying to use fuzzy queries so that I can match
> many more results.
>
> For example, I add these strings:
>
> list.add("string matching");
> list.add("string123 matching");
> list.add("string matching123");
> list.add("string123 matching123");
> list.add("str4ing match2ing");
> list.add("1string 2matching");
> list.add("str_ing ma_tching");
> list.add("string_matching");
> list.add("strang mutching");
> list.add("strrring maatchinng");
> list.add("strfffing_ m atcbbhing");
> list.add("str2ing__mat3ching");
> list.add("string_m atching");
> list.add("string matching another token");
> list.add("strasding matc4hing ano23ther tok3en");
> list.add("str4ing maaatching_another 2t oken");
>
> Then I run this query:
>
> "string~0.01 matching~0.01"
>
> and I get back these results:
>
> Found 15 hits.
>
> 1. 1string 2matching
> 2. str_ing ma_tching
> 3. string_m atching
> 4. strang mutching
> 5. str4ing match2ing
> 6. strrring maatchinng
> 7. string matching
> 8. strasding matc4hing ano23ther tok3en
> 9. string matching another token
> 10. string matching123
> 11. string123 matching
> 12. strfffing_ m atcbbhing
> 13. string123 matching123
> 14. str4ing maaatching_another 2t oken
> 15. string_matching
>
> So only one result is missing at threshold 0.01: str2ing__mat3ching. Any
> idea why? How can I extend the query to catch this one as well?
>
> Also, what is the default threshold for the ~ operator? Without specifying
> a threshold I get 14 results; this time string_matching and
> str2ing__mat3ching are missing.
>
> Here is the code for the queries:
>
> Query q = new QueryParser(Version.LUCENE_35, "Name", analyzer).parse(query);
> IndexSearcher searcher = new IndexSearcher(w);
> TopScoreDocCollector collector = TopScoreDocCollector.create(topk, true);
> searcher.search(q, collector);
> ScoreDoc[] hits = collector.topDocs().scoreDocs;
>
> Thanks for the help.
>
>
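
One possible explanation, sketched under assumptions that the message above does not confirm: in Lucene 3.x, FuzzyQuery is understood to score a candidate term as 1 - editDistance / min(queryTermLength, candidateLength) with a default minimum similarity of 0.5, and the analyzer in use (not shown) may keep underscore-joined input such as string_matching as a single token. The class and method names below are hypothetical and the code does not call Lucene; it only reproduces that arithmetic for the two strings in question.

```java
public class FuzzySimilaritySketch {

    // Plain Levenshtein edit distance: insert, delete, and substitute each cost 1.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // ASSUMPTION: mirrors the Lucene 3.x fuzzy score with prefix length 0.
    // Both terms are expected to be non-empty.
    static float similarity(String queryTerm, String indexedTerm) {
        int d = editDistance(queryTerm, indexedTerm);
        return 1.0f - (float) d / Math.min(queryTerm.length(), indexedTerm.length());
    }

    public static void main(String[] args) {
        // ASSUMPTION: the analyzer indexes each of these as one token.
        String[] indexedTerms = {"string_matching", "str2ing__mat3ching"};
        String[] queryTerms = {"string", "matching"};
        for (String indexed : indexedTerms) {
            for (String query : queryTerms) {
                System.out.println(query + " vs " + indexed + ": distance="
                        + editDistance(query, indexed)
                        + ", similarity=" + similarity(query, indexed));
            }
        }
    }
}
```

If those assumptions hold, string_matching scores 0.125 against matching, which clears a threshold of 0.01 but not 0.5, while str2ing__mat3ching scores below zero against both query terms, so no positive threshold would admit it. In that case a different tokenization, for example one that splits on underscores, may be worth trying rather than a lower threshold.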
