lucene-java-user mailing list archives

From Baris Kazar <baris.ka...@oracle.com>
Subject Re: FuzzyQuery- why is it ignored?
Date Sun, 16 Jun 2019 02:42:59 GMT
Hello,

Erick explained how to disable stemming in Solr, but I am using plain Lucene.
I am also researching how to disable it in Lucene, but if you already have instructions for doing so,
I would appreciate it if you could share them here.
Best regards

----- Original Message -----
From: baris.kazar@oracle.com
To: java-user@lucene.apache.org, tomoko.uchida.1111@gmail.com, erickerickson@gmail.com, atri@linux.com, baris.kazar@oracle.com, lucene@mikemccandless.com
Sent: Thursday, June 13, 2019 10:48:47 AM GMT -05:00 US/Canada Eastern
Subject: Re: FuzzyQuery- why is it ignored?

I see. I am using an older version, 6.6, and we should switch to your 8.1
version, or at least 7.x.

Tomoko, I think I understood: you meant MAIN NASHUA .... for the string :)

Again i really appreciate all answers.

Another question: how do we disable or enable stemming while indexing? :)

Best regards


On 6/13/19 10:40 AM, Tomoko Uchida wrote:
> Sorry, I made a mistake when copy-pasting. Let me just correct my previous mail.
>
>> 1. Indexed this text: "NASHUA NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES".
> 1. Indexed this text: "MAIN DUNSTABLE NASHUA HILLSBOROUGH NEW
> HAMPSHIRE UNITED STATES"
>
> ----
> As far as I can tell, this query correctly finds the indexed document
> (so I have no idea what is wrong with the fuzzy query).
> +contentDFLT:mains~2 +contentDFLT:"nashua"
> +contentDFLT:"new-hampshire" +contentDFLT:"united states"
>
> I am
> - using lucene 8.1.
> - using standard analyzer for both indexing and searching.
> - using classic query parser for parsing.
>
>
>
> On Thu, Jun 13, 2019 at 23:18, <baris.kazar@oracle.com> wrote:
>> However, the index does not have MAINS but MAIN for the expected entry.
>>
>> Best regards
>>
>>
>>
>> On 6/13/19 10:33 AM, baris.kazar@oracle.com wrote:
>>> Does it treat it like a plural word? :) :) :)
>>> That makes sense.
>>>
>>> Best regards
>>>
>>>
>>> On 6/13/19 10:31 AM, baris.kazar@oracle.com wrote:
>>>> Erick,
>>>>
>>>> Cool, could You give a simple example with my example please?
>>>>
>>>> Best regards
>>>>
>>>>
>>>>
>>>> On 6/13/19 10:12 AM, Erick Erickson wrote:
>>>>> Shot in the dark: stemming. Whenever I see a problem with something
>>>>> ending in “s” (or “er” or “ing” or….) my first suspect is that
>>>>> stemming is turned on. In that case the token in the index that’s
>>>>> actually searched on is somewhat different than you expect.
>>>>>
>>>>> The test is easy, just ensure your fieldType contains no stemmers.
>>>>> PorterStemmer is particularly aggressive, but for this case to test
>>>>> I’d just remove all stemming, re-index and see if the results differ.
>>>>>
>>>>> Best,
>>>>> Erick
>>>>>
>>>>>> On Jun 13, 2019, at 7:26 AM, baris.kazar@oracle.com wrote:
>>>>>>
>>>>>> Tomoko,-
>>>>>>
>>>>>>    That is strange indeed.
>>>>>>
>>>>>> Something is wrong when I use mains, but maink, mainl, mainr, mainq,
>>>>>> and maint all work OK; any consonant at the end except s works in this
>>>>>> case.
>>>>>>
>>>>>> Case #3 had +contentDFLT:mains~2 but not +contentDFLT:"mains~2".
>>>>>>
>>>>>> i am using fuzzy query with ~ from Query.builder and that is not
>>>>>> PhraseQuery.
>>>>>>
>>>>>> Similarly FuzzyQuery with input "mains" (it has to be lowercase
>>>>>> since it does not go through StandardAnalyzer) is also not
>>>>>> PhraseQuery.
>>>>>>
>>>>>> can there be a clearer sample case for ComplexPhraseQuery please in
>>>>>> the docs?
>>>>>>
>>>>>> did You also index "MAIN NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED
>>>>>> STATES" the expected output in this case?
>>>>>>
>>>>>> Thanks for spending time on this, i would like to thank everyone.
>>>>>>
>>>>>> Best regards
>>>>>>
>>>>>>
>>>>>> On 6/13/19 12:13 AM, Tomoko Uchida wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>>> Ok, i think only this very specific only "mains" has an issue.
>>>>>>> It looks strange to me. I did some test locally.
>>>>>>>
>>>>>>> 1. Indexed this text: "NASHUA NASHUA HILLSBOROUGH NEW HAMPSHIRE
>>>>>>> UNITED STATES".
>>>>>>>
>>>>>>> 2a. This query string (just copied from your Case #3) worked
>>>>>>> correctly
>>>>>>> for me as far as I can see.
>>>>>>> +contentDFLT:mains~2 +contentDFLT:"nashua",
>>>>>>> +contentDFLT:"new-hampshire", +contentDFLT:"united state"
>>>>>>>
>>>>>>> 2b. However this query string got no results.
>>>>>>> +contentDFLT:"mains~2", +contentDFLT:"nashua",
>>>>>>> +contentDFLT:"new-hampshire", +contentDFLT:"united states"
>>>>>>> This is expected behaviour because the classic query parser does not
>>>>>>> support a fuzzy query inside a phrase query (as far as I know).
>>>>>>>
>>>>>>> I suspect you are using the fuzzy query operator (~) inside a phrase
>>>>>>> query ("), as in case 2b.
>>>>>>>
>>>>>>> FYI: there is a special parser for such complex phrase queries.
>>>>>>> https://lucene.apache.org/core/8_1_0/queryparser/org/apache/lucene/queryparser/complexPhrase/ComplexPhraseQueryParser.html
>>>>>>>
>>>>>>>
>>>>>>> Tomoko
>>>>>>>
>>>>>>> On Thu, Jun 13, 2019 at 6:16, <baris.kazar@oracle.com> wrote:
>>>>>>>> Ok, i think only this very specific only "mains" has an issue.
>>>>>>>>
>>>>>>>> all i knew about Lucene was fine :) Great...
>>>>>>>>
>>>>>>>> i have one more question:
>>>>>>>>
>>>>>>>> which one is advised to use: FuzzyQuery or the Query.parser with
>>>>>>>> search string~ appended?
>>>>>>>>
>>>>>>>> The second one will go through analyzer and make search string
>>>>>>>> lowercase.
>>>>>>>>
>>>>>>>> Best regards
>>>>>>>>
>>>>>>>>
>>>>>>>> On 6/12/19 1:03 PM, baris.kazar@oracle.com wrote:
>>>>>>>>
>>>>>>>> Hi again,-
>>>>>>>>
>>>>>>>> this is really interesting and i hope i am missing something.
>>>>>>>> The index lowercases all entries, so case sensitivity is not an
>>>>>>>> issue, I think.
>>>>>>>>
>>>>>>>> Case #1:
>>>>>>>>
>>>>>>>> org.apache.lucene.queryparser.classic.QueryParser parser = new
>>>>>>>> org.apache.lucene.queryparser.classic.QueryParser(field,
>>>>>>>> phraseAnalyzer) ;
>>>>>>>>           Query q1 = null;
>>>>>>>>           try {
>>>>>>>>               q1 = parser.parse("Main");
>>>>>>>>           } catch (ParseException e) {
>>>>>>>>               e.printStackTrace();
>>>>>>>>           }
>>>>>>>>           booleanQuery.add(q1, BooleanClause.Occur.MUST);
>>>>>>>> booleanQuery.add(Utils.createPhraseQuery(phraseAnalyzer, field,
>>>>>>>> "NASHUA"), BooleanClause.Occur.MUST);
>>>>>>>> booleanQuery.add(Utils.createPhraseQuery(phraseAnalyzer, field,
>>>>>>>> "NEW HAMPSHIRE"), BooleanClause.Occur.MUST);
>>>>>>>> booleanQuery.add(Utils.createPhraseQuery(phraseAnalyzer, field,
>>>>>>>> "UNITED STATES"), BooleanClause.Occur.MUST);
>>>>>>>>
>>>>>>>>
>>>>>>>> This produces the following:
>>>>>>>>
>>>>>>>> query plan:
>>>>>>>>
>>>>>>>> [+contentDFLT:main, +contentDFLT:"nashua",
>>>>>>>> +contentDFLT:"new-hampshire", +contentDFLT:"united states"]
>>>>>>>>
>>>>>>>> testQuerySearch1 Time to compute: 0 seconds (copied answer after
>>>>>>>> exec finished)
>>>>>>>>
>>>>>>>> Number of results: 12
>>>>>>>> Name: Main Dunstable Rd
>>>>>>>> Score: 41.204945
>>>>>>>> ID: 12677400
>>>>>>>> Country Code: US
>>>>>>>> Coordinates: 42.72631, -71.50269
>>>>>>>> Search Key: MAIN DUNSTABLE NASHUA HILLSBOROUGH NEW HAMPSHIRE
>>>>>>>> UNITED STATES
>>>>>>>>
>>>>>>>> Name: Main St
>>>>>>>> Score: 41.204945
>>>>>>>> ID: 12681980
>>>>>>>> Country Code: US
>>>>>>>> Coordinates: 42.76416, -71.46681
>>>>>>>> Search Key: MAIN NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES
>>>>>>>>
>>>>>>>> Name: Main St
>>>>>>>> Score: 41.204945
>>>>>>>> ID: 12681973
>>>>>>>> Country Code: US
>>>>>>>> Coordinates: 42.75045, -71.4607
>>>>>>>> Search Key: MAIN NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES
>>>>>>>>
>>>>>>>> Name: Main St
>>>>>>>> Score: 41.204945
>>>>>>>> ID: 12681974
>>>>>>>> Country Code: US
>>>>>>>> Coordinates: 42.76019, -71.465
>>>>>>>> Search Key: MAIN NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES
>>>>>>>>
>>>>>>>> Name: Main Dunstable Rd
>>>>>>>> Score: 41.204945
>>>>>>>> ID: 12677399
>>>>>>>> Country Code: US
>>>>>>>> Coordinates: 42.74641, -71.48943
>>>>>>>> Search Key: MAIN DUNSTABLE NASHUA HILLSBOROUGH NEW HAMPSHIRE
>>>>>>>> UNITED STATES
>>>>>>>>
>>>>>>>> Name: S Main St
>>>>>>>> Score: 41.204945
>>>>>>>> ID: 11893215
>>>>>>>> Country Code: US
>>>>>>>> Coordinates: 42.73412, -71.44797
>>>>>>>> Search Key: MAIN NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES
>>>>>>>>
>>>>>>>> Name: Main St
>>>>>>>> Score: 41.204945
>>>>>>>> ID: 12681978
>>>>>>>> Country Code: US
>>>>>>>> Coordinates: 42.73492, -71.44951
>>>>>>>> Search Key: MAIN NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES
>>>>>>>>
>>>>>>>> Name: S Main St
>>>>>>>> Score: 41.204945
>>>>>>>> ID: 11893214
>>>>>>>> Country Code: US
>>>>>>>> Coordinates: 42.73958, -71.45895
>>>>>>>> Search Key: MAIN NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES
>>>>>>>>
>>>>>>>> Name: Main St
>>>>>>>> Score: 41.204945
>>>>>>>> ID: 12681979
>>>>>>>> Country Code: US
>>>>>>>> Coordinates: 42.76416, -71.46681
>>>>>>>> Search Key: MAIN NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES
>>>>>>>>
>>>>>>>> Name: Main St
>>>>>>>> Score: 41.204945
>>>>>>>> ID: 12681977
>>>>>>>> Country Code: US
>>>>>>>> Coordinates: 42.747, -71.45957
>>>>>>>> Search Key: MAIN NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Case #2
>>>>>>>>
>>>>>>>> This also worked when I appended ~ to the word Main to make it a
>>>>>>>> fuzzy query:
>>>>>>>>
>>>>>>>> org.apache.lucene.queryparser.classic.QueryParser parser = new
>>>>>>>> org.apache.lucene.queryparser.classic.QueryParser(field,
>>>>>>>> phraseAnalyzer) ;
>>>>>>>>           Query q1 = null;
>>>>>>>>           try {
>>>>>>>>               q1 = parser.parse("Main~");
>>>>>>>>           } catch (ParseException e) {
>>>>>>>>               e.printStackTrace();
>>>>>>>>           }
>>>>>>>>           booleanQuery.add(q1, BooleanClause.Occur.MUST);
>>>>>>>> booleanQuery.add(Utils.createPhraseQuery(phraseAnalyzer, field,
>>>>>>>> "NASHUA"), BooleanClause.Occur.MUST);
>>>>>>>> booleanQuery.add(Utils.createPhraseQuery(phraseAnalyzer, field,
>>>>>>>> "NEW HAMPSHIRE"), BooleanClause.Occur.MUST);
>>>>>>>> booleanQuery.add(Utils.createPhraseQuery(phraseAnalyzer, field,
>>>>>>>> "UNITED STATES"), BooleanClause.Occur.MUST);
>>>>>>>>
>>>>>>>>
>>>>>>>> query plan:
>>>>>>>>
>>>>>>>> [+contentDFLT:main~2, +contentDFLT:"nashua",
>>>>>>>> +contentDFLT:"new-hampshire", +contentDFLT:"united states"]
>>>>>>>>
>>>>>>>> testQuerySearch1 Time to compute: 24 seconds (due to debugging
>>>>>>>> stops)
>>>>>>>> Number of results: 12
>>>>>>>> Name: Main Dunstable Rd
>>>>>>>> Score: 41.06405
>>>>>>>> ID: 12677400
>>>>>>>> Country Code: US
>>>>>>>> Coordinates: 42.72631, -71.50269
>>>>>>>> Search Key: MAIN DUNSTABLE NASHUA HILLSBOROUGH NEW HAMPSHIRE
>>>>>>>> UNITED STATES
>>>>>>>>
>>>>>>>> Name: Main St
>>>>>>>> Score: 41.06405
>>>>>>>> ID: 12681980
>>>>>>>> Country Code: US
>>>>>>>> Coordinates: 42.76416, -71.46681
>>>>>>>> Search Key: MAIN NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES
>>>>>>>>
>>>>>>>> Name: Main St
>>>>>>>> Score: 41.06405
>>>>>>>> ID: 12681973
>>>>>>>> Country Code: US
>>>>>>>> Coordinates: 42.75045, -71.4607
>>>>>>>> Search Key: MAIN NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES
>>>>>>>>
>>>>>>>> Name: Main St
>>>>>>>> Score: 41.06405
>>>>>>>> ID: 12681974
>>>>>>>> Country Code: US
>>>>>>>> Coordinates: 42.76019, -71.465
>>>>>>>> Search Key: MAIN NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES
>>>>>>>>
>>>>>>>> Name: Main Dunstable Rd
>>>>>>>> Score: 41.06405
>>>>>>>> ID: 12677399
>>>>>>>> Country Code: US
>>>>>>>> Coordinates: 42.74641, -71.48943
>>>>>>>> Search Key: MAIN DUNSTABLE NASHUA HILLSBOROUGH NEW HAMPSHIRE
>>>>>>>> UNITED STATES
>>>>>>>>
>>>>>>>> Name: S Main St
>>>>>>>> Score: 41.06405
>>>>>>>> ID: 11893215
>>>>>>>> Country Code: US
>>>>>>>> Coordinates: 42.73412, -71.44797
>>>>>>>> Search Key: MAIN NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES
>>>>>>>>
>>>>>>>> Name: Main St
>>>>>>>> Score: 41.06405
>>>>>>>> ID: 12681978
>>>>>>>> Country Code: US
>>>>>>>> Coordinates: 42.73492, -71.44951
>>>>>>>> Search Key: MAIN NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES
>>>>>>>>
>>>>>>>> Name: S Main St
>>>>>>>> Score: 41.06405
>>>>>>>> ID: 11893214
>>>>>>>> Country Code: US
>>>>>>>> Coordinates: 42.73958, -71.45895
>>>>>>>> Search Key: MAIN NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES
>>>>>>>>
>>>>>>>> Name: Main St
>>>>>>>> Score: 41.06405
>>>>>>>> ID: 12681979
>>>>>>>> Country Code: US
>>>>>>>> Coordinates: 42.76416, -71.46681
>>>>>>>> Search Key: MAIN NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES
>>>>>>>>
>>>>>>>> Name: Main St
>>>>>>>> Score: 41.06405
>>>>>>>> ID: 12681977
>>>>>>>> Country Code: US
>>>>>>>> Coordinates: 42.747, -71.45957
>>>>>>>> Search Key: MAIN NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Case #3
>>>>>>>>
>>>>>>>> But why does this not work in fuzzy mode when I misspell it slightly
>>>>>>>> (1 edit away)? As you saw, the data is there with the Main spelling:
>>>>>>>>
>>>>>>>> org.apache.lucene.queryparser.classic.QueryParser parser = new
>>>>>>>> org.apache.lucene.queryparser.classic.QueryParser(field,
>>>>>>>> phraseAnalyzer) ;
>>>>>>>>
>>>>>>>>           Query q1 = null;
>>>>>>>>           try {
>>>>>>>>               q1 = parser.parse("Mains~");  // 1 edit away
>>>>>>>>           } catch (ParseException e) {
>>>>>>>>               e.printStackTrace();
>>>>>>>>           }
>>>>>>>>           booleanQuery.add(q1, BooleanClause.Occur.MUST);
>>>>>>>> booleanQuery.add(Utils.createPhraseQuery(phraseAnalyzer, field,
>>>>>>>> "NASHUA"), BooleanClause.Occur.MUST);
>>>>>>>> booleanQuery.add(Utils.createPhraseQuery(phraseAnalyzer, field,
>>>>>>>> "NEW HAMPSHIRE"), BooleanClause.Occur.MUST);
>>>>>>>> booleanQuery.add(Utils.createPhraseQuery(phraseAnalyzer, field,
>>>>>>>> "UNITED STATES"), BooleanClause.Occur.MUST);
>>>>>>>>
>>>>>>>> query plan:
>>>>>>>>
>>>>>>>> [+contentDFLT:mains~2, +contentDFLT:"nashua",
>>>>>>>> +contentDFLT:"new-hampshire", +contentDFLT:"united states"]
>>>>>>>>
>>>>>>>> testQuerySearch1 Time to compute: 23 seconds (due to debugging
>>>>>>>> stops)
>>>>>>>>
>>>>>>>> Number of results: 0
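
For reference, the "1 edit away" claim in Case #3 can be checked without Lucene. The following is a minimal plain-Java sketch of the classic Levenshtein distance. The class name `EditDistance` is hypothetical and not part of Lucene. As far as I know, Lucene's FuzzyQuery also counts a transposition of two adjacent characters as a single edit by default, which this sketch does not model.

```java
// Plain-Java sketch (no Lucene dependency): classic Levenshtein edit distance.
// EditDistance is a hypothetical helper used only for illustration.
final class EditDistance {

    // Number of single-character insertions, deletions, or substitutions
    // needed to turn a into b. The comparison is case-sensitive.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into the first j chars of b takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning the first i chars of a into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev; // reuse the two rows instead of a full matrix
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        System.out.println(levenshtein("mains", "main")); // 1
        System.out.println(levenshtein("maink", "main")); // 1
        System.out.println(levenshtein("MAINS", "main")); // 5
    }
}
```

Both "mains" and "maink" are one edit from "main", well inside the ~2 limit shown in the query plan, so edit distance alone does not explain why only "mains" returns nothing. The uppercase "MAINS" is five edits away, which is why an unanalyzed FuzzyQuery term has to be lowercased by hand.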
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Case #4
>>>>>>>>
>>>>>>>> Then I changed q1 from MUST to SHOULD above, and I think the fuzzy
>>>>>>>> query is ignored here, since there is no MAIN in the first 468
>>>>>>>> results:
>>>>>>>>
>>>>>>>> There is no boost for the Mains term here.
>>>>>>>>
>>>>>>>> query plan:
>>>>>>>>
>>>>>>>> [contentDFLT:mains~2, +contentDFLT:"nashua",
>>>>>>>> +contentDFLT:"new-hampshire", +contentDFLT:"united states"]
>>>>>>>>
>>>>>>>> testQuerySearch1 Time to compute: 125 seconds (due to debugging
>>>>>>>> stops)
>>>>>>>> Number of results: 1794
>>>>>>>> Name: Nashua Dr
>>>>>>>> Score: 34.186226
>>>>>>>> ID: 4974936
>>>>>>>> Country Code: US
>>>>>>>> Coordinates: 42.7636, -71.46063
>>>>>>>> Search Key: NASHUA NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES
>>>>>>>>
>>>>>>>> Name: Nashua River Rail Trl
>>>>>>>> Score: 34.186226
>>>>>>>> ID: 4975508
>>>>>>>> Country Code: US
>>>>>>>> Coordinates: 42.7062, -71.53962
>>>>>>>> Search Key: NASHUA RIVER RAIL NASHUA HILLSBOROUGH NEW HAMPSHIRE
>>>>>>>> UNITED STATES
>>>>>>>>
>>>>>>>> Name: Nashua Rd
>>>>>>>> Score: 33.84896
>>>>>>>> ID: 4975388
>>>>>>>> Country Code: US
>>>>>>>> Coordinates: 42.78746, -71.92823
>>>>>>>> Search Key: NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES
>>>>>>>>
>>>>>>>> Name: NASHUA
>>>>>>>> Score: 33.84896
>>>>>>>> ID: 21014865
>>>>>>>> Country Code: US
>>>>>>>> Coordinates: 42.75873, -71.46438
>>>>>>>> Search Key: NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES
>>>>>>>>
>>>>>>>> Name: NASHUA
>>>>>>>> Score: 33.84896
>>>>>>>> ID: 21014865
>>>>>>>> Country Code: US
>>>>>>>> Coordinates: 42.75873, -71.46438
>>>>>>>> Search Key: NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES
>>>>>>>>
>>>>>>>> Name: NASHUA
>>>>>>>> Score: 33.84896
>>>>>>>> ID: 21014865
>>>>>>>> Country Code: US
>>>>>>>> Coordinates: 42.75873, -71.46438
>>>>>>>> Search Key: NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES
>>>>>>>>
>>>>>>>> Name: NASHUA
>>>>>>>> Score: 33.84896
>>>>>>>> ID: 21014865
>>>>>>>> Country Code: US
>>>>>>>> Coordinates: 42.75873, -71.46438
>>>>>>>> Search Key: NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES
>>>>>>>>
>>>>>>>> Name: NASHUA
>>>>>>>> Score: 33.84896
>>>>>>>> ID: 21014865
>>>>>>>> Country Code: US
>>>>>>>> Coordinates: 42.75873, -71.46438
>>>>>>>> Search Key: NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES
>>>>>>>>
>>>>>>>> Name: Nashua St
>>>>>>>> Score: 33.84896
>>>>>>>> ID: 4975671
>>>>>>>> Country Code: US
>>>>>>>> Coordinates: 42.88471, -70.81687
>>>>>>>> Search Key: NASHUA ROCKINGHAM NEW HAMPSHIRE UNITED STATES
>>>>>>>>
>>>>>>>> Name: Nashua Rd
>>>>>>>> Score: 33.84896
>>>>>>>> ID: 4975400
>>>>>>>> Country Code: US
>>>>>>>> Coordinates: 42.79014, -71.92364
>>>>>>>> Search Key: NASHUA HILLSBOROUGH NEW HAMPSHIRE UNITED STATES
>>>>>>>>
>>>>>>>>
>>>>>>>> Why is the fuzzy query ignored?
>>>>>>>> Even if i have separate fields for street, city,region, country,
>>>>>>>> this fuzzy query issue will come into place for words with
>>>>>>>> multiple parts like main dunstable etc., right?
>>>>>>>>
>>>>>>>> Best regards
>>>>>>>>
>>>>>>>> On 6/12/19 11:36 AM, baris.kazar@oracle.com wrote:
>>>>>>>>
>>>>>>>> Tomoko,-
>>>>>>>>
>>>>>>>>    Thank You for Your suggestions. i am trying to understand it
>>>>>>>> and i thought i did :)
>>>>>>>>
>>>>>>>> but it does not work with FuzzyQuery when i used with a *single*
>>>>>>>> large TextField like street=...value... city=...value...
>>>>>>>> region=...value... country=...value... (with or without quotes
>>>>>>>> for the values)
>>>>>>>>
>>>>>>>> What I knew about Lucene fuzzy queries does not hold with this
>>>>>>>> TextField form. That is why I suspected a bug.
>>>>>>>>
>>>>>>>> 1. Yes, i saw and have a solid proof on that now.
>>>>>>>>
>>>>>>>> 2. Yes, but FuzzyQuery takes the quotes as they are, since they are
>>>>>>>> escaped and it is not analyzed.
>>>>>>>>
>>>>>>>> Stuffing into one textfield vs having separate fields should only
>>>>>>>> affect probably the performance but not the outcome in my case.
>>>>>>>> But, i have been thinking about this and maybe it is the way to
>>>>>>>> go in this case.
>>>>>>>>
>>>>>>>> My CONTENT field has street names in mixed case and city, region,
>>>>>>>> and country names in UPPERCASE. Can this be a problem?
>>>>>>>> I thought the index stored them in lowercase since I am using
>>>>>>>> StandardAnalyzer.
>>>>>>>>
>>>>>>>> CONTENT field also has full textfield string with street=...
>>>>>>>> city=... region=... country=... (here all values are UPPERCASE).
>>>>>>>>
>>>>>>>> Why can't the index find the names via FuzzyQuery? I tried both
>>>>>>>> FuzzyQuery and the Query builder, as I showed before.
>>>>>>>>
>>>>>>>> The last piece of advice in your previous email deserves to be outside
>>>>>>>> the parentheses, since it might be very critical :) :) :)
>>>>>>>>
>>>>>>>> Best regards
>>>>>>>>
>>>>>>>>
>>>>>>>> On 6/12/19 12:17 AM, Tomoko Uchida wrote:
>>>>>>>>
>>>>>>>> I'd suggest understanding how the software works before
>>>>>>>> suspecting a bug :-)
>>>>>>>>
>>>>>>>> I guess you may miss two points:
>>>>>>>>
>>>>>>>> 1. The standard analyzer (standard tokenizer) breaks words at double
>>>>>>>> quotes (U+0022), so quotes are not indexed or searched at all if you
>>>>>>>> are using the standard analyzer. (That is why you get the same results
>>>>>>>> with or without quotes.)
>>>>>>>> See:
>>>>>>>> https://lucene.apache.org/core/8_1_0/core/org/apache/lucene/analysis/standard/StandardTokenizer.html
>>>>>>>> and
>>>>>>>> http://unicode.org/reports/tr29/
>>>>>>>>
>>>>>>>> 2. The double quote has a special meaning in the built-in query parser
>>>>>>>> (it is interpreted as a phrase query), so you need to escape it if you
>>>>>>>> want to search for the double quote itself.
>>>>>>>> See:
>>>>>>>> http://lucene.apache.org/core/8_1_0/queryparser/org/apache/lucene/queryparser/classic/package-summary.html#Terms
>>>>>>>>
>>>>>>>> (My advice would be to create a separate field for each key-value
>>>>>>>> pair instead of stuffing all pairs into one text field, if you need to
>>>>>>>> search them separately.)
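
Point 1 above can be illustrated without Lucene. The following is only a rough plain-Java approximation that lowercases the text and splits on every character that is not a letter or digit. It is not the UAX #29 word-break algorithm that StandardTokenizer implements, and `ToyTokenizer` is a hypothetical name. For this simple input it yields the same kind of terms that appear in the parsed queries in this thread (street, main, city, nashua).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Plain-Java sketch (no Lucene dependency). ToyTokenizer is a hypothetical,
// simplified stand-in for an analyzer; it is NOT Lucene's StandardTokenizer.
final class ToyTokenizer {

    // Lowercases the input and splits on any run of characters that are
    // neither letters nor digits, so '=' and '"' never reach the output.
    static List<String> tokens(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) { // a leading delimiter produces one empty piece
                out.add(t);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Both lines print [street, main, city, nashua]
        System.out.println(tokens("street=\"MAIN\" city=\"NASHUA\""));
        System.out.println(tokens("street=MAIN city=NASHUA"));
    }
}
```

Because the quotes and the equals sign are dropped, the quoted and unquoted inputs produce identical terms, which matches the observation that results were the same with or without quotes.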
>>>>>>>>
>>>>>>>> On Wed, Jun 12, 2019 at 2:39, <baris.kazar@oracle.com> wrote:
>>>>>>>>
>>>>>>>> I can say that quotes are not the issue with the index, as I still
>>>>>>>> get the same results with or without quotes.
>>>>>>>>
>>>>>>>> i am starting to feel that this might be a bug maybe??
>>>>>>>>
>>>>>>>> Best regards
>>>>>>>>
>>>>>>>>
>>>>>>>> On 6/10/19 2:46 PM, baris.kazar@oracle.com wrote:
>>>>>>>>
>>>>>>>> Somehow " is causing an issue as this should return street with
>>>>>>>> MAIN:
>>>>>>>>
>>>>>>>> [contentDFLT:street="MAINS"~2, +contentDFLT:"city nashua",
>>>>>>>> +contentDFLT:"region new-hampshire", +contentDFLT:"country united
>>>>>>>> states"] -> this was with fuzzyquery on MAINS
>>>>>>>>
>>>>>>>> Best regards
>>>>>>>>
>>>>>>>>
>>>>>>>> On 6/10/19 2:24 PM, baris.kazar@oracle.com wrote:
>>>>>>>>
>>>>>>>> [+contentDFLT:"city nashua", +contentDFLT:"region new-hampshire",
>>>>>>>> +contentDFLT:"country united states", contentDFLT:street
>>>>>>>> contentDFLT:mains]
>>>>>>>>
>>>>>>>> QueryParser chops it into two pieces from
>>>>>>>> parser.parse("street=\"MAINS\"");
>>>>>>>>
>>>>>>>> The index has a TextField named contentDFLT with the following data:
>>>>>>>> street="MAIN" city="NASHUA" municipality="HILLSBOROUGH" region="NEW
>>>>>>>> HAMPSHIRE" country="UNITED STATES"
>>>>>>>>
>>>>>>>>
>>>>>>>> When i set street=\"MAINS~\" with parser:
>>>>>>>> i get the following
>>>>>>>> [+contentDFLT:"city nashua", +contentDFLT:"region new-hampshire",
>>>>>>>> +contentDFLT:"country united states", contentDFLT:street
>>>>>>>> contentDFLT:mains]
>>>>>>>>
>>>>>>>> probably " quotations are messing this up as You were saying...
>>>>>>>> Best regards
>>>>>>>>
>>>>>>>>
>>>>>>>> On 6/10/19 12:48 PM, Tomoko Uchida wrote:
>>>>>>>>
>>>>>>>> Or, " (double quotation) in your query string may affect query
>>>>>>>> parsing.
>>>>>>>>
>>>>>>>> When I parse this string by classic query parser (lucene 8.1),
>>>>>>>> street="MAINS~"
>>>>>>>> parsed (raw) query is
>>>>>>>> text:street text:mains
>>>>>>>> (I set the default search field to "text", so text:xxxx appears
>>>>>>>> here.)
>>>>>>>>
>>>>>>>> Query parsing is a complex process, so it would be good to check
>>>>>>>> parsed raw query string especially when you have (reserved) special
>>>>>>>> characters in your query...
>>>>>>>>
>>>>>>>> On Tue, Jun 11, 2019 at 1:10, Tomoko Uchida <tomoko.uchida.1111@gmail.com> wrote:
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I noticed one small thing in your previous mail.
>>>>>>>>
>>>>>>>> when i use q1 = parser.parse("street=\"MAIN\""); i get same results
>>>>>>>>
>>>>>>>> which is good.
>>>>>>>>
>>>>>>>> To specify a search field, ":" (colon) should be used instead of "=".
>>>>>>>> See the query parser documentation:
>>>>>>>> http://lucene.apache.org/core/8_1_0/queryparser/org/apache/lucene/queryparser/classic/package-summary.html#Fields
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> I'm not sure this is related to your problem.
>>>>>>>>
>>>>>>>> On Tue, Jun 11, 2019 at 0:51, <baris.kazar@oracle.com> wrote:
>>>>>>>>
>>>>>>>> booleanQuery.add(Utils.createPhraseQuery(phraseAnalyzer, field,
>>>>>>>> "city=\"NASHUA\""), BooleanClause.Occur.MUST);
>>>>>>>> booleanQuery.add(Utils.createPhraseQuery(phraseAnalyzer, field,
>>>>>>>> "region=\"NEW HAMPSHIRE\""), BooleanClause.Occur.MUST);
>>>>>>>> booleanQuery.add(Utils.createPhraseQuery(phraseAnalyzer, field,
>>>>>>>> "country=\"UNITED STATES\""), BooleanClause.Occur.MUST);
>>>>>>>>
>>>>>>>> org.apache.lucene.queryparser.classic.QueryParser parser = new
>>>>>>>> org.apache.lucene.queryparser.classic.QueryParser(field,
>>>>>>>> phraseAnalyzer) ;
>>>>>>>>              Query q1 = null;
>>>>>>>>              try {
>>>>>>>>                  q1 = parser.parse("MAIN");
>>>>>>>>              } catch (ParseException e) {
>>>>>>>>
>>>>>>>>                  e.printStackTrace();
>>>>>>>>              }
>>>>>>>>              booleanQuery.add(q1, BooleanClause.Occur.SHOULD);
>>>>>>>>
>>>>>>>> testQuerySearch2 Time to compute: 0 seconds
>>>>>>>> Number of results: 1775
>>>>>>>> Name: Main St
>>>>>>>> Score: 37.20959
>>>>>>>> ID: 12681979
>>>>>>>> Country Code: US
>>>>>>>> Coordinates: 42.76416, -71.46681
>>>>>>>> Search Key: street="MAIN" city="NASHUA" municipality="HILLSBOROUGH"
>>>>>>>> region="NEW HAMPSHIRE" country="UNITED STATES"
>>>>>>>>
>>>>>>>> Name: Main St
>>>>>>>> Score: 37.20959
>>>>>>>> ID: 12681977
>>>>>>>> Country Code: US
>>>>>>>> Coordinates: 42.747, -71.45957
>>>>>>>> Search Key: street="MAIN" city="NASHUA" municipality="HILLSBOROUGH"
>>>>>>>> region="NEW HAMPSHIRE" country="UNITED STATES"
>>>>>>>>
>>>>>>>> Name: Main St
>>>>>>>> Score: 37.20959
>>>>>>>> ID: 12681978
>>>>>>>> Country Code: US
>>>>>>>> Coordinates: 42.73492, -71.44951
>>>>>>>> Search Key: street="MAIN" city="NASHUA" municipality="HILLSBOROUGH"
>>>>>>>> region="NEW HAMPSHIRE" country="UNITED STATES"
>>>>>>>>
>>>>>>>>       when i use q1 = parser.parse("street=\"MAIN\""); i get same
>>>>>>>> results
>>>>>>>> which is good.
>>>>>>>>
>>>>>>>> But when i switch to MAINS~ then fuzzy query does not work.
>>>>>>>>
>>>>>>>>
>>>>>>>> I should mention something about using only q1 in the BooleanQuery:
>>>>>>>> it tries to match MAIN against street, city, region, and country,
>>>>>>>> which are all in a single TextField.
>>>>>>>> But I don't want this; that is why I need street="..." etc. when
>>>>>>>> searching.
>>>>>>>>
>>>>>>>> Best regards
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 6/10/19 11:31 AM, Tomoko Uchida wrote:
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> just for the basic verification, can you find the document without
>>>>>>>> fuzzy query? I mean, does this query work for you?
>>>>>>>>
>>>>>>>> Query query = parser.parse("MAIN");
>>>>>>>>
>>>>>>>> Tomoko
>>>>>>>>
>>>>>>>> On Tue, Jun 11, 2019 at 0:22, <baris.kazar@oracle.com> wrote:
>>>>>>>>
>>>>>>>> Why can't the second set work at all?
>>>>>>>>
>>>>>>>> it is indexed as Textfield like street="..." city="..." etc.
>>>>>>>>
>>>>>>>> Best regards
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 6/10/19 11:23 AM, baris.kazar@oracle.com wrote:
>>>>>>>>
>>>>>>>> I don't know how to use FuzzyQuery with QueryParser, but you are
>>>>>>>> probably suggesting
>>>>>>>>
>>>>>>>> QueryParser parser = new QueryParser(field, analyzer) ;
>>>>>>>> Query query = parser.parse("MAINS~2");
>>>>>>>>
>>>>>>>> booleanQuery.add(query, BooleanClause.Occur.SHOULD);
>>>>>>>>
>>>>>>>> am i right?
>>>>>>>> Best regards
>>>>>>>>
>>>>>>>>
>>>>>>>> On 6/10/19 10:47 AM, Atri Sharma wrote:
>>>>>>>>
>>>>>>>> I would suggest using a QueryParser for your fuzzy query before
>>>>>>>> adding it to the Boolean query. This should weed out any case
>>>>>>>> issues.
>>>>>>>>
>>>>>>>> On Mon, 10 Jun 2019 at 8:06 PM, <baris.kazar@oracle.com
>>>>>>>> <mailto:baris.kazar@oracle.com>> wrote:
>>>>>>>>
>>>>>>>>          BooleanQuery.Builder booleanQuery = new BooleanQuery.Builder();
>>>>>>>>
>>>>>>>>          // First set
>>>>>>>>          booleanQuery.add(new FuzzyQuery(new org.apache.lucene.index.Term(field, "MAINS")),
>>>>>>>>                  BooleanClause.Occur.SHOULD);
>>>>>>>>          booleanQuery.add(Utils.createPhraseQuery(phraseAnalyzer, field, "NASHUA"),
>>>>>>>>                  BooleanClause.Occur.MUST);
>>>>>>>>          booleanQuery.add(Utils.createPhraseQuery(phraseAnalyzer, field, "NEW HAMPSHIRE"),
>>>>>>>>                  BooleanClause.Occur.MUST);
>>>>>>>>          booleanQuery.add(Utils.createPhraseQuery(phraseAnalyzer, field, "UNITED STATES"),
>>>>>>>>                  BooleanClause.Occur.MUST);
>>>>>>>>
>>>>>>>>          // Second set
>>>>>>>>          //booleanQuery.add(new FuzzyQuery(new org.apache.lucene.index.Term(field, "street=\"MAINS\"")),
>>>>>>>>          //        BooleanClause.Occur.SHOULD);
>>>>>>>>          //booleanQuery.add(Utils.createPhraseQueryFullText(phraseAnalyzer, field, "city=\"NASHUA\""),
>>>>>>>>          //        BooleanClause.Occur.MUST);
>>>>>>>>          //booleanQuery.add(Utils.createPhraseQueryFullText(phraseAnalyzer, field, "region=\"NEW HAMPSHIRE\""),
>>>>>>>>          //        BooleanClause.Occur.MUST);
>>>>>>>>          //booleanQuery.add(Utils.createPhraseQueryFullText(phraseAnalyzer, field, "country=\"UNITED STATES\""),
>>>>>>>>          //        BooleanClause.Occur.MUST);
>>>>>>>>
>>>>>>>>          The first set also returns streets named Nashua (NASHUA).
>>>>>>>>
>>>>>>>>          So, to prevent that, and since I also indexed with street="..."
>>>>>>>>          city="...", I tried the second set, but it does not return anything.
>>>>>>>>
>>>>>>>>          createPhraseQuery builds a PhraseQuery with one term equal to the
>>>>>>>>          string in the call.
>>>>>>>>
>>>>>>>>          Best regards
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>          On 6/10/19 10:47 AM, baris.kazar@oracle.com
>>>>>>>>          <mailto:baris.kazar@oracle.com> wrote:
>>>>>>>>          > How do I check how it is indexed? Lowercase or uppercase?
>>>>>>>>          >
>>>>>>>>          > The only way now is by testing.
>>>>>>>>          >
>>>>>>>>          > i am using standardanalyzer.
>>>>>>>>          >
>>>>>>>>          > Best regards
>>>>>>>>          >
>>>>>>>>          >
>>>>>>>>          > On 6/9/19 11:57 AM, Atri Sharma wrote:
>>>>>>>>          >> On Sun, Jun 9, 2019 at 8:53 PM Tomoko Uchida
>>>>>>>>          >> <tomoko.uchida.1111@gmail.com
>>>>>>>> <mailto:tomoko.uchida.1111@gmail.com>> wrote:
>>>>>>>>          >>> Hi,
>>>>>>>>          >>>
>>>>>>>>          >>> What analyzer do you use for the text field? Is the term "Main"
>>>>>>>>          >>> correctly indexed?
>>>>>>>>          >> Agreed. Also, it would be good if you could post your actual code.
>>>>>>>>          >>
>>>>>>>>          >> What analyzer are you using? If you are using StandardAnalyzer, then
>>>>>>>>          >> all of your terms will be lowercased while indexing, AFAIK, but your
>>>>>>>>          >> query will not be analyzed until you run a QueryParser on it.
>>>>>>>>          >>
>>>>>>>>          >>
>>>>>>>>          >> Atri
>>>>>>>>          >>
>>>>>>>>          >
>>>>>>>>          >
>>>>>>>>          >
>>>>>>>> ---------------------------------------------------------------------
>>>>>>>>
>>>>>>>>
>>>>>>>>          > To unsubscribe, e-mail:
>>>>>>>> java-user-unsubscribe@lucene.apache.org
>>>>>>>> <mailto:java-user-unsubscribe@lucene.apache.org>
>>>>>>>>          > For additional commands, e-mail:
>>>>>>>>          java-user-help@lucene.apache.org
>>>>>>>> <mailto:java-user-help@lucene.apache.org>
>>>>>>>>          >
>>>>>>>>



