Subject: Re: FuzzyQuery
From: baris.kazar@oracle.com
Organization: Oracle Corporation
To: java-user@lucene.apache.org, tomoko.uchida.1111@gmail.com, Atri Sharma, Michael McCandless, "baris.kazar"
Date: Mon, 10 Jun 2019 14:24:51 -0400

[+contentDFLT:"city nashua", +contentDFLT:"region new-hampshire", +contentDFLT:"country united states", contentDFLT:street contentDFLT:mains]

QueryParser chops it into two pieces from parser.parse("street=\"MAINS\"");

The index has a TextField named contentDFLT with the following data:

street="MAIN" city="NASHUA" municipality="HILLSBOROUGH" region="NEW HAMPSHIRE" country="UNITED STATES"

When I set street=\"MAINS~\" with the parser, I get the following:

[+contentDFLT:"city nashua", +contentDFLT:"region new-hampshire", +contentDFLT:"country united states", contentDFLT:street contentDFLT:mains]

Probably the " quotation marks are messing this
up, as you were saying...

Best regards

On 6/10/19 12:48 PM, Tomoko Uchida wrote:
> Or, the " (double quotation mark) in your query string may affect query parsing.
>
> When I parse this string with the classic query parser (Lucene 8.1),
> street="MAINS~"
> the parsed (raw) query is
> text:street text:mains
> (I set the default search field to "text", so text:xxxx appears here.)
>
> Query parsing is a complex process, so it would be good to check the
> parsed raw query string, especially when you have (reserved) special
> characters in your query...
>
> On Tue, Jun 11, 2019 at 1:10, Tomoko Uchida wrote:
>> Hi,
>>
>> I noticed one small thing in your previous mail.
>>
>>> when i use q1 = parser.parse("street=\"MAIN\""); i get same results
>>> which is good.
>>
>> To specify a search field, ":" (colon) should be used instead of "=".
>> See the query parser documentation:
>> http://lucene.apache.org/core/8_1_0/queryparser/org/apache/lucene/queryparser/classic/package-summary.html#Fields
>>
>> I'm not sure this is related to your problem.
>>
>> On Tue, Jun 11, 2019 at 0:51:
>>> booleanQuery.add(Utils.createPhraseQuery(phraseAnalyzer, field,
>>>     "city=\"NASHUA\""), BooleanClause.Occur.MUST);
>>> booleanQuery.add(Utils.createPhraseQuery(phraseAnalyzer, field,
>>>     "region=\"NEW HAMPSHIRE\""), BooleanClause.Occur.MUST);
>>> booleanQuery.add(Utils.createPhraseQuery(phraseAnalyzer, field,
>>>     "country=\"UNITED STATES\""), BooleanClause.Occur.MUST);
>>>
>>> org.apache.lucene.queryparser.classic.QueryParser parser = new
>>>     org.apache.lucene.queryparser.classic.QueryParser(field, phraseAnalyzer);
>>> Query q1 = null;
>>> try {
>>>     q1 = parser.parse("MAIN");
>>> } catch (ParseException e) {
>>>     e.printStackTrace();
>>> }
>>> booleanQuery.add(q1, BooleanClause.Occur.SHOULD);
>>>
>>> testQuerySearch2 Time to compute: 0 seconds
>>> Number of results: 1775
>>> Name: Main St
>>> Score: 37.20959
>>> ID: 12681979
>>> Country Code: US
>>> Coordinates: 42.76416, -71.46681
>>> Search Key: street="MAIN" city="NASHUA" municipality="HILLSBOROUGH"
>>> region="NEW HAMPSHIRE" country="UNITED STATES"
>>>
>>> Name: Main St
>>> Score: 37.20959
>>> ID: 12681977
>>> Country Code: US
>>> Coordinates: 42.747, -71.45957
>>> Search Key: street="MAIN" city="NASHUA" municipality="HILLSBOROUGH"
>>> region="NEW HAMPSHIRE" country="UNITED STATES"
>>>
>>> Name: Main St
>>> Score: 37.20959
>>> ID: 12681978
>>> Country Code: US
>>> Coordinates: 42.73492, -71.44951
>>> Search Key: street="MAIN" city="NASHUA" municipality="HILLSBOROUGH"
>>> region="NEW HAMPSHIRE" country="UNITED STATES"
>>>
>>> When I use q1 = parser.parse("street=\"MAIN\""); I get the same results,
>>> which is good.
>>>
>>> But when I switch to MAINS~, the fuzzy query does not work.
>>>
>>> I need to say something about using only q1 in the BooleanQuery:
>>> it tries to match MAIN against street, city, region and country, which
>>> are all in a single TextField.
>>> But I don't want this. That is why I need to use street="..." etc. when
>>> searching.
>>>
>>> Best regards
>>>
>>> On 6/10/19 11:31 AM, Tomoko Uchida wrote:
>>>> Hi,
>>>>
>>>> Just for basic verification, can you find the document without the
>>>> fuzzy query? I mean, does this query work for you?
>>>>
>>>> Query query = parser.parse("MAIN");
>>>>
>>>> Tomoko
>>>>
>>>> On Tue, Jun 11, 2019 at 0:22:
>>>>> Why does the second set not work at all?
>>>>>
>>>>> It is indexed as a TextField like street="..." city="..." etc.
>>>>>
>>>>> Best regards
>>>>>
>>>>> On 6/10/19 11:23 AM, baris.kazar@oracle.com wrote:
>>>>>> I don't know how to use FuzzyQuery with QueryParser, but you are
>>>>>> probably suggesting
>>>>>>
>>>>>> QueryParser parser = new QueryParser(field, analyzer);
>>>>>> Query query = parser.parse("MAINS~2");
>>>>>>
>>>>>> booleanQuery.add(query, BooleanClause.Occur.SHOULD);
>>>>>>
>>>>>> Am I right?
>>>>>> Best regards
>>>>>>
>>>>>> On 6/10/19 10:47 AM, Atri Sharma wrote:
>>>>>>> I would suggest using a QueryParser for your fuzzy query before
>>>>>>> adding it to the Boolean query. This should weed out any case issues.
>>>>>>>
>>>>>>> On Mon, 10 Jun 2019 at 8:06 PM, wrote:
>>>>>>>
>>>>>>> BooleanQuery.Builder booleanQuery = new BooleanQuery.Builder();
>>>>>>>
>>>>>>> // First set
>>>>>>> booleanQuery.add(new FuzzyQuery(
>>>>>>>     new org.apache.lucene.index.Term(field, "MAINS")),
>>>>>>>     BooleanClause.Occur.SHOULD);
>>>>>>> booleanQuery.add(Utils.createPhraseQuery(phraseAnalyzer, field,
>>>>>>>     "NASHUA"), BooleanClause.Occur.MUST);
>>>>>>> booleanQuery.add(Utils.createPhraseQuery(phraseAnalyzer, field,
>>>>>>>     "NEW HAMPSHIRE"), BooleanClause.Occur.MUST);
>>>>>>> booleanQuery.add(Utils.createPhraseQuery(phraseAnalyzer, field,
>>>>>>>     "UNITED STATES"), BooleanClause.Occur.MUST);
>>>>>>>
>>>>>>> // Second set
>>>>>>> //booleanQuery.add(new FuzzyQuery(
>>>>>>> //    new org.apache.lucene.index.Term(field, "street=\"MAINS\"")),
>>>>>>> //    BooleanClause.Occur.SHOULD);
>>>>>>> //booleanQuery.add(Utils.createPhraseQueryFullText(phraseAnalyzer,
>>>>>>> //    field, "city=\"NASHUA\""), BooleanClause.Occur.MUST);
>>>>>>> //booleanQuery.add(Utils.createPhraseQueryFullText(phraseAnalyzer,
>>>>>>> //    field, "region=\"NEW HAMPSHIRE\""), BooleanClause.Occur.MUST);
>>>>>>> //booleanQuery.add(Utils.createPhraseQueryFullText(phraseAnalyzer,
>>>>>>> //    field, "country=\"UNITED STATES\""), BooleanClause.Occur.MUST);
>>>>>>>
>>>>>>> The first set also brings back streets named Nashua (NASHUA).
>>>>>>>
>>>>>>> So, to prevent that, and since I also indexed with street="..."
>>>>>>> city="...", I tried the second set, but it does not return anything.
>>>>>>>
>>>>>>> createPhraseQuery builds a PhraseQuery with one term equal to the
>>>>>>> string passed in the call.
>>>>>>>
>>>>>>> Best regards
>>>>>>>
>>>>>>> On 6/10/19 10:47 AM, baris.kazar@oracle.com wrote:
>>>>>>> > How do I check how it is indexed? Lowercase or uppercase?
>>>>>>> >
>>>>>>> > The only way right now is by testing.
>>>>>>> >
>>>>>>> > I am using StandardAnalyzer.
>>>>>>> >
>>>>>>> > Best regards
>>>>>>> >
>>>>>>> > On 6/9/19 11:57 AM, Atri Sharma wrote:
>>>>>>> >> On Sun, Jun 9, 2019 at 8:53 PM Tomoko Uchida wrote:
>>>>>>> >>> Hi,
>>>>>>> >>>
>>>>>>> >>> What analyzer do you use for the text field? Is the term "Main"
>>>>>>> >>> correctly indexed?
>>>>>>> >> Agreed. Also, it would be good if you could post your actual code.
>>>>>>> >>
>>>>>>> >> What analyzer are you using? If you are using StandardAnalyzer, then
>>>>>>> >> all of your terms while indexing will be lowercased, AFAIK, but your
>>>>>>> >> query will not be analyzed until you run a QueryParser on it.
>>>>>>> >>
>>>>>>> >> Atri

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
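
To make the case issue raised in this thread concrete, here is a minimal sketch in plain Java. It is not Lucene code: FuzzyCaseDemo, editDistance and withinMaxEdits are hypothetical names, and the plain Levenshtein distance is only a stand-in for the automaton Lucene actually uses. The limit of 2 edits is assumed to mirror FuzzyQuery's default. The indexed term "main" assumes a lowercasing analyzer that also splits street="MAIN" into the tokens street and main, which is what the parsed queries quoted above suggest.

```java
// Simplified illustration only; this is NOT how Lucene implements FuzzyQuery.
final class FuzzyCaseDemo {

    // Assumption: 2 mirrors FuzzyQuery's default maximum edit distance.
    static final int MAX_EDITS = 2;

    // Plain, case-sensitive Levenshtein distance (insert, delete, substitute).
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning the empty string into b[0..j) takes j inserts
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning a[0..i) into the empty string takes i deletes
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev; // reuse the two rows instead of a full matrix
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // True when the query term is close enough to the indexed term to match.
    static boolean withinMaxEdits(String queryTerm, String indexedTerm) {
        return editDistance(queryTerm, indexedTerm) <= MAX_EDITS;
    }

    public static void main(String[] args) {
        String indexed = "main"; // assumed form of MAIN after a lowercasing analyzer
        String raw = "MAINS";    // term handed directly to a fuzzy query, unanalyzed
        String lowered = raw.toLowerCase(java.util.Locale.ROOT);
        String wholeClause = "street=\"MAINS\""; // the "second set" term, used as one token

        System.out.println(raw + " -> " + editDistance(raw, indexed));
        System.out.println(lowered + " -> " + editDistance(lowered, indexed));
        System.out.println(wholeClause + " -> " + editDistance(wholeClause, indexed));
    }
}
```

Under these assumptions, the uppercase term MAINS is 5 edits away from main and the lowercased mains is only 1 edit away, so only the lowercased form falls within the limit. The whole string street="MAINS" used as a single term is far outside it, which is consistent with the second set returning nothing.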