From: "Tao, Jing"
To: "solr-user@lucene.apache.org"
Subject: protected phrases - possible?
Date: Mon, 30 Mar 2015 17:48:16 +0000
Hi,

The way our collection is set up, searches for "breast cancer" are returning results for ovarian cancer, or for anything that contains either "breast" or "cancer". The reason is that we are searching across multiple fields. Even though I have set an "mm" value so that when there are fewer than 3 terms, ALL terms must match, Solr considers the query fully matched even though "breast" was in the title and "cancer" was in the description.

Is there a way to protect certain phrases so that they will not be tokenized? I tried using CommonGramsFilterFactory, but having "breast cancer" in the word list did not seem to do anything. I'm guessing that is because the field is tokenized first, so nothing would match that phrase. If I put "breast" and "cancer" as separate entries in the word list, I end up with too many unnecessary shingles, and "breast" and "cancer" are still two of the final terms.

I have a feeling CommonGramsFilterFactory is not the right way to handle this. What are the other options? Is it better to put all fields into one field, apply mm, and add a proximity boost?

Thanks!
Jing
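The cross-field match described in the question follows from how dismax-style parsers typically structure the query: each query term becomes a disjunction across the query fields (here title and description), and mm counts how many of those per-term clauses matched, not how many terms matched within one field. The Python below is a simplified model of that behavior, not Solr code; the document, field names, and helper functions are hypothetical.

```python
# Simplified model (not Solr code) of a term-centric multi-field query,
# in the style of dismax/edismax with qf=title description.
# Each query term becomes one clause: (title:term OR description:term).
# mm then counts satisfied clauses, regardless of which field satisfied them.


def term_centric_match(doc, terms, fields, mm):
    """Return True if at least `mm` query terms match in ANY of `fields`."""
    matched = 0
    for term in terms:
        if any(term in doc[f].lower().split() for f in fields):
            matched += 1
    return matched >= mm


def field_centric_match(doc, terms, fields, mm):
    """Return True if some single field contains at least `mm` query terms."""
    return any(
        sum(term in doc[f].lower().split() for term in terms) >= mm
        for f in fields
    )


doc = {
    "title": "Breast health basics",
    "description": "Symptoms of ovarian cancer",
}
terms = ["breast", "cancer"]
fields = ["title", "description"]

print(term_centric_match(doc, terms, fields, mm=2))   # True: "breast" in title, "cancer" in description
print(field_centric_match(doc, terms, fields, mm=2))  # False: no single field has both terms
```

In this model, raising mm to require every term does not help: with two terms, both per-term clauses are satisfied, each by a different field.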
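The CommonGramsFilterFactory results described in the question can be reproduced with a small model. The filter runs after tokenization and compares single tokens against the word list, so a multi-word entry such as "breast cancer" never equals any token. Single-word entries do produce the wanted bigram, but they also generate bigrams with neighboring words and keep the unigrams. This Python sketch approximates the filter's output (every unigram, plus an underscore-joined bigram for each adjacent pair in which at least one word is listed); it is not the Lucene implementation and ignores token positions and case handling.

```python
# Simplified model (not the Lucene source) of CommonGramsFilter output:
# every token is kept as a unigram, and for each adjacent pair in which at
# least one token is in the common-words list, a bigram "a_b" is added.


def common_grams(tokens, common_words):
    out = []
    for i, tok in enumerate(tokens):
        out.append(tok)
        if i + 1 < len(tokens):
            nxt = tokens[i + 1]
            if tok in common_words or nxt in common_words:
                out.append(f"{tok}_{nxt}")
    return out


tokens = "early breast cancer screening".split()

# A multi-word entry never equals a single token, so nothing is added.
print(common_grams(tokens, {"breast cancer"}))
# ['early', 'breast', 'cancer', 'screening']

# Single-word entries yield "breast_cancer", but also extra bigrams,
# and "breast" and "cancer" are still emitted as separate terms.
print(common_grams(tokens, {"breast", "cancer"}))
# ['early', 'early_breast', 'breast', 'breast_cancer', 'cancer', 'cancer_screening', 'screening']
```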
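One way to "protect" a phrase is to rewrite it into a single token before tokenization, applying the same rewrite at index time and at query time. In Solr, a char filter such as PatternReplaceCharFilterFactory could play that role; the Python below only models the idea. The phrase list, the underscore joiner, and the function names are assumptions for illustration. One caveat worth checking: many query parsers split the user query on whitespace before analysis, in which case the query-side rewrite only sees the phrase if it is quoted or rewritten before it reaches Solr.

```python
import re

# Hypothetical list of phrases to keep together as single tokens.
PROTECTED_PHRASES = ["breast cancer", "ovarian cancer"]

# One alternation, longest phrases first, matching whole words only and
# allowing any run of whitespace between the words of a phrase.
_PATTERN = re.compile(
    r"\b(?:"
    + "|".join(
        r"\s+".join(map(re.escape, p.split()))
        for p in sorted(PROTECTED_PHRASES, key=len, reverse=True)
    )
    + r")\b",
    re.IGNORECASE,
)


def protect_phrases(text):
    """Join the words of each protected phrase with '_' before tokenizing."""
    return _PATTERN.sub(lambda m: "_".join(m.group(0).lower().split()), text)


def tokenize(text):
    """Whitespace tokenizer applied after the phrase rewrite."""
    return protect_phrases(text).lower().split()


print(tokenize("New breast  cancer guidelines"))     # ['new', 'breast_cancer', 'guidelines']
print(tokenize("Breast health and ovarian cancer"))  # ['breast', 'health', 'and', 'ovarian_cancer']
```

The tradeoff is that a query for "breast" alone no longer matches the token "breast_cancer", so this kind of rewrite is often kept in a separate field alongside the normally analyzed one.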
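The last question, combining all fields into one field and adding a proximity boost, can be sketched as follows. In Solr this usually means copying the source fields into a multivalued catch-all field; the fieldType's positionIncrementGap places a gap between values so that words from different source fields are not adjacent, and edismax's pf/ps parameters can then boost documents where the terms occur close together. The Python below is a simplified model under those assumptions; the gap size, the sample documents, and the helper names are hypothetical, and the proximity check is a plain ordered window rather than Lucene's exact phrase slop.

```python
# Simplified model (not Solr code) of a catch-all field filled from several
# source fields, with a large position gap between consecutive values.

POSITION_GAP = 100  # assumed gap size, for illustration only


def build_positions(values, gap=POSITION_GAP):
    """Map each lowercased token to the positions it occupies in the catch-all field."""
    positions = {}
    pos = 0
    for value in values:
        for tok in value.lower().split():
            positions.setdefault(tok, []).append(pos)
            pos += 1
        pos += gap  # jump ahead before the next source field's tokens
    return positions


def all_terms_present(positions, terms):
    """Roughly what requiring all terms via mm checks: each term occurs somewhere."""
    return all(t in positions for t in terms)


def in_order_within(positions, first, second, window):
    """True if `second` occurs after `first` and at most `window` positions later."""
    return any(
        0 < p2 - p1 <= window
        for p1 in positions.get(first, [])
        for p2 in positions.get(second, [])
    )


doc1 = build_positions(["Breast health basics", "Symptoms of ovarian cancer"])
doc2 = build_positions(["Breast cancer screening", "What to expect"])

for name, doc in [("doc1", doc1), ("doc2", doc2)]:
    print(
        name,
        all_terms_present(doc, ["breast", "cancer"]),
        in_order_within(doc, "breast", "cancer", window=3),
    )
# doc1 True False
# doc2 True True
```

In this model the catch-all field alone does not stop the cross-field match: both documents contain every term, so a term-count check like mm accepts both. Only the proximity signal separates them, which is why it tends to work as a ranking boost rather than as a filter.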