From: "Allan, Brad (Bracknell)"
To: "lucene-net-user@lucene.apache.org"
Subject: Minimize document hits based on number of matching terms between source text terms and document field terms
Date: Thu, 9 May 2013 16:08:41 +0000

I'd like to get any comments about how I might do this. I have listed some options below, which of course I'll investigate...

Example first:

Name Field
--------------
Mr. Youness Rokven
Mr. Joe Paul Harry Arnold
Mr. Paul B. Mitchell
Mrs. Fernanda Joe Mitchell
Ms. Jade Paula Victoria Muir
Mr. Joe Harvey Pope

If I search the above with text such as "Joe P.H. Arnold", which is turned into the query:

((Joe) or (P) or (H) or (Arnold))

I get these hits:

Mr. Joe Paul Harry Arnold
Mrs. Fernanda Joe Mitchell
Mr. Joe Harvey Pope

And the scores are great! The top hit has a higher relative score.

What I'd like to do is exclude hits where, say, fewer than 2 of the query terms matched the document field terms.

Options, I think:

1.) Override DefaultSimilarity?
2.) Construct awkward searches, for example:

((Joe) and (P)) or ((Joe) and (H)) or ((Joe) and (Arnold)) etc ... all the possible combinations

3.) Use TermVector information? I don't know much about this, but my thought is that if highlighting knows the matching terms, perhaps I could use that?
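The rule being asked for (keep a hit only if at least 2 distinct query terms match the field) and option 2 can be sketched outside Lucene in plain Python. This is only an illustration of the counting logic, not Lucene.NET code: `tokenize`, `min_should_match` and `pairwise_query` are hypothetical helpers, the tokenizer is a crude assumption about how the analyzer splits "P.H." into "p" and "h", and relevance scoring is ignored entirely.

```python
import re
from itertools import combinations

# Sample field values copied from the example in the message.
NAMES = [
    "Mr. Youness Rokven",
    "Mr. Joe Paul Harry Arnold",
    "Mr. Paul B. Mitchell",
    "Mrs. Fernanda Joe Mitchell",
    "Ms. Jade Paula Victoria Muir",
    "Mr. Joe Harvey Pope",
]


def tokenize(text):
    """Hypothetical stand-in for an analyzer: lowercase, split on non-alphanumerics."""
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]


def min_should_match(query_terms, docs, minimum=2):
    """Keep docs whose field shares at least `minimum` distinct terms with the query."""
    wanted = {t.lower() for t in query_terms}
    return [doc for doc in docs if len(wanted & set(tokenize(doc))) >= minimum]


def pairwise_query(terms):
    """Option 2: OR together every AND-ed pair of terms (C(n, 2) clauses for n terms)."""
    return " or ".join(f"(({a}) and ({b}))" for a, b in combinations(terms, 2))


if __name__ == "__main__":
    terms = tokenize("Joe P.H. Arnold")
    print(min_should_match(terms, NAMES, minimum=1))  # any single term is enough
    print(min_should_match(terms, NAMES, minimum=2))  # at least two distinct terms
    print(pairwise_query(["Joe", "P", "H", "Arnold"]))
```

With `minimum=1` the sketch keeps the same three hits listed above; with `minimum=2` only "Mr. Joe Paul Harry Arnold" survives, because it is the only name containing both "joe" and "arnold". The pairwise query selects the same documents as `minimum=2`, but the number of clauses grows as C(n, m) for n terms and a threshold of m, which is what makes option 2 awkward. For comparison, Java Lucene's `BooleanQuery` has `setMinimumNumberShouldMatch(int)`, which states this threshold directly on a query of optional clauses; how the Lucene.NET version in use exposes it would need checking.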
Would be grateful for comments. Thanks!

________________________________

CheckFree Solutions Limited (trading as Fiserv)
Registered Office: Eversheds House, 70 Great Bridgewater Street, Manchester, M15 ES
Registered in England: No. 2694333