From: Max Lynch
To: java-user@lucene.apache.org
Date: Fri, 13 Nov 2009 18:02:54 -0600
Subject: Re: Term Boost Threshold

> > Now, I would like to know exactly what term was found. For example, if a
> > result comes back from the query above, how do I know whether John Smith
> > was found, or both John Smith and his company, or just John Smith
> > Manufacturing was found?
>
> In general, this is actually very hard. Lucene does not even keep track
> itself of which terms in a given query matched a given document, but you
> really just need to know which terms matched in the final "top hits" you're
> showing to the user, right? What is this information used for / why do you
> want to know which term hit?

Well, I treat results with a name match as more important than ones with a
company match, and results with both as the most important.
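The ranking rule above (name + company first, then name only, then company
only) can be sketched as a tiered sort once each hit's match flags are known.
This is a hypothetical illustration in plain Java, not Lucene API: `MatchTiers`,
`Hit`, `classify`, and the boolean flags are made-up names, and the sketch
assumes the flags are obtained elsewhere, for example from the existing
highlighter pass or by running the name and company queries separately.
Because summed, boosted scores can overlap between tiers, sorting on explicit
match flags gives an ordering guarantee that boosting alone may not.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch, not Lucene API: ranks hits by which clauses matched,
// then by score within each tier. The match flags are assumed to be known
// already (e.g. from a highlighter pass or from separate per-clause queries).
public class MatchTiers {

    // Declaration order is priority order: enums compare by ordinal.
    enum Tier { BOTH, NAME_ONLY, COMPANY_ONLY, NEITHER }

    // Hypothetical hit record; docId and score would come from the search.
    static final class Hit {
        final int docId;
        final float score;
        final boolean nameMatched;
        final boolean companyMatched;

        Hit(int docId, float score, boolean nameMatched, boolean companyMatched) {
            this.docId = docId;
            this.score = score;
            this.nameMatched = nameMatched;
            this.companyMatched = companyMatched;
        }

        Tier tier() {
            return classify(nameMatched, companyMatched);
        }
    }

    // Maps the two match flags onto a priority tier.
    static Tier classify(boolean name, boolean company) {
        if (name && company) return Tier.BOTH;
        if (name) return Tier.NAME_ONLY;
        if (company) return Tier.COMPANY_ONLY;
        return Tier.NEITHER;
    }

    // Returns a new list ordered by tier, then by descending score.
    static List<Hit> rank(List<Hit> hits) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator.comparing(Hit::tier)
                .thenComparing(Comparator.comparingDouble((Hit h) -> h.score).reversed()));
        return sorted;
    }

    public static void main(String[] args) {
        List<Hit> hits = Arrays.asList(
                new Hit(1, 3.0f, false, true),  // strong company-only match
                new Hit(2, 0.5f, true, false),  // weak name-only match
                new Hit(3, 1.2f, true, true),   // name and company
                new Hit(4, 0.9f, true, false)); // name-only match
        for (Hit h : rank(hits)) {
            System.out.println(h.docId + " " + h.tier());
        }
    }
}
```

Running `main` prints the hits in the order 3, 4, 2, 1: doc 1 has the highest
raw score but sorts last because it only matched the company.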
I was hoping term boosting would let me detect these cases from the scores
alone (for example, a first name + company match would have a detectably
higher score) without having to use a highlighter for something it clearly
wasn't designed for. I'm also not using a traditional search display: this is
a background search, so every result I find is important and there is no
pagination.

Is it possible to do this with term boosting? Otherwise, my highlighter
solution works for the time being; it's just slow.

Thanks,
Max