From: Leos Literak
To: lucene-user@jakarta.apache.org
Subject: Re: boost keywords
Date: Thu, 12 Aug 2004 16:43:21 +0200

Don Vaillancourt wrote:
> It seems like you know very little about Lucene. Is this the case or do
> you have a more specific problem that should be looked at.

Well, I don't consider myself a Lucene newbie. ;-) I am just confused by the boosting feature and how to use it.

Usually when I index an article, I set several Fields to be indexed or stored (such as the URL, the type of object, etc.). Then I extract all the text from the HTML and store it in an indexed field called "content". Finally I add the Document to the IndexWriter.

During the search phase I construct a new Query and search the "content" field by default. The user may create a more advanced query and limit the search to specific object types only (articles, news, hardware ...).

That's a primitive use case, I know, but it works well. I'd like to make it more powerful (and more precise), though. For example, to boost the content of a particular HTML tag. Or, as in my previous post, to boost the extra information that the article author enters in the keywords section.

But how can I do that? There is no support in Document.Field for marking part of the text with a different boost factor, is there? As far as I know, I can only boost a whole Field. What is the trick for this?

(I wondered whether a solution might be to create a new indexed field containing the boosted words and include it in the search alongside "content". But the results were wild: matches in the boosted field got very high scores, while other matches got very low scores, and there was a big gap between the two classes, e.g. 95%, 94%, 15%, 12%. Was that the correct way?)

Can you please help me find the best approach? I don't want to reinvent the wheel; I'd like to reuse the experience of more experienced users :-)

Thanks

Leos
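
The two-field experiment described in the message can be illustrated with a toy model in plain Java. This is only a sketch and not Lucene's actual scoring formula: the class name `BoostGapSketch`, the hit counts, and the boost values are all made up for illustration. It assumes a score that adds the matches in "content" to the boosted matches in a separate keywords field, then scales the best document to 100%.

```java
import java.util.Arrays;

class BoostGapSketch {
    // Toy score: a match in "content" counts once, a match in the extra
    // keywords field counts keywordBoost times. NOT Lucene's real formula.
    static double rawScore(int contentHits, int keywordHits, double keywordBoost) {
        return contentHits + keywordBoost * keywordHits;
    }

    // Scale raw scores so the best document gets 100, like the
    // percentages shown in a result list.
    static long[] asPercent(double[] raw) {
        double max = Arrays.stream(raw).max().orElse(0.0);
        long[] pct = new long[raw.length];
        for (int i = 0; i < raw.length; i++) {
            pct[i] = (max == 0.0) ? 0 : Math.round(100.0 * raw[i] / max);
        }
        return pct;
    }

    // docs[i] = {hits in "content", hits in the keywords field}
    static long[] rank(int[][] docs, double keywordBoost) {
        double[] raw = new double[docs.length];
        for (int i = 0; i < docs.length; i++) {
            raw[i] = rawScore(docs[i][0], docs[i][1], keywordBoost);
        }
        return asPercent(raw);
    }

    public static void main(String[] args) {
        // Hypothetical documents: two match the keywords field, two do not.
        int[][] docs = { {3, 1}, {2, 1}, {3, 0}, {2, 0} };
        System.out.println(Arrays.toString(rank(docs, 20.0))); // [100, 96, 13, 9]
        System.out.println(Arrays.toString(rank(docs, 1.5)));  // [100, 78, 67, 44]
    }
}
```

In this toy model a large boost splits the results into two clusters, much like the 95%, 94%, 15%, 12% pattern reported above, while a modest boost keeps the ranking but narrows the gap. Whether real Lucene scores behave the same way depends on its similarity calculation, so treat this only as intuition for why the boost value matters.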