From: Patrick Diviacco
To: java-user@lucene.apache.org
Subject: Re: Lucene nightly build: similarity score per field
Date: Fri, 4 Mar 2011 19:18:03 +0100

All right. It is still not clear to me exactly how to implement this.

I have two Similarity subclasses, SimilarityA and SimilarityB. So far, I know I can set a custom similarity on the searcher:

searcher.setSimilarity(new BoostingSimilarity());

When and how should I use the get method?

Similarity get(String field)

Thanks

On 3 March 2011 16:34, Robert Muir wrote:

> On Thu, Mar 3, 2011 at 10:25 AM, Patrick Diviacco wrote:
> > I've downloaded Lucene nightly build because I need to customize the
> > similarity *per field*.
> >
> > However I don't see the field parameter passed to the methods to compute
> > the score such as "tf" and "idf"...
> >
> > how can I implement different similarities score per document field then?
>
> Hi, the way you set this up is to use SimilarityProvider to configure
> Similarities per-field: for example maybe field A, B, and C use
> Similarity1 and field D use Similarity2.
>
> So you just set your SimilarityProvider on the IndexWriter and
> IndexSearcher, and it must implement this factory method:
>
> Similarity get(String field)
>
> Here are the reasons for this factory design (versus simply adding
> field to every method):
> 1. performance: up-front we ask the SimilarityProvider for the
> per-field Similarity. So you probably use a hashmap or something here
> to return the correct one. If you had to do this on every single call
> to tf(), this would slow down queries significantly.
> 2. flexibility: we are working to generalize Similarity, and maybe the
> existing stuff you see becomes TFIDFSimilarity. So in the future you
> might have field1 that uses TFIDF and field2 that uses something else
> (e.g. BM25), with a totally different API and scoring system.
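
The factory design described in the quoted reply can be sketched roughly as follows. This is only an illustration under stated assumptions, not the nightly build's actual code: the Similarity and SimilarityProvider interfaces below are simplified stand-ins declared locally (the real Lucene types have more methods), the field names "title" and "body" are hypothetical, and the tf() formulas are arbitrary examples. The point it tries to show is that application code implements get(String field) and does not call it on every tf(); whatever consumes the provider asks for the per-field Similarity up front.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-ins for the types named in this thread. These are NOT the
// real Lucene classes; they only mirror the quoted factory method
// "Similarity get(String field)" so that the sketch compiles on its own.
interface Similarity {
    float tf(float freq);
}

interface SimilarityProvider {
    Similarity get(String field);
}

// Two illustrative similarities, named after the subclasses in the question.
class SimilarityA implements Similarity {
    @Override
    public float tf(float freq) {
        return (float) Math.sqrt(freq); // dampened term frequency
    }
}

class SimilarityB implements Similarity {
    @Override
    public float tf(float freq) {
        return freq > 0 ? 1.0f : 0.0f; // repeated occurrences do not add score
    }
}

// HashMap-backed provider: one lookup per field, with a fallback for
// fields that have no explicit entry.
class PerFieldSimilarityProvider implements SimilarityProvider {
    private final Map<String, Similarity> perField = new HashMap<>();
    private final Similarity fallback;

    PerFieldSimilarityProvider(Similarity fallback) {
        this.fallback = fallback;
    }

    PerFieldSimilarityProvider with(String field, Similarity similarity) {
        perField.put(field, similarity);
        return this;
    }

    @Override
    public Similarity get(String field) {
        return perField.getOrDefault(field, fallback);
    }
}

class PerFieldSimilarityDemo {
    public static void main(String[] args) {
        // Hypothetical field names: "body" uses SimilarityB, everything else SimilarityA.
        SimilarityProvider provider =
            new PerFieldSimilarityProvider(new SimilarityA())
                .with("body", new SimilarityB());

        // The consumer resolves the Similarity once per field, then reuses it.
        Similarity forTitle = provider.get("title");
        Similarity forBody = provider.get("body");
        System.out.println("title tf(4) = " + forTitle.tf(4f));
        System.out.println("body  tf(4) = " + forBody.tf(4f));
    }
}
```

With the real API, the equivalent provider would be registered on both the IndexWriter and the IndexSearcher, as the reply says; the exact setter names in the nightly build are not shown in this thread, so they are left out here.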