Date: Mon, 9 Mar 2009 09:05:15 -0500
Subject: Re: sloppyFreq question
From: Peter Keegan
To: java-user

The reason I asked about Span scoring is that the behavior changed when I
switched from TermQuery to BoostingTermQuery to take advantage of payloads.
It seems to me that SpanTermQuery and BoostingTermQuery should behave the
same as TermQuery with respect to term frequency. The 'edit distance' isn't
really relevant for these queries, is it?

For a SpanNearQuery that contains SpanTermQueries, the score for a match on
"the quick brown fox" would be lower than the score for a match on
"brown fox" because of the edit distance (4 vs 2). This seems
counterintuitive, too.

Any comments?

Thanks,
Peter

On Tue, Mar 3, 2009 at 2:42 PM, Peter Keegan wrote:

> The DefaultSimilarity class defines sloppyFreq as:
>
> public float sloppyFreq(int distance) {
>   return 1.0f / (distance + 1);
> }
>
> For a 'SpanNearQuery', this reduces the effect of the term frequency on
> the score as the number of terms in the span increases. So, for a simple
> phrase query (using spans), the longer the phrase, the lower the TF. For a
> simple SpanTermQuery, the TF is cut in half (1.0f / (1 + 1)).
>
> I'm just wondering why this is the default behavior. For 'SpanTermQuery',
> I'd expect the TF to reflect the actual number of occurrences of the term.
> For a SpanNearQuery, wouldn't it still be the number of occurrences of the
> whole span, not the number of terms in the span?
>
> Thanks,
> Peter
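
To make the numbers above concrete, here is a minimal standalone sketch of the arithmetic. It does not call Lucene. The sloppyFreq method copies the formula quoted from DefaultSimilarity; the class name SloppyFreqSketch and the helper spanFreq are hypothetical, and the sketch assumes (as the discussion above implies, but which I have not confirmed against the scorer source) that each span match contributes sloppyFreq(span length in positions) to the frequency.

```java
// Standalone illustration; nothing here is part of the Lucene API.
final class SloppyFreqSketch {

    // Same formula as the DefaultSimilarity.sloppyFreq quoted in this thread.
    static float sloppyFreq(int distance) {
        return 1.0f / (distance + 1);
    }

    // Hypothetical helper. Assumption: every match of a span that covers
    // spanLength positions adds sloppyFreq(spanLength) to the frequency.
    static float spanFreq(int spanLength, int matches) {
        float freq = 0.0f;
        for (int i = 0; i < matches; i++) {
            freq += sloppyFreq(spanLength);
        }
        return freq;
    }

    public static void main(String[] args) {
        // A single-term span covers one position.
        System.out.println("single term, 1 match:         " + spanFreq(1, 1));
        // "brown fox" covers two positions.
        System.out.println("\"brown fox\", 1 match:         " + spanFreq(2, 1));
        // "the quick brown fox" covers four positions.
        System.out.println("\"the quick brown fox\", 1 match: " + spanFreq(4, 1));
        // Three occurrences of a single term.
        System.out.println("single term, 3 matches:       " + spanFreq(1, 3));
    }
}
```

Under that assumption a single-term match counts as 1/2 rather than 1, and one match on the four-term phrase counts as 1/5 versus 1/3 for the two-term phrase, which is the behavior I am questioning.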