From: "Andrzej Bialecki (JIRA)"
To: dev@lucene.apache.org
Reply-To: dev@lucene.apache.org
Date: Fri, 15 Jul 2011 20:59:00 +0000 (UTC)
Message-ID: <508672186.18741.1310763540022.JavaMail.tomcat@hel.zones.apache.org>
In-Reply-To: <2042764336.18670.1310761799953.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Commented] (LUCENE-3320) Explore Proximity Scoring

[ https://issues.apache.org/jira/browse/LUCENE-3320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13066199#comment-13066199
]

Andrzej Bialecki commented on LUCENE-3320:
-------------------------------------------

An interesting concept to consider under this topic is sentence-level proximity scoring. It rests on the assumption that the co-occurrence of terms within a single sentence is often enough to indicate a stronger-than-average association between them. When sentence boundaries are known, term positions can therefore be reduced to sentence numbers (i.e. all postings from the same sentence share a single position, which is the sentence number). This is a middle ground between no proximity data (omitPositions) and full proximity data.

Some of the available literature indicates that this approach is promising: http://www.springerlink.com/content/t5355418276v7115 . It is also mentioned in the papers on static index pruning.

> Explore Proximity Scoring
> --------------------------
>
>                 Key: LUCENE-3320
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3320
>             Project: Lucene - Java
>          Issue Type: Sub-task
>          Components: core/search
>    Affects Versions: Positions Branch
>            Reporter: Simon Willnauer
>             Fix For: Positions Branch
>
>
> Positions will be first-class citizens sooner rather than later. We should explore proximity scoring possibilities as well as collection / scoring algorithms like those proposed on LUCENE-2878 (2-phase collection).
> This paper might provide some basis for an actual scoring implementation: http://plg.uwaterloo.ca/~claclark/sigir2006_term_proximity.pdf
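
The sentence-level position idea from the comment can be illustrated with a minimal sketch. This is plain Python, not Lucene code: the naive sentence splitter, the function names `sentence_postings` and `shared_sentences`, and the simple co-occurrence count are all hypothetical choices made for illustration, not something proposed in this issue.

```python
import re
from collections import defaultdict


def sentence_postings(text):
    """Map each term to the list of sentence numbers it occurs in.

    Every occurrence inside one sentence collapses to a single
    "position": the number of that sentence.
    """
    # Naive splitter (an assumption): a sentence ends at '.', '!' or '?'
    # followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    postings = defaultdict(list)
    for sentence_no, sentence in enumerate(sentences):
        # set() keeps one entry per term per sentence.
        for term in set(re.findall(r"\w+", sentence.lower())):
            postings[term].append(sentence_no)
    return dict(postings)


def shared_sentences(postings, term_a, term_b):
    """Count the sentences that contain both terms (a crude proximity signal)."""
    return len(set(postings.get(term_a, [])) & set(postings.get(term_b, [])))


doc = ("Lucene stores term positions. "
       "Positions enable proximity scoring. "
       "Scoring is fun.")
postings = sentence_postings(doc)
print(postings["positions"])
print(shared_sentences(postings, "positions", "scoring"))
```

Here two terms count as "near" each other only when they share a sentence number, so the index needs one small integer per term per sentence instead of one position per occurrence. How such a count would feed into an actual score is left open, as it is in the comment.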