lucene-dev mailing list archives

From "Andrzej Bialecki (JIRA)" <>
Subject [jira] [Commented] (LUCENE-3320) Explore Proximity Scoring
Date Fri, 15 Jul 2011 20:59:00 GMT


Andrzej Bialecki commented on LUCENE-3320:

An interesting concept to consider under this topic is sentence-level proximity scoring. It
rests on the assumption that terms occurring within a single sentence are often close enough
to be treated as having a stronger-than-average association. So when sentence boundaries are
known, term positions can be reduced to sentence numbers (i.e. all postings from the same
sentence share one position, which is the sentence number).
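
To make the idea concrete, here is a minimal sketch in plain Java. It is not Lucene code: the class SentencePositions and its methods are hypothetical names, and sentence boundaries are assumed to fall at '.', '!' or '?', which a real analyzer would detect more carefully. Each term is mapped to the sorted list of sentence numbers it occurs in, and two terms count as "near" each other when those lists share a value.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper for illustration only; not part of Lucene.
final class SentencePositions {

    // Maps each term to the sentence numbers it occurs in (ascending order).
    // Assumption: a sentence ends at '.', '!' or '?'.
    static Map<String, List<Integer>> index(String text) {
        Map<String, List<Integer>> postings = new LinkedHashMap<>();
        int sentenceNo = 0;
        for (String sentence : text.split("[.!?]+")) {
            boolean sawTerm = false;
            for (String token : sentence.toLowerCase(Locale.ROOT).split("\\W+")) {
                if (token.isEmpty()) {
                    continue;
                }
                sawTerm = true;
                List<Integer> positions =
                        postings.computeIfAbsent(token, k -> new ArrayList<>());
                // Every occurrence in one sentence collapses to a single position.
                if (positions.isEmpty()
                        || positions.get(positions.size() - 1) != sentenceNo) {
                    positions.add(sentenceNo);
                }
            }
            if (sawTerm) {
                sentenceNo++; // only count sentences that contained a term
            }
        }
        return postings;
    }

    // True if both terms occur in at least one common sentence.
    // Walks the two sorted position lists in parallel.
    static boolean sameSentence(Map<String, List<Integer>> postings,
                                String a, String b) {
        List<Integer> pa = postings.getOrDefault(a, Collections.emptyList());
        List<Integer> pb = postings.getOrDefault(b, Collections.emptyList());
        int i = 0;
        int j = 0;
        while (i < pa.size() && j < pb.size()) {
            int cmp = Integer.compare(pa.get(i), pb.get(j));
            if (cmp == 0) {
                return true;
            }
            if (cmp < 0) {
                i++;
            } else {
                j++;
            }
        }
        return false;
    }
}
```

With this representation a term carries at most one position per sentence, so the position lists are usually much shorter than full per-token positions, while a check such as SentencePositions.sameSentence(postings, "lucene", "positions") still answers the sentence-level proximity question.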

This is a middle ground between no proximity data (omitPositions) and full proximity data.
Some of the available literature indicates that this approach is promising: , and it is also
mentioned in the papers on static index pruning.

> Explore Proximity Scoring 
> --------------------------
>                 Key: LUCENE-3320
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Sub-task
>          Components: core/search
>    Affects Versions: Positions Branch
>            Reporter: Simon Willnauer
>             Fix For: Positions Branch
> Positions will be first class citizens sooner rather than later. We should explore proximity
> scoring possibilities as well as collection / scoring algorithms like those proposed in
> LUCENE-2878 (2 phase collection).
> This paper might provide some basis for actual scoring implementation:
