lucene-dev mailing list archives

From "S.L. (JIRA)" <j...@apache.org>
Subject [jira] [Created] (LUCENE-3440) FastVectorHighlighter: IDF-weighted terms for ordered fragments
Date Tue, 20 Sep 2011 09:33:09 GMT
FastVectorHighlighter: IDF-weighted terms for ordered fragments 
----------------------------------------------------------------

                 Key: LUCENE-3440
                 URL: https://issues.apache.org/jira/browse/LUCENE-3440
             Project: Lucene - Java
          Issue Type: Improvement
          Components: modules/highlighter
    Affects Versions: 3.5
            Reporter: S.L.
            Priority: Minor
             Fix For: 3.5


The FastVectorHighlighter gives every term found in a fragment the same weight. As a result,
fragments with many matching words or, in the worst case, many very common words rank higher
than fragments that contain *all* of the terms of the original query.
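To make the problem concrete, here is a minimal Java sketch. It is not FastVectorHighlighter code: it models a fragment as a plain token list and adds the same weight (the query boost) for every occurrence of a query term. The class name, method name and sample fragments are hypothetical. With the query `the vector highlighter`, a fragment that repeats the common word "the" four times outscores a fragment that contains all three query terms.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration of equal-weight fragment scoring; not Lucene code.
public class EqualWeightDemo {

    // Every occurrence of a query term adds the same weight (the boost),
    // no matter how common the term is.
    public static float equalWeightScore(List<String> fragmentTokens, Set<String> queryTerms, float boost) {
        float total = 0f;
        for (String token : fragmentTokens) {
            if (queryTerms.contains(token)) {
                total += boost;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        Set<String> query = new HashSet<>(Arrays.asList("the", "vector", "highlighter"));
        // Fragment A: repeats a very common query term.
        List<String> a = Arrays.asList("the", "cat", "saw", "the", "dog", "near", "the", "bird", "on", "the", "mat");
        // Fragment B: contains every query term once.
        List<String> b = Arrays.asList("the", "vector", "highlighter", "works");
        System.out.println("A = " + equalWeightScore(a, query, 1f)); // prints "A = 4.0"
        System.out.println("B = " + equalWeightScore(b, query, 1f)); // prints "B = 3.0"
    }
}
```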

This patch orders fragments using IDF-weighted terms. For each unique query term found in a
fragment:

total weight = total weight + IDF(term) * query boost;
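The weighting rule above can be sketched in Java as follows. This is an illustration under stated assumptions, not the patch itself: fragments are plain token lists, the IDF uses the Lucene 3.x `DefaultSimilarity` formula `1 + ln(numDocs / (docFreq + 1))`, and the document frequencies in `main` are made up. Each query term contributes its IDF times the query boost at most once per fragment, so repeating a common term no longer inflates the score.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of IDF-weighted fragment scoring; not the actual patch.
public class IdfWeightedFragmentScore {

    // Assumption: same shape as Lucene 3.x DefaultSimilarity.idf(docFreq, numDocs).
    public static float idf(int docFreq, int numDocs) {
        return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
    }

    // total weight = total weight + IDF for each unique term in the fragment * query boost.
    // docFreqs doubles as the set of query terms: only tokens present in it are scored.
    public static float score(List<String> fragmentTokens, Map<String, Integer> docFreqs,
                              int numDocs, float queryBoost) {
        Set<String> seen = new HashSet<>();
        float totalWeight = 0f;
        for (String token : fragmentTokens) {
            Integer df = docFreqs.get(token);
            // seen.add returns false for repeats, so each term is counted once per fragment.
            if (df != null && seen.add(token)) {
                totalWeight += idf(df, numDocs) * queryBoost;
            }
        }
        return totalWeight;
    }

    public static void main(String[] args) {
        int numDocs = 1000;
        // Hypothetical document frequencies for the query terms.
        Map<String, Integer> df = new HashMap<>();
        df.put("the", 990);
        df.put("vector", 20);
        df.put("highlighter", 5);

        List<String> a = Arrays.asList("the", "cat", "saw", "the", "dog", "near", "the", "bird", "on", "the", "mat");
        List<String> b = Arrays.asList("the", "vector", "highlighter", "works");
        System.out.printf("A = %.3f%n", score(a, df, numDocs, 1f));
        System.out.printf("B = %.3f%n", score(b, df, numDocs, 1f));
    }
}
```

With these made-up frequencies, fragment B (all three query terms) now clearly outranks fragment A. Counting each unique term once per fragment follows the same idea as `QueryTermScorer`, which adds a term's weight only the first time that term is seen in a fragment.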

The ranking formula should be the same as, or at least similar to, the one used in
org.apache.lucene.search.highlight.QueryTermScorer.

The patch is simple, but it works for us. 

Some ideas:
- A better approach would be to move all fragment scoring into a separate class.
- Make the scoring method switchable via a parameter.
- Exact phrases should receive an even higher score, regardless of whether a phrase query
was executed.
- The edismax/dismax parameters pf, ps and pf^boost should be respected, and the corresponding
fragments should be ranked higher.
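The first two ideas (a separate scoring class, selectable via a parameter) and the exact-phrase bonus could look roughly like the Java sketch below. All names here (`FragmentScorer`, `fromParameter`, the `"equal"`/`"idf"`/`"idf+phrase"` values, the bonus of 2) are hypothetical and do not exist in Lucene; the IDF values are made up, and the edismax/dismax pf/ps idea is not modeled.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical names throughout; not part of Lucene's API.
public class PluggableFragmentScoring {

    // Fragment scoring moved into its own abstraction.
    public interface FragmentScorer {
        float score(List<String> fragmentTokens, List<String> queryTerms);
    }

    // Behavior described in the issue: every matching occurrence adds the same weight.
    public static final class EqualWeightScorer implements FragmentScorer {
        @Override
        public float score(List<String> tokens, List<String> queryTerms) {
            float total = 0f;
            for (String t : tokens) {
                if (queryTerms.contains(t)) total += 1f;
            }
            return total;
        }
    }

    // IDF weighting: each unique query term in the fragment adds its IDF once.
    public static final class IdfWeightedScorer implements FragmentScorer {
        private final Map<String, Float> idf;

        public IdfWeightedScorer(Map<String, Float> idf) { this.idf = idf; }

        @Override
        public float score(List<String> tokens, List<String> queryTerms) {
            Set<String> seen = new HashSet<>();
            float total = 0f;
            for (String t : tokens) {
                if (queryTerms.contains(t) && seen.add(t)) {
                    total += idf.getOrDefault(t, 1f);
                }
            }
            return total;
        }
    }

    // Decorator: adds a bonus when the query terms appear as an exact, contiguous phrase,
    // whether or not a phrase query was executed.
    public static final class PhraseBonusScorer implements FragmentScorer {
        private final FragmentScorer delegate;
        private final float bonus;

        public PhraseBonusScorer(FragmentScorer delegate, float bonus) {
            this.delegate = delegate;
            this.bonus = bonus;
        }

        @Override
        public float score(List<String> tokens, List<String> queryTerms) {
            float base = delegate.score(tokens, queryTerms);
            boolean phrase = !queryTerms.isEmpty() && Collections.indexOfSubList(tokens, queryTerms) >= 0;
            return phrase ? base + bonus : base;
        }
    }

    // Switch scoring via a (hypothetical) parameter value.
    public static FragmentScorer fromParameter(String name, Map<String, Float> idf) {
        switch (name) {
            case "equal": return new EqualWeightScorer();
            case "idf": return new IdfWeightedScorer(idf);
            case "idf+phrase": return new PhraseBonusScorer(new IdfWeightedScorer(idf), 2f);
            default: throw new IllegalArgumentException("unknown scorer: " + name);
        }
    }

    public static void main(String[] args) {
        Map<String, Float> idf = Map.of("fast", 3f, "vector", 4f, "highlighter", 6f);
        List<String> query = Arrays.asList("fast", "vector", "highlighter");
        List<String> frag = Arrays.asList("the", "fast", "vector", "highlighter", "is", "fast");
        for (String p : Arrays.asList("equal", "idf", "idf+phrase")) {
            // prints "equal -> 4.0", "idf -> 13.0", "idf+phrase -> 15.0"
            System.out.println(p + " -> " + fromParameter(p, idf).score(frag, query));
        }
    }
}
```

Wrapping the IDF scorer in a decorator keeps the phrase bonus independent of the base weighting, so either piece can be switched on or off on its own.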


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
