lucene-dev mailing list archives

From "sebastian L. (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-3440) FastVectorHighlighter: IDF-weighted terms for ordered fragments
Date Fri, 25 May 2012 14:22:23 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-3440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13283467#comment-13283467 ]

sebastian L. commented on LUCENE-3440:
--------------------------------------

Hi Koji, 
hi Simon,

If there is anything I can do, please let me know.

Maybe it would be better to split the patch into several smaller ones, e.g.

1. Use getters/setters where possible in FVH
2. Make FieldFragList an interface and BaseFieldFragList an abstract class
3. Introduce SimpleFieldFragList and SimpleFragListBuilder as the default
4. Introduce WeightedFieldFragList and WeightedFragListBuilder
5. Integrate into Solr

When is the 4.0 release scheduled, anyway?

A patch for trunk 1342490 is on its way.
                
> FastVectorHighlighter: IDF-weighted terms for ordered fragments 
> ----------------------------------------------------------------
>
>                 Key: LUCENE-3440
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3440
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: modules/highlighter
>            Reporter: sebastian L.
>            Priority: Minor
>              Labels: FastVectorHighlighter
>             Fix For: 4.0
>
>         Attachments: LUCENE-3440.patch, LUCENE-3440.patch, LUCENE-3440_3.6.1-SNAPSHOT.patch, LUCENE-4.0-SNAPSHOT-3440-9.patch, weight-vs-boost_table01.html, weight-vs-boost_table02.html
>
>
> The FastVectorHighlighter gives every term found in a fragment an equal weight, which causes fragments with a high number of words or, in the worst case, a high number of very common words to rank higher than fragments that contain *all* of the terms used in the original query.
> This patch provides ordered fragments with IDF-weighted terms: 
> total weight = total weight + IDF for unique term per fragment * boost of query; 
> The ranking formula should be the same as, or at least similar to, the one used in org.apache.lucene.search.highlight.QueryTermScorer.
> The patch is simple, but it works for us. 
> Some ideas:
> - A better approach would be to move the whole fragment scoring into a separate class.
> - Switch scoring via parameter
> - Exact phrases should be given an even better score, regardless of whether a phrase query was executed or not
> - edismax/dismax parameters pf, ps and pf^boost should be observed, and corresponding fragments should be ranked higher
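
The weighting rule quoted above (total weight = total weight + IDF for unique term per fragment * boost of query) can be illustrated with a small standalone sketch. This is not the code from the attached patches: the class FragmentWeightSketch, the TermHit type, and the method names are hypothetical, and the IDF formula 1 + ln(numDocs / (docFreq + 1)) is only an assumed choice for illustration.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch; these names do not come from the patch or from Lucene. */
final class FragmentWeightSketch {

    /** A query term matched inside a fragment, with the boost of its query. */
    static final class TermHit {
        final String term;
        final float boost;

        TermHit(String term, float boost) {
            this.term = term;
            this.boost = boost;
        }
    }

    /** Assumed IDF: 1 + ln(numDocs / (docFreq + 1)). Rare terms score higher. */
    static double idf(int docFreq, int numDocs) {
        return 1.0 + Math.log((double) numDocs / (docFreq + 1));
    }

    /** Equal weighting: every hit counts as 1, so repeated common words add up. */
    static double equalWeight(List<TermHit> hits) {
        return hits.size();
    }

    /** IDF weighting: each distinct term contributes idf * boost once per fragment. */
    static double idfWeight(List<TermHit> hits, Map<String, Integer> docFreqs, int numDocs) {
        Set<String> seen = new HashSet<>();
        double total = 0.0;
        for (TermHit hit : hits) {
            if (!seen.add(hit.term)) {
                continue; // term already counted for this fragment
            }
            int docFreq = docFreqs.getOrDefault(hit.term, 0);
            total += idf(docFreq, numDocs) * hit.boost;
        }
        return total;
    }
}
```

With equal weighting, a fragment that repeats a very common term several times outranks a shorter fragment that contains two rare query terms. With idfWeight the repeated term is counted once and contributes little, so the fragment with the rare terms ranks first.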

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

