lucene-dev mailing list archives

From "Robert Muir (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-2959) [GSoC] Implementing State of the Art Ranking for Lucene
Date Thu, 25 Aug 2011 01:55:29 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13090685#comment-13090685 ]

Robert Muir commented on LUCENE-2959:
-------------------------------------

I rearranged the BM25 implementation in the branch a little; it's now as fast as Lucene's TF-IDF ranking formula:
{noformat}
                Task   QPS tfidf StdDev tfidf   QPS bm25 StdDev bm25      Pct diff
            SpanNear        4.29        0.52        4.14        0.49  -24% -   22%
              Phrase        3.97        0.25        3.89        0.25  -13% -   11%
                Term       82.18        4.78       81.00        2.56   -9% -    7%
      TermBGroup1M1P       83.30        2.41       82.12        2.20   -6% -    4%
        SloppyPhrase        8.03        0.31        7.93        0.43  -10% -    8%
         AndHighHigh       19.38        0.59       19.16        0.71   -7% -    5%
            PKLookup      175.49        4.33      173.67        4.20   -5% -    3%
          AndHighMed       40.99        1.12       40.71        1.07   -5% -    4%
         TermGroup1M       25.69        0.39       25.69        0.44   -3% -    3%
              Fuzzy2       42.62        1.83       42.65        1.80   -8% -    8%
              Fuzzy1       91.74        3.48       91.86        3.44   -7% -    7%
             Respell       73.96        3.30       74.18        3.29   -8% -    9%
            Wildcard       56.33        0.97       56.60        1.08   -3% -    4%
             Prefix3       33.36        0.83       33.59        0.97   -4% -    6%
        TermBGroup1M       55.58        1.03       56.17        0.88   -2% -    4%
              IntNRQ       13.38        0.74       13.58        0.94  -10% -   14%
           OrHighMed       11.71        1.18       11.94        0.97  -14% -   22%
          OrHighHigh        8.91        0.74        9.13        0.63  -11% -   19%
{noformat}
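The Pct diff column appears to be a worst-case/best-case range rather than a single estimate. Working backwards from the other columns (an inference from the numbers, not from the benchmark tool's source), the low end compares the BM25 mean minus one StdDev against the TF-IDF mean plus one StdDev, the high end does the opposite, and the result is truncated toward zero. The class and method names below are hypothetical.

```java
/**
 * Hypothetical helper that reconstructs the "Pct diff" range from the
 * QPS/StdDev columns. The formula is inferred from the table above,
 * not taken from the benchmark tool itself.
 */
final class PctDiff {

    /** Returns {low, high} in whole percent, truncated toward zero. */
    static int[] range(double baseQps, double baseStdDev,
                       double newQps, double newStdDev) {
        // Worst case: new run at its slowest vs. baseline at its fastest.
        double low = (newQps - newStdDev) / (baseQps + baseStdDev) - 1.0;
        // Best case: new run at its fastest vs. baseline at its slowest.
        double high = (newQps + newStdDev) / (baseQps - baseStdDev) - 1.0;
        // An (int) cast truncates toward zero, e.g. -9.8 becomes -9.
        return new int[] {(int) (100.0 * low), (int) (100.0 * high)};
    }

    public static void main(String[] args) {
        // SpanNear row: tfidf 4.29 +/- 0.52, bm25 4.14 +/- 0.49
        int[] spanNear = range(4.29, 0.52, 4.14, 0.49);
        System.out.println(spanNear[0] + "% - " + spanNear[1] + "%"); // prints "-24% - 22%"
        // Term row: tfidf 82.18 +/- 4.78, bm25 81.00 +/- 2.56
        int[] term = range(82.18, 4.78, 81.00, 2.56);
        System.out.println(term[0] + "% - " + term[1] + "%"); // prints "-9% - 7%"
    }
}
```

Read this way, every range in the table spans zero, which fits the claim that the two formulas now run at the same speed within measurement noise.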

> [GSoC] Implementing State of the Art Ranking for Lucene
> -------------------------------------------------------
>
>                 Key: LUCENE-2959
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2959
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: core/query/scoring, general/javadocs, modules/examples
>            Reporter: David Mark Nemeskey
>            Assignee: Robert Muir
>              Labels: gsoc2011, lucene-gsoc-11, mentor
>             Fix For: flexscoring branch
>
>         Attachments: LUCENE-2959_mockdfr.patch, implementation_plan.pdf, proposal.pdf
>
>
> Lucene employs the Vector Space Model (VSM) to rank documents, which compares
> unfavorably to state-of-the-art algorithms such as BM25. Moreover, the architecture is
> tailored specifically to VSM, which makes adding new ranking functions a non-trivial
> task.
> This project aims to bring state-of-the-art ranking methods to Lucene and to implement a
> query architecture with pluggable ranking functions.
> The wiki page for the project can be found at http://wiki.apache.org/lucene-java/SummerOfCode2011ProjectRanking.
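For comparison with the VSM formula, here is a minimal sketch of the textbook Okapi BM25 per-term score, score = idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * docLen / avgDocLen)). It is not the flexscoring-branch implementation: the class name, the common defaults k1 = 1.2 and b = 0.75, and the idf variant (with +1 inside the log so it never goes negative) are assumptions chosen for illustration.

```java
/**
 * Hypothetical, minimal Okapi BM25 term scorer for illustration only.
 * Not the Lucene flexscoring-branch code.
 */
final class Bm25Sketch {
    private final double k1;        // tf saturation; 1.2 is a common default
    private final double b;         // length-normalization strength; 0.75 is common
    private final long docCount;    // N: number of documents in the collection
    private final double avgDocLen; // average document length, in terms

    Bm25Sketch(double k1, double b, long docCount, double avgDocLen) {
        this.k1 = k1;
        this.b = b;
        this.docCount = docCount;
        this.avgDocLen = avgDocLen;
    }

    /** idf with +1 inside the log (assumed variant), so it is always positive. */
    double idf(long docFreq) {
        return Math.log(1.0 + (docCount - docFreq + 0.5) / (docFreq + 0.5));
    }

    /** BM25 contribution of one query term to one document. */
    double score(double tf, long docFreq, double docLen) {
        // Length normalization: above k1 for documents longer than average
        // (when b > 0), below k1 for shorter ones.
        double norm = k1 * ((1.0 - b) + b * docLen / avgDocLen);
        // tf saturates: as tf grows, the score approaches idf * (k1 + 1).
        return idf(docFreq) * tf * (k1 + 1.0) / (tf + norm);
    }

    public static void main(String[] args) {
        Bm25Sketch bm25 = new Bm25Sketch(1.2, 0.75, 1000, 100.0);
        // Same term statistics, different lengths: the shorter document wins.
        System.out.println(bm25.score(3, 10, 50) > bm25.score(3, 10, 200)); // prints "true"
    }
}
```

Because norm depends only on the document and the collection statistics, an implementation can compute it once per document instead of once per query term, which is one way to keep BM25 as cheap as TF-IDF.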

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

