lucene-dev mailing list archives

From "Robert Muir (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (LUCENE-2959) [GSoC] Implementing State of the Art Ranking for Lucene
Date Wed, 30 Mar 2011 15:26:05 GMT

     [ https://issues.apache.org/jira/browse/LUCENE-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated LUCENE-2959:
--------------------------------

    Attachment: LUCENE-2959_mockdfr.patch

David, for your perusal, here is another Similarity I tried to write: DFR I(F)L2.

It probably has bugs, but it again demonstrates the challenges here.

If we want to support ranking systems like this, how can they be made fast?

The one I wrote has no score caching, so it does a lot of per-document divisions, multiplications,
etc., and this is no good.
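
To make the per-document cost concrete, here is a minimal sketch of an uncached I(F)L2 score. It is
not the code from the attached patch. It assumes the textbook DFR components as they are commonly
stated: basic model I(F) = tfn * log2((N + 1) / (F + 0.5)), after-effect L = 1 / (tfn + 1), and
normalization 2, tfn = tf * log2(1 + c * avgDocLen / docLen). The class name, method name, and
parameter names are hypothetical.

```java
// Hypothetical illustration only; not taken from LUCENE-2959_mockdfr.patch.
final class NaiveIFL2 {
    private static final double LN2 = Math.log(2.0);

    private static double log2(double x) {
        return Math.log(x) / LN2;
    }

    /**
     * Scores one term in one document with nothing cached.
     *
     * @param tf            term frequency in the document
     * @param docLen        length of the document
     * @param avgDocLen     average document length in the collection
     * @param numDocs       number of documents in the collection (N)
     * @param totalTermFreq total occurrences of the term in the collection (F)
     * @param c             free parameter of normalization 2 (assumed name)
     */
    static double score(double tf, double docLen, double avgDocLen,
                        double numDocs, double totalTermFreq, double c) {
        // Normalization 2: one division, one multiplication, one log per document.
        double tfn = tf * log2(1.0 + c * avgDocLen / docLen);
        // Basic model I(F): the log term depends only on collection statistics,
        // so it could be hoisted out, but this naive version recomputes it.
        double basic = tfn * log2((numDocs + 1.0) / (totalTermFreq + 0.5));
        // After-effect L: another per-document division.
        double afterEffect = 1.0 / (tfn + 1.0);
        return basic * afterEffect;
    }
}
```

Every call pays for two logarithms, three divisions, and several multiplications, which is the
overhead described above.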

So it's going to be hard to make these perform competitively with Lucene's current scoring,
which for TF < 32 is an array lookup and a single multiplication.
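
The "array lookup and a single multiplication" pattern can be sketched roughly as follows. This is
an illustration of the idea, not Lucene's actual scorer: the class and field names are hypothetical,
sqrt(tf) stands in for the similarity's tf() function, and the 256-entry norm table is assumed to
hold pre-decoded per-document norms.

```java
// Hypothetical illustration of a tf score cache; names are not Lucene's.
final class CachedTfScorer {
    private static final int SCORE_CACHE_SIZE = 32;

    private final float[] scoreCache = new float[SCORE_CACHE_SIZE];
    private final float[] normTable; // 256 decoded norm values, indexed by norm byte
    private final float weight;      // query-time weight of the term (e.g. idf * boost)

    CachedTfScorer(float weight, float[] normTable) {
        this.weight = weight;
        this.normTable = normTable;
        // Precompute tf(freq) * weight once per query term for small frequencies.
        for (int tf = 0; tf < SCORE_CACHE_SIZE; tf++) {
            scoreCache[tf] = (float) Math.sqrt(tf) * weight;
        }
    }

    /** Per-document work: one array lookup (for small tf) and one multiplication. */
    float score(int tf, byte normByte) {
        float raw = tf < SCORE_CACHE_SIZE
                ? scoreCache[tf]
                : (float) Math.sqrt(tf) * weight; // rare slow path for large tf
        return raw * normTable[normByte & 0xFF];
    }
}
```

The cache works because the tf part and the document-length part are separate factors. A model
whose tf normalization depends on document length does not split this way, which is why the same
trick does not carry over directly.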

It's more obvious to me how to eke good performance out of the language modelling formula, because
you can rearrange the log and boil it down to some addition. But we need to get creative
thinking about how to make some of these other models fast, and it's more complicated if you
want to make, say, a DFR "framework" that allows you to pick the basic model and the 2 normalizations,
versus specializing the code for each possibility (and there are many).
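
One possible reading of "rearrange the log and boil it down to some addition", sketched for a
Dirichlet-smoothed language model (an assumption; the message does not say which smoothing is
meant). With collection probability pC and smoothing parameter mu:

    log((tf + mu*pC) / (docLen + mu)) - log(pC)
        = log(1 + tf / (mu*pC)) + log(mu / (docLen + mu))

The first term depends only on tf and the query term, so it can be cached per tf. The second depends
only on document length, so it can be precomputed per length (or per norm byte). All names below
are hypothetical, and the sketch scores a single term.

```java
// Hypothetical illustration; not code from the patch or from Lucene.
final class DirichletLmSketch {
    private static final int CACHE_SIZE = 32;

    private final double mu;
    private final double collectionProb;
    private final double[] tfPart = new double[CACHE_SIZE];

    DirichletLmSketch(double mu, double collectionProb) {
        this.mu = mu;
        this.collectionProb = collectionProb;
        // Per query term: cache log(1 + tf / (mu * pC)) for small tf.
        for (int tf = 0; tf < CACHE_SIZE; tf++) {
            tfPart[tf] = Math.log(1.0 + tf / (mu * collectionProb));
        }
    }

    /** Depends only on document length, so it can be computed once per length. */
    static double lengthPart(double mu, double docLen) {
        return Math.log(mu / (docLen + mu));
    }

    /** Per-document work: one lookup and one addition when tf is small. */
    double score(int tf, double precomputedLengthPart) {
        double t = tf < CACHE_SIZE
                ? tfPart[tf]
                : Math.log(1.0 + tf / (mu * collectionProb));
        return t + precomputedLengthPart;
    }

    /** Unfactored form, used only to check the rearrangement. */
    static double direct(int tf, double docLen, double mu, double collectionProb) {
        return Math.log((tf + mu * collectionProb) / (docLen + mu))
                - Math.log(collectionProb);
    }
}
```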

My advice to you for GSoC would be to just pick one of these (e.g. BM25) and figure out how
to do it really well: good performance, good API and documentation, and good relevance testing
to ensure its quality.

I'm more than happy to help with the boring parts like refactoring Lucene's Explanations API :)


> [GSoC] Implementing State of the Art Ranking for Lucene
> -------------------------------------------------------
>
>                 Key: LUCENE-2959
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2959
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Examples, Javadocs, Query/Scoring
>            Reporter: David Mark Nemeskey
>              Labels: gsoc2011, lucene-gsoc-11, mentor
>         Attachments: LUCENE-2959_mockdfr.patch, implementation_plan.pdf, proposal.pdf
>
>
> Lucene employs the Vector Space Model (VSM) to rank documents, which compares
> unfavorably to state of the art algorithms, such as BM25. Moreover, the architecture is
> tailored specifically to VSM, which makes the addition of new ranking functions a
> non-trivial task.
> This project aims to bring state of the art ranking methods to Lucene and to implement a
> query architecture with pluggable ranking functions.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

