lucene-dev mailing list archives

From "David Mark Nemeskey (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-2959) [GSoC] Implementing State of the Art Ranking for Lucene
Date Thu, 31 Mar 2011 10:23:05 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13013907#comment-13013907 ]

David Mark Nemeskey commented on LUCENE-2959:
---------------------------------------------

Robert: thanks for all the info! It's nice to see so much work has already been done. I plan
to delve into it after the selection, and try to get other things out of the way until then,
so that I can concentrate on GSoC during the summer.

I think the main point would be to make adding a new ranking function as easy as
possible. At least a prototype implementation should be very straightforward, even at the
expense of performance. Then, if the new method gives good results, the developer can move
down to the lower level to squeeze more performance out of it. It's hard for me to discuss this
without knowing the code, of course, but do you think that is possible?
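To make the "easy prototype" idea concrete, here is a minimal Java sketch of what such a plug-in point might look like. The `RankingFunction` interface, the `TermStats` class and both implementations are hypothetical names chosen for illustration; they are not part of Lucene's API. The tf-idf variant is deliberately simplified and is not Lucene's exact VSM formula.

```java
/**
 * Hypothetical plug-in point for a ranking function (NOT an existing Lucene API).
 * A prototype only has to turn a few per-term statistics into a score.
 */
interface RankingFunction {
    /** Returns the score contribution of one query term in one document. */
    double score(TermStats s);
}

/** Hypothetical bundle of the statistics a prototype is handed. */
final class TermStats {
    final int tf;               // occurrences of the term in the document
    final int docLength;        // document length in tokens
    final double avgDocLength;  // mean document length in the collection
    final int docFreq;          // number of documents containing the term
    final int numDocs;          // number of documents in the collection

    TermStats(int tf, int docLength, double avgDocLength, int docFreq, int numDocs) {
        this.tf = tf;
        this.docLength = docLength;
        this.avgDocLength = avgDocLength;
        this.docFreq = docFreq;
        this.numDocs = numDocs;
    }
}

/** Simplified VSM-style tf-idf (square-root tf, log idf); not Lucene's exact formula. */
final class SimpleTfIdf implements RankingFunction {
    @Override
    public double score(TermStats s) {
        double idf = 1 + Math.log((double) s.numDocs / (s.docFreq + 1));
        return Math.sqrt(s.tf) * idf;
    }
}

/** Okapi BM25 term weight with the usual k1 and b parameters. */
final class Bm25 implements RankingFunction {
    private final double k1;
    private final double b;

    Bm25(double k1, double b) {
        this.k1 = k1;
        this.b = b;
    }

    @Override
    public double score(TermStats s) {
        // "Plus one" idf variant, which stays positive even for very common terms.
        double idf = Math.log(1 + (s.numDocs - s.docFreq + 0.5) / (s.docFreq + 0.5));
        // Length-normalized saturation constant K.
        double k = k1 * (1 - b + b * s.docLength / s.avgDocLength);
        return idf * s.tf * (k1 + 1) / (s.tf + k);
    }
}

class RankingDemo {
    public static void main(String[] args) {
        TermStats stats = new TermStats(3, 120, 100.0, 10, 1000);
        // Swapping the ranking function is a one-line change for the caller.
        RankingFunction[] functions = {new SimpleTfIdf(), new Bm25(1.2, 0.75)};
        for (RankingFunction f : functions) {
            System.out.printf("%s: %.4f%n", f.getClass().getSimpleName(), f.score(stats));
        }
    }
}
```

The design choice in this sketch is to hand a prototype only plain per-term statistics, so it can be written without knowing the index internals; an optimized version could later bypass the interface and work at a lower level.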

Even though I added a "Performance" section to my proposal (http://www.google-melange.com/gsoc/proposal/review/google/gsoc2011/davidnemeskey/1),
I now see that performance is probably more important than I first believed. I think I will
follow your advice and concentrate on how to make BM25F fast. It may be a somewhat tougher nut
to crack than DFR, as the latter has logarithms scattered all over it. However, the first
thing that comes to mind is that the BM25 tf curve (score as a function of term frequency)
flattens out very quickly (less quickly for a high k1 value, though). So it may be possible
to precompute a tf map or array for a query.
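The precomputation idea can be sketched in Java as follows. BM25's tf component, tf·(k1+1)/(tf+K), also depends on document length through K = k1·(1 − b + b·dl/avgdl), so a single per-query array only works if lengths are coarse; this sketch therefore assumes document lengths have been quantized into a fixed set of buckets. `Bm25TfTable`, the bucket scheme and `maxCachedTf` are hypothetical, not Lucene code.

```java
/**
 * Hypothetical cache of BM25's saturating tf component,
 * tf * (k1 + 1) / (tf + K), with K = k1 * (1 - b + b * dl / avgdl).
 * Assumes document lengths are quantized into a fixed set of buckets.
 */
final class Bm25TfTable {
    private final double k1;
    private final double[] kPerBucket; // K for each length bucket
    private final double[][] table;    // table[bucket][tf] for tf < maxCachedTf

    Bm25TfTable(double k1, double b, double avgDocLength,
                double[] bucketLengths, int maxCachedTf) {
        this.k1 = k1;
        this.kPerBucket = new double[bucketLengths.length];
        this.table = new double[bucketLengths.length][maxCachedTf];
        for (int i = 0; i < bucketLengths.length; i++) {
            double k = k1 * (1 - b + b * bucketLengths[i] / avgDocLength);
            kPerBucket[i] = k;
            for (int tf = 0; tf < maxCachedTf; tf++) {
                table[i][tf] = saturate(tf, k);
            }
        }
    }

    private double saturate(int tf, double k) {
        return tf * (k1 + 1) / (tf + k);
    }

    /** Table lookup for small tf; exact computation for the rare large ones. */
    double tfWeight(int tf, int bucket) {
        double[] row = table[bucket];
        return tf < row.length ? row[tf] : saturate(tf, kPerBucket[bucket]);
    }
}

class TfTableDemo {
    public static void main(String[] args) {
        Bm25TfTable t = new Bm25TfTable(1.2, 0.75, 100.0, new double[] {50, 100, 200}, 32);
        // For an average-length document the weights approach k1 + 1 = 2.2.
        for (int tf : new int[] {1, 2, 5, 10, 20, 40}) {
            System.out.printf("tf=%d -> %.3f%n", tf, t.tfWeight(tf, 1));
        }
    }
}
```

Because the curve is nearly flat after a few occurrences, a small `maxCachedTf` should cover most postings, while larger frequencies fall back to the exact formula, so the table never changes the scores.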

> [GSoC] Implementing State of the Art Ranking for Lucene
> -------------------------------------------------------
>
>                 Key: LUCENE-2959
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2959
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Examples, Javadocs, Query/Scoring
>            Reporter: David Mark Nemeskey
>              Labels: gsoc2011, lucene-gsoc-11, mentor
>         Attachments: LUCENE-2959_mockdfr.patch, implementation_plan.pdf, proposal.pdf
>
>
> Lucene employs the Vector Space Model (VSM) to rank documents, which compares
> unfavorably to state of the art algorithms, such as BM25. Moreover, the architecture is
> tailored specifically to VSM, which makes the addition of new ranking functions a
> non-trivial task.
> This project aims to bring state of the art ranking methods to Lucene and to implement a
> query architecture with pluggable ranking functions.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


