Date: Wed, 30 Mar 2011 15:26:05 +0000 (UTC)
From: "Robert Muir (JIRA)"
To: dev@lucene.apache.org
Message-ID: <950198947.21517.1301498765874.JavaMail.tomcat@hel.zones.apache.org>
In-Reply-To: <1238437159.10432.1299754259722.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Updated] (LUCENE-2959) [GSoC] Implementing State of the Art Ranking for Lucene

[
https://issues.apache.org/jira/browse/LUCENE-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated LUCENE-2959:
--------------------------------
    Attachment: LUCENE-2959_mockdfr.patch

David, for your perusal, here is another sim I tried to write: DFR I(F)L2.

It probably has bugs, but it demonstrates again the challenges here. If we want to support ranking systems like this, how can they be made fast? The one I wrote has no score caching, so it does a lot of per-document divisions, multiplications, etc., and this is no good.

So it's going to be hard to make these competitive in performance with Lucene's current scoring, which for TF < 32 is an array lookup and a single multiplication.

It's more obvious to me how to eke good performance out of the language modelling formula, because you can rearrange the log and boil it down to some addition. But we need to get creative about how to make some of these other models fast, and it's more complicated if you want to make, say, a DFR "framework" that lets you pick the basic model and the two normalizations, versus specializing the code for each possibility (and there are many).

My advice to you for GSoC would be to just pick one of these (e.g. BM25) and figure out how to do it really well: good performance, good API and documentation, and good relevance testing to ensure its quality.
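
To make the "array lookup and a single multiplication" point concrete, here is a minimal sketch of that kind of score cache. The class and method names are hypothetical, and the square-root tf factor is an assumption for illustration; this is not Lucene's actual scorer code.

```java
// Illustrative sketch only: names are hypothetical, not Lucene's API.
final class CachedTfScorer {
    // Term frequencies below this bound are served from the table.
    static final int CACHE_SIZE = 32;

    // Per-term weight (for example idf times boost), fixed for the whole query.
    private final float weight;
    private final float[] cache = new float[CACHE_SIZE];

    CachedTfScorer(float weight) {
        this.weight = weight;
        // Pay for the tf computation once per query term, not once per document.
        for (int tf = 0; tf < CACHE_SIZE; tf++) {
            cache[tf] = tfFactor(tf) * weight;
        }
    }

    // Assumed tf component: square root of the raw term frequency.
    static float tfFactor(int tf) {
        return (float) Math.sqrt(tf);
    }

    // Hot path: one array lookup (for small tf) and one multiplication by the
    // per-document length norm.
    float score(int tf, float norm) {
        float raw = tf < CACHE_SIZE ? cache[tf] : tfFactor(tf) * weight;
        return raw * norm;
    }
}
```

The table works here because the cached part depends only on tf and a per-query constant. In a model whose term-frequency normalization mixes in the document length before any non-linear step, the cached value can no longer be keyed on tf alone, which is part of what makes these models harder to speed up.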
I'm more than happy to help with the boring parts like refactoring Lucene's Explanations API :)

> [GSoC] Implementing State of the Art Ranking for Lucene
> -------------------------------------------------------
>
>                 Key: LUCENE-2959
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2959
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Examples, Javadocs, Query/Scoring
>            Reporter: David Mark Nemeskey
>              Labels: gsoc2011, lucene-gsoc-11, mentor
>         Attachments: LUCENE-2959_mockdfr.patch, implementation_plan.pdf, proposal.pdf
>
>
> Lucene employs the Vector Space Model (VSM) to rank documents, which compares
> unfavorably to state of the art algorithms, such as BM25. Moreover, the architecture is
> tailored specifically to VSM, which makes the addition of new ranking functions a
> non-trivial task.
> This project aims to bring state of the art ranking methods to Lucene and to implement a
> query architecture with pluggable ranking functions.