lucene-dev mailing list archives

From "Hui Fang (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-965) Implement a state-of-the-art retrieval function in Lucene
Date Thu, 26 Jul 2007 05:40:31 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12515514 ]

Hui Fang commented on LUCENE-965:
---------------------------------

Hi Grant and Doron, 

Thank you very much for your comments! They are very useful. I agree that it would be interesting
to evaluate this in the context of LUCENE-836, which is a very nice idea. In fact, my advisor
and I also discussed putting some evaluation scripts in Lucene so that others could easily
evaluate retrieval performance. I hope that LUCENE-836 will be finalized soon, and please
let me know if there is anything I can help with.

Regarding speed, the axiomatic retrieval function should have the same computational
complexity as the default function, provided we compute the average document length at
indexing time instead of at search time. As Doron pointed out, my current implementation is
not optimal. I will fix this problem and the other svn-related problems as soon as possible
and submit a new patch.
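
A minimal sketch of the indexing-time idea, assuming documents arrive as lists of tokens. The class AverageLengthTracker and its methods are hypothetical names used for illustration only; they are not part of Lucene or of the attached patch. The point is that a running total maintained while indexing makes the average available in constant time at search time.

```java
import java.util.List;

// Hypothetical helper, not a Lucene class: keeps a running total of
// document lengths so the average never has to be recomputed per query.
final class AverageLengthTracker {
    private long totalTokens = 0; // sum of all document lengths seen so far
    private long docCount = 0;    // number of documents indexed so far

    // Called once per document while indexing.
    void addDocument(List<String> tokens) {
        totalTokens += tokens.size();
        docCount++;
    }

    // Constant-time lookup at search time; returns 0.0 for an empty index.
    double averageLength() {
        return docCount == 0 ? 0.0 : (double) totalTokens / docCount;
    }
}
```

In a real index the two counters would also have to be persisted and adjusted when documents are deleted, which this sketch does not attempt.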

Thanks,
-Hui

 

> Implement a state-of-the-art retrieval function in Lucene
> ---------------------------------------------------------
>
>                 Key: LUCENE-965
>                 URL: https://issues.apache.org/jira/browse/LUCENE-965
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>    Affects Versions: 2.2
>            Reporter: Hui Fang
>         Attachments: axiomaticFunction.patch
>
>
> We implemented the axiomatic retrieval function, which is a state-of-the-art retrieval
> function, to replace the default similarity function in Lucene. We compared the performance
> of these two functions and reported the results at
> http://sifaka.cs.uiuc.edu/hfang/lucene/Lucene_exp.pdf.
> The report shows that the performance of the axiomatic retrieval function is much better
> than the default function. The axiomatic retrieval function is able to find more relevant
> documents and users can see more relevant documents in the top-ranked documents. Incorporating
> such a state-of-the-art retrieval function could improve the search performance of all the
> applications which were built upon Lucene.
> Most changes related to the implementation are made in AXSimilarity, TermScorer and
> TermQuery.java. However, many test cases are hand coded to test whether the implementation
> of the default function is correct. Thus, I also made the modification to many test files to
> make the new retrieval function pass those cases. In fact, we found that some old test cases
> are not reasonable. For example, in the testQueries02 of TestBoolean2.java,
> the query is "+w3 xx", and we have two documents "w1 xx w2 yy w3" and "w1 w3 xx w2 yy w3".
> The second document should be more relevant than the first one, because it has more
> occurrences of the query term "w3". But the original test case would require us to rank
> the first document higher than the second one, which is not reasonable.
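
The occurrence counts cited in the example above can be checked with a short sketch. TermCount is a hypothetical helper written for illustration; it is not Lucene code and it is not a scoring function. It only counts whitespace-separated tokens and ignores everything else a similarity function would weigh, such as document length.

```java
// Hypothetical helper, not a Lucene class: counts exact token matches
// in a whitespace-separated document string.
final class TermCount {
    static int count(String document, String term) {
        int n = 0;
        for (String token : document.trim().split("\\s+")) {
            if (token.equals(term)) {
                n++;
            }
        }
        return n;
    }
}
```

Calling TermCount.count("w1 xx w2 yy w3", "w3") and TermCount.count("w1 w3 xx w2 yy w3", "w3") gives the raw "w3" frequencies of the two documents from the test case.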

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

