lucene-java-user mailing list archives

From José Ramón Perez Aguera <jose.agu...@gmail.com>
Subject Re: Query Expansion Module for Lucene based on BM25 ranking function
Date Wed, 22 Oct 2008 22:07:00 GMT
Hi Grant,

Our query expansion approach is quite simple: we apply
pseudo-relevance feedback, in which a number of the top retrieved
documents are used to extract candidate terms for expanding the
original query. We use TermPositions at query time to extract the
term statistics needed for query expansion. To implement BM25, on the
other hand, we have used the implementation proposed by Joaquin
Perez, where the average document length is computed at indexing time
and used as a constant at query time.
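
To illustrate the two ideas above, here is a minimal, self-contained
Java sketch. It is not the code of the module itself: the class and
method names are hypothetical, k1 = 1.2 and b = 0.75 are common BM25
defaults assumed for illustration, documents are plain token lists
rather than a Lucene index, and the tf-times-idf term selection is only
a simple stand-in for the expansion algorithms the module actually uses.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only; names and parameter values are assumptions. */
final class PrfBm25Sketch {
    // Common BM25 defaults, assumed here (not taken from the thread).
    static final double K1 = 1.2;
    static final double B = 0.75;

    /** Average document length, computed once (e.g. at indexing time). */
    static double averageLength(List<List<String>> docs) {
        return docs.stream().mapToInt(List::size).average().orElse(0.0);
    }

    /** Smoothed inverse document frequency (always positive). */
    static double idf(int docFreq, int numDocs) {
        return Math.log(1.0 + (numDocs - docFreq + 0.5) / (docFreq + 0.5));
    }

    /** BM25 weight of one term in one document; avgLen is passed in as a constant. */
    static double bm25(int tf, int docLen, double avgLen, int docFreq, int numDocs) {
        double lengthNorm = K1 * (1.0 - B + B * docLen / avgLen);
        return idf(docFreq, numDocs) * tf * (K1 + 1.0) / (tf + lengthNorm);
    }

    /**
     * Pseudo-relevance feedback: treat topDocs as relevant, score every
     * term in them that is not already in the query, and append the n
     * best-scoring terms to the query.
     */
    static List<String> expand(List<String> query, List<List<String>> topDocs,
                               List<List<String>> allDocs, int n) {
        // Term frequency summed over the feedback documents.
        Map<String, Integer> feedbackTf = new HashMap<>();
        for (List<String> doc : topDocs) {
            for (String term : doc) {
                feedbackTf.merge(term, 1, Integer::sum);
            }
        }
        // Score = feedback tf * idf over the whole collection.
        Map<String, Double> score = new HashMap<>();
        for (Map.Entry<String, Integer> e : feedbackTf.entrySet()) {
            String term = e.getKey();
            if (query.contains(term)) {
                continue;
            }
            int docFreq = 0;
            for (List<String> doc : allDocs) {
                if (doc.contains(term)) {
                    docFreq++;
                }
            }
            score.put(term, e.getValue() * idf(docFreq, allDocs.size()));
        }
        // Highest score first; ties broken alphabetically for determinism.
        List<String> candidates = new ArrayList<>(score.keySet());
        candidates.sort(Comparator.comparingDouble((String t) -> -score.get(t))
                .thenComparing(Comparator.<String>naturalOrder()));
        List<String> expanded = new ArrayList<>(query);
        expanded.addAll(candidates.subList(0, Math.min(n, candidates.size())));
        return expanded;
    }
}
```

In a real setup the top documents passed to expand would come from a
first retrieval pass ranked with bm25, and the expanded query would
then be run again against the index.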

We know that this is not the best way to do it, but we do not know
Lucene well enough to write a better implementation. Our code is for
research projects only, and it is not robust enough for real-world
projects.

I have not followed any of the flexible indexing discussions, but the
subject sounds very interesting to me. Do you have any pointers?

Finally, the code has been released under the Apache license. We are
not experts in this kind of thing, though. Our idea is that anyone can
take the code, change it, and use it without constraints.

Best regards

Jose

José Ramón Pérez Agüera

On 22/10/2008, at 20:16, Grant Ingersoll <gsingers@apache.org> wrote:

> Hi José,
>
> Can you explain your approach to implementing?  I'm curious how you  
> incorporated in the avg. doc length.  Also, have you followed any of  
> the flexible indexing discussions?
>
> Finally, what's the license on this code?
>
> Thanks,
> Grant
>
> On Oct 21, 2008, at 10:14 AM, José Ramón Pérez Agüera wrote:
>
>> Hello,
>>
>> We have implemented a research module for Lucene using BM25 and our
>> structured version of BM25 as ranking functions and a couple of
>> state-of-the-art query expansion algorithms.
>>
>> This implementation is quite different from other query expansion
>> modules for Lucene that are available on the web.
>>
>> We have used this implementation in CLEF 2008 Robust WSD track
>> http://ixa2.si.ehu.es/clirwsd/ with very good results. The code and
>> other technical and scientific details can be accessed here:
>>
>> http://grasia.fdi.ucm.es/jose/query-expansion
>>
>> This is an alpha0.0.never.used.before.by.anyone version, we will work
>> on it in the following months to make it easier to use.
>>
>> Please do not hesitate to send comments, questions and/or complaints
>> to the authors.
>>
>> jose
>>
>> -- 
>> José Ramón Pérez Agüera
>>
>> Dept. de Ingeniería del Software e Inteligencia Artificial
>> Despacho 411 tlf. 913947599
>> Facultad de Informática
>> Universidad Complutense de Madrid
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>
> --------------------------
> Grant Ingersoll
> Lucene Boot Camp Training Nov. 3-4, 2008, ApacheCon US New Orleans.
> http://www.lucenebootcamp.com
>
>
> Lucene Helpful Hints:
> http://wiki.apache.org/lucene-java/BasicsOfPerformance
> http://wiki.apache.org/lucene-java/LuceneFAQ
>
>
