lucene-dev mailing list archives

From Joaquin Perez Iglesias <joaquin.pe...@lsi.uned.es>
Subject Re: Summer of Code idea for lucene
Date Tue, 02 Sep 2008 09:28:05 GMT
Hi all,

I finally found some time to finish the BM25/BM25F implementation for 
Lucene. You can find more details at 
http://nlp.uned.es/~jperezi/Lucene-BM25/. It has been tested, but I 
cannot guarantee that it is bug-free.
It would be great to receive some feedback on it.

Some details of the implementation may be of interest, such as how to 
calculate the average_length or the idf at document level.
If you find any bug or mistake in the supplied implementation, please let 
me know and I will try to fix it. The same goes for questions.
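
For readers unfamiliar with the two quantities mentioned above, here is a minimal sketch of textbook BM25 scoring in plain Python. It is not the code from the linked page and does not use any Lucene API. The function name `bm25_scores`, the defaults `k1=1.2` and `b=0.75`, and the non-negative idf variant are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score each tokenized document in `docs` against `query_terms`.

    Hypothetical helper for illustration; assumes a non-empty collection
    with at least one non-empty document.
    """
    n_docs = len(docs)
    # Average document length over the whole collection.
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for d in docs for term in set(d))

    scores = []
    for doc in docs:
        tf = Counter(doc)
        # Length normalisation: longer-than-average documents are penalised.
        norm = k1 * (1 - b + b * len(doc) / avg_len)
        score = 0.0
        for term in query_terms:
            if tf[term] == 0:
                continue
            # Assumed idf variant: the "1 +" keeps the value non-negative.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            score += idf * tf[term] * (k1 + 1) / (tf[term] + norm)
        scores.append(score)
    return scores


docs = [
    ["lucene", "bm25", "scoring"],
    ["lucene", "search"],
    ["cooking", "recipes"],
]
lucene_scores = bm25_scores(["lucene"], docs)
```

In this toy collection the second document is shorter than the first and contains "lucene" equally often, so it receives the higher score; the third document does not contain the term and scores zero.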

I hope some of you will find it useful.

Thanks in advance.



joaquin.perez@lsi.uned.es wrote:
> Hi Otis,
>
> As my colleague said, we have a first implementation of BM25 on top of Lucene. This
> development is part of an (almost finished) thesis project that compares different IR
> models over a standard collection. At the same time, we are trying to extend this first
> implementation to support BM25F for multifield queries. Unfortunately, we are too busy
> right now to prepare a final version of the code, so we will have to finish it over the
> summer (hopefully we will have more time :-))) and make it public then.
>
> We will let this list know when the final version is ready.
>
> Thanks to everybody for the interest!!!
>
> Bye
> Joaquin
>
> -----------------------------------------------------------
> Joaquín Pérez Iglesias
> Dpto. Lenguajes y Sistemas Informáticos
> E.T.S.I. Informática (UNED)
> Ciudad Universitaria
> C/ Juan del Rosal nº 16
> 28040 Madrid - Spain
> Phone. +34 91 398 87 25
> Fax    +34 91 398 65 35
> Office  2.07
> Email: joaquin.perez@lsi.uned.es
> ----------------------------------------------------------- 
> Otis Gospodnetic <otis_gospodnetic@yahoo.com> wrote:
>
>   
>> Hi Jose,
>>
>> I was wondering if you ever got to this.  I would love to see and try BM25 for
>> Lucene!
>>
>>
>> I'm looking at http://code.google.com/soc/2008/asf/about.html
>> and it looks like this didn't make it into GSoC, but this would still be great
>> to have.
>>
>> Thanks,
>> Otis
>> --
>> Sematext -- http://sematext.com/ --
>> Lucene - Solr - Nutch
>>
>>
>> ----- Original Message ----
>>> From: José Ramón Pérez Agüera <jose.aguera@gmail.com>
>>> To: java-dev@lucene.apache.org;
>>>     Joaquin Perez-Iglesias <joaquin.perez.iglesias@gmail.com>
>>> Sent: Saturday, March 15, 2008 4:54:08 AM
>>> Subject: Re: Summer of Code idea for lucene
>>>
>>> We have almost finished implementing BM25 on top of Lucene's structures, but we
>>> need help to finish the query parser and other details. If you or somebody else
>>> wants, we can send you the code, and you can help us implement the query
>>> parser and prepare the code for the sandbox.
>>>
>>> If people are interested, I can make a web page for the project
>>> and put our implementation up for download.
>>>
>>> Is anybody interested?
>>>
>>> jose
>>>
>>> -- 
>>> José Ramón Pérez Agüera
>>>
>>> Dept. de Ingeniería del Software e Inteligencia Artificial
>>> Despacho 411 tlf. 913947599
>>> Facultad de Informática
>>> Universidad Complutense de Madrid
>>>
>>> On Sat, Mar 15, 2008 at 5:32 AM, Ian Holsman wrote:
>>>       
>>>> If no one objects (I don't think it's too late)
>>>>
>>>>  would you mind a GSOC project to implement BM25 relevancy/scoring?
>>     
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>     

-- 
-----------------------------------------------------------
Joaquín Pérez Iglesias
Dpto. Lenguajes y Sistemas Informáticos
E.T.S.I. Informática (UNED)
Ciudad Universitaria
C/ Juan del Rosal nº 16
28040 Madrid - Spain
Phone. +34 91 398 87 25
Fax    +34 91 398 65 35
Office  2.07
Email: joaquin.perez@lsi.uned.es
web:   http://nlp.uned.es/~jperezi/
-----------------------------------------------------------



