From: José Ramón Pérez Agüera
Date: Fri, 02 Feb 2007 10:54:36 +0100
Subject: Re: implementation of the state-of-the-art retrieval models for Lucene?
To: Hui Fang
Cc: java-dev@lucene.apache.org

Dear Hui,

I am a Ph.D. student at the Complutense University of Madrid (Spain),
where I am also a teaching assistant in the Department of Artificial
Intelligence. I have been working with Lucene for two years, and I am
very interested in re-implementing certain classes (TermQuery,
TermScorer, DefaultSimilarity) to adapt them to state-of-the-art
information retrieval models such as BM25, LM, and DFR. I am also
working on an evaluation module that lets Lucene work with TREC
collections and similar test collections.

I think it would be a good idea to create a Lucene subproject to develop
new IR models and other tools aimed at the IR community. I would be very
interested in working on this, and I think it would be valuable not only
for the IR community but also for the Lucene community.

What do you think about this idea?

Best,
Jose

----- Original message -----
From: Hui Fang <huihuifang@gmail.com>
Date: Friday, February 2, 2007 5:45 am
Subject: implementation of the state-of-the-art retrieval models for Lucene?
To: java-dev@lucene.apache.org

> Dear all,
>
> My primary research interest is information retrieval, with a focus
> on developing effective and robust retrieval models. I am happy to
> send my first email to the Lucene community.
>
> Lucene and Nutch are really useful IR systems. But I think that the
> current retrieval function implemented in Lucene does not perform as
> well as other state-of-the-art retrieval functions in terms of
> effectiveness. I have implemented some state-of-the-art models (such
> as pivoted normalization, Okapi, and axiomatic retrieval models) on
> top of Lucene, and evaluated these models and the default model
> implemented in Lucene using standard IR evaluation methodology.
> Experiments show that the state-of-the-art retrieval functions
> outperform the default one. Actually, this is one assignment my
> advisor and I designed for our IR course.
>
> After posting this assignment online, quite a few IR researchers
> contacted us and asked for the code of our implementations. So we
> think that it might be beneficial to everyone in the Lucene community
> and the IR research community if we could contribute our
> implementation of the state-of-the-art retrieval functions to Lucene.
> I think that our contribution could help improve the retrieval
> performance of both Lucene and Nutch.
>
> What do you think?
>
> Thanks,
> -Hui

José Ramón Pérez Agüera
Dept. de Ingeniería del Software e Inteligencia Artificial
Office 411
Tel. 913947599
Facultad de Informática
Universidad Complutense de Madrid